diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf new file mode 100644 index 000000000..caf451ded Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf new file mode 100644 index 000000000..ba905a254 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf new file mode 100644 index 000000000..d6a57bb16 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf new file mode 100644 index 000000000..491d4d210 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf new file mode 100644 index 000000000..9199be871 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf new file mode 100644 index 000000000..0b1000252 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf new file mode 100644 index 000000000..4186ab06a Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf new file mode 100644 index 000000000..0b17ce9e4 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTweek45.pdf b/doc/HandWrittenNotes/2025/FYSSTweek45.pdf new file mode 100644 index 000000000..09fefba30 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTweek45.pdf differ diff --git a/doc/LectureNotes/.DS_Store b/doc/LectureNotes/.DS_Store index 510782495..cb7504035 100644 Binary files a/doc/LectureNotes/.DS_Store and b/doc/LectureNotes/.DS_Store differ diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb new file mode 100644 index 000000000..50e773ede --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb @@ -0,0 +1,315 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "232d1306", + "metadata": {}, + "source": [ + "# Exercises week 34\n", + "\n", + "## Coding Setup and Linear Regression" + ] + }, + { + "cell_type": "markdown", + "id": "9b66a351", + "metadata": {}, + "source": [ + "Welcome to FYS-STK3155/4155!\n", + "\n", + "In this first week will focus on getting you set up with the programs you are going to be using throughout this course. We expect that many of you will encounter some trouble with setting these programs up, as they can be extremely finnicky and prone to not working the same on all machines, so we strongly encourage you to not get discouraged, and to show up to the group-sessions where we can help you along. The group sessions are also the best place to find group partners for the projects and to be challenged on your understanding of the material, which are both essential to doing well in this course. We strongly encourage you to form groups of 2-3 participants. 
\n", + "\n", + "If you are unable to complete this week's exercises, don't worry, this will likely be the most frustrating week for many of you. You have time to get back on track next week, especially if you come to the group-sessions! Note also that this week's set of exercises does not count for the additional score. The deadline for the weekly exercises is set to Fridays, at midnight." + ] + }, + { + "cell_type": "markdown", + "id": "36d8750b", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create and use a Github repository\n", + "- Set up and use a virtual environment in Python\n", + "- Fit an OLS model to data using scikit-learn\n", + "- Fit a model on training data and evaluate it on test data\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Exercises 1,2 and 3 require no writing in the notebook. Then, in canvas, include\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n", + "- Optional: A link to your github repository, which must be set to public, include the notebook file, a README file, requirements file and gitignore file.\n", + "\n", + "We require you to deliver a jupyter notebook so that we can evaluate the results of your code without needing to download and run the code of every student, as well as to teach you to use this useful tool." + ] + }, + { + "cell_type": "markdown", + "id": "2a9c7ef8", + "metadata": {}, + "source": [ + "## Exercise 1 - Github Setup\n" + ] + }, + { + "cell_type": "markdown", + "id": "1498aed1", + "metadata": {}, + "source": [ + "In this course, we require you to pay extra mind to the reproducibility of your results and the shareability of your code. The first step toward these goals is using a version control system like git and online repository like Github.\n", + "\n", + "**a)** Download git if you don't already have it on your machine, check with the terminal command ´git --version´ (https://git-scm.com/downloads).\n", + "\n", + "**b)** Create a Github account(https://github.com/), or log in to github with your UiO account (https://github.uio.no/login).\n", + "\n", + "**c)** Learn the basics of opening the terminal and navigating folders on your operating system. Things to learn: Opening a terminal, opening a terminal in a specific folder, listing the contents of the current folder, navigating into a folder, navigating out of a folder.\n", + "\n", + "**d)** Download the Github CLI tool and run ´gh auth login´ in your terminal to authenticate your local machine for some of the later steps. (https://github.com/cli/cli#installation). You might need to change file permissions to make it work, ask us or ChatGPT for help with these issues.\n", + "\n", + "**e)** As an alternative to the above terminal based instructions, you could install GitHub Desktop (see https://desktop.github.com/download/) or if you prefer GitLab, GitLab desktop (see https://about.gitlab.com/install/). This sets up all communications between your PC/Laptop and the repository. This allows you to combine exercises 1 and 2 in an easy way if you don't want to use terminarl. Keep in mind that these GUIs (graphical user interfaces) are not text editors." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c56fbefa", + "metadata": {}, + "source": [ + "## Exercise 2 - Setting up a Github repository\n" + ] + }, + { + "cell_type": "markdown", + "id": "fb9b8acd", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "**a)** Create an empty repository for your coursework in this course in your browser at github.com (or uio github).\n", + "\n", + "**b)** Open a terminal in the location you want to create your local folder for this repository, like your desktop.\n", + "\n", + "**c)** Clone the repository to your laptop using the terminal command ´gh repo clone username/repository-name´. This creates a folder with the same name as the repository. Moving it or renaming it might require some extra steps.\n", + "\n", + "**d)** Download this jupyter notebook. Add the notebook to the local folder.\n", + "\n", + "**e)** Run the ´git add .´ command command in a terminal opened in the local folder to stage the current changes in the folder to be commited to the version control history. Run ´git status´ to see the staged files.\n", + "\n", + "**f)** Run the ´git commit -m \"Adding first weekly assignment file\"´ command to commit the staged changes to the version control history. Run ´git status´ to see that no files are staged.\n", + "\n", + "**g)** Run the ´git push\" command to upload the commited changes to the remote repository on Github.\n", + "\n", + "**h)** Add a file called README.txt to the repository at Github.com. Don't do this in your local folder. Add a suitable title for your repository and some inforomation to the file.\n", + "\n", + "**i)** Run the ´git fetch origin´ command to fetch the latest remote changes to your repository.\n", + "\n", + "**j)** Run the ´git pull´ command to download and update files to match the remote changes.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f84d0db6", + "metadata": {}, + "source": [ + "## Exercise 3 - Setting up a Python virtual environment\n" + ] + }, + { + "cell_type": "markdown", + "id": "b5a4818a", + "metadata": {}, + "source": [ + "Following the themes from the previous exercises, another way of improving the reproducibility of your results and shareability of your code is having a good handle on which python packages you are using.\n", + "\n", + "There are many ways to manage your packages in Python, and you are free to use any approach you want, but in this course we encourage you to use something called a virtual environment. A virtual environemnt is a folder in your project which contains a Python runtime executable as well as all the packages you are using in the current project. In this way, each of your projects has its required set of packages installed in the same folder, so that if anything goes wrong while managing your packages it only affects the one project, and if multiple projects require different versions of the same package, you don't need to worry about messing up old projects. Also, it's easy to just delete the folder and start over if anything goes wrong.\n", + "\n", + "Virtual environments are typically created, activated, managed and updated using terminal commands, but for now we recommend that you let for example VS Code (a popular cross-paltform package) handle it for you to make the coding experience much easier. 
If you are familiar with another approach for virtual environments that works for you, feel free to keep doing it that way.\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f6de364", + "metadata": {}, + "source": [ + "**a)** Open this notebook in VS Code (https://code.visualstudio.com/Download). Download the Python and Jupyter extensions.\n", + "\n", + "**b)** Press ´Cmd + Shift + P´, then search and run ´Python: Create Environment...´\n", + "\n", + "**c)** Select ´Venv´\n", + "\n", + "**d)** Choose the most up-to-date version of Python your have installed.\n", + "\n", + "**e)** Press ´Cmd + Shift + P´, then search and run ´Python: Select Interpreter´\n", + "\n", + "**f)** Selevet the (.venv) option you just created.\n", + "\n", + "**g)** Open a terminal in VS Code, the venv name should be visible at the beginning of the line. Run `pip list` to see that there are no packages install in the environment.\n", + "\n", + "**h)** In this terminal, run `pip install matplotlib numpy scikit-learn`. This will install the listed packages.\n", + "\n", + "**i)** To make these installations reproducible, which is important for reproducing results and sharing your code, run ´pip freeze > requirements.txt´ to create the file requirements.txt with all your dependencies.\n", + "\n", + "Now, anyone who wants to recreate your package setup can download your requirements.txt file and run ´pip install -r requirements.txt´ to install the correct packages and versions. To keep the requirements.txt file up to date with your environment, you will need to re-run the freeze command whenever you install a new package.\n", + "\n", + "**j)** Create a .gitignore file at the root of your project folder, and add the line ´.venv´ to it. This way, you won't try to upload a copy of all your python packages when you regularly push your changes to Github. Ignored files should not show up when you run ´git status´, and are not staged when running ´git add .´, try it!" + ] + }, + { + "cell_type": "markdown", + "id": "5d184ab1", + "metadata": {}, + "source": [ + "## Exercise 3 - Fitting an OLS model to data\n" + ] + }, + { + "cell_type": "markdown", + "id": "d19ebd67", + "metadata": {}, + "source": [ + "Great job on getting through all of that! Now it is time to do some actual machine learning!\n", + "\n", + "**a)** Complete the code below so that you fit a second order polynomial to the data. You will need to look up some scikit-learn documentation online (look at the imported functions for hints).\n", + "\n", + "**b)** Compute the mean square error for the line model and for the second degree polynomial model." 
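As an illustration only (not the required solution), the sketch below shows one possible way to combine the imported scikit-learn tools — `PolynomialFeatures`, `LinearRegression` and `mean_squared_error` — to fit a second-degree polynomial and compute its MSE. It reuses the same data-generating line as the code cell below; the name `X_poly` and the choice `include_bias=False` (since `LinearRegression` already fits an intercept) are illustrative assumptions, not part of the exercise template.

```python
# Minimal sketch: second-degree polynomial fit with scikit-learn.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

n = 100
x = np.random.rand(n, 1)
y = 2.0 + 5 * x**2 + 0.1 * np.random.randn(n, 1)

# Design matrix with columns [x, x^2]; include_bias=False because
# LinearRegression fits the intercept itself.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

poly_model = LinearRegression().fit(X_poly, y)
poly_predict = poly_model.predict(X_poly)
poly_mse = mean_squared_error(y, poly_predict)
print(f"Second-degree polynomial MSE: {poly_mse:.4f}")
```

The same pattern applies to the straight-line model in the cell below: predict on the training inputs and pass targets and predictions to `mean_squared_error`.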
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "b58fb9bf", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import PolynomialFeatures # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "0208e9ca", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAAGdCAYAAABO2DpVAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAASZ5JREFUeJzt3QuczPX++PH37GItdpclLVIuiSSV08U1KUVJOnXSr4vLOV1Jkn+FOn6ojqXboZtup1Kd8lMhKioJSeIkHZduWJdEInZdF7vzf7y/s7PNzs7l+535zv31fDzmrJ35zsx3J8f3vZ/P++JwOp1OAQAAsEGaHS8CAACgCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtqkiUlZaWyi+//CJZWVnicDii/fYAACAE2k9z37590rBhQ0lLS4ufwEKDisaNG0f7bQEAgA22bt0qJ5xwgj2BRZMmTWTz5s2V7h88eLA888wzpl5DVyrcJ5adnW3l7QEAQIwUFRUZCwPu67gtgcWKFSukpKSk/Ps1a9bIxRdfLNdcc43p13Bvf2hQQWABAEBiCZbGYCmwOO644yp8P2HCBGnevLl07do1tLMDAABJJeQciyNHjsgbb7whw4cPDxi9FBcXGzfPpRQAAJCcQi43nTVrluzdu1cGDhwY8Lj8/HzJyckpv5G4CQBA8nI4tX4kBD169JBq1arJnDlzAh7na8VCg4vCwkK/ORaax3H06NFQTgsJJj09XapUqULpMQDEOb1+6wJBoOt3yFshWhkyf/58mTFjRtBjMzIyjJtZ+/fvl59//tmol0VqqFGjhjRo0MAIVAEAiS2kwOKVV16R+vXrS69evWw9GV2p0KBCLzSaKMpvsclNg0fN1fntt9+koKBAWrRoEbDpCgAgCQML7ZypgcWAAQOMJWw76faHXmw0qMjMzLT1tRGf9L9z1apVjVUwDTKqV68e61MCAITB8q+HugWyZcsW+dvf/iaRwkpFamGVAgCSh+Ulh0suuYT8BwAA4kxJqVOWF/wuO/cdlvpZ1eXcprmSnhb9X9SjPisEAADYG0x8sm6HzFr1i/x+4Ej5Yw1yqsuY3q2lZ5sGEk0EFgAAJKB5a7bLuDnrZHvhYZ+P7yg8LIPeWClTbmwX1eCCzW0baJMwzQvRmyYiHn/88cYMlZdfftlIdjXr1Vdfldq1a0f0XAEAyRFUDHpjpd+gQrmTFjT40JWNaEnKwEI/wC837Jb3Vm0zvkbjA+3Zs6ds375dNm3aJHPnzpVu3brJXXfdJZdffrkcO3Ys4u8PAEhuJWXXtpnfbJP7Z64uDxwC0WM0+NDtkmipkgpLQ9HYZ9ImYHl5ecafGzVqJO3atZP27dvLRRddZKxE3HzzzfLEE08YpbobN26U3Nxc6d27tzzyyCNSq1YtWbhwofz1r3+tUBUzZswYGTt2rLz++usyefJk+eGHH6RmzZpy4YUXyqRJk4xeIgCA5DcvyLZHMJrQGS1pqbA05N5n0sejSQOAM844o7xDqZZVPvnkk7J27VqZOnWqLFiwQO677z7jsY4dOxrBgrZJ1ZUPvd1zzz3l/T0eeugh+fbbb40ZLboqEmxGCwAgdbY9gtEqkWipkkxLRBrN+Voa0vt0DUAfv7h1XlTLb1q1aiX//e9/jT8PGzas/P4mTZrIww8/LLfffrs8++yzRjtr7cGuqxXulQ83z54hzZo1M4KTc845x2h/rqsdAIDkVBLg2maGXu3yclylp9GSNCsWun8ULIkl2vtMxvs6neVbG9pcTLdGdKskKytL+vXrJ7t375aDBw8GfI2vv/7a2DY58cQTjed17drVuF8blQEAktfyINe2QNy/QmsqQDR/oU6awMLs/lE095nUd999J02bNjW2LzSRs23btvLuu+8awcIzzzxjHKOtrP05cOCAMUlWt0j+/e9/y4oVK2TmzJlBnwcASHw7w7hm6UpFtEtNk2orxOz+UTT3mTSHYvXq1XL33XcbgYSWnj7++OPlLaynT59e4XjdDtFBbJ6+//57Y1VjwoQJxrh59Z///CdqPwMAIHbq1TI/HdzTkG7N5e6LW8ak82bSrFjo/pFWf/j7CPX+BhHcZyouLpYdO3bItm3bZOXKlTJ+/Hjp06ePsUrRv39/Ofnkk40kzKeeesqoCtFKj+eee67Ca2jeheZNfPrpp7Jr1y5ji0S3PzTgcD9v9uzZRiInACAFOEN7WqeTj4tJUJFUgYV+gLqPpBwx2GeaN2+eNGjQwAgOtKfFZ599ZiRZvvfee5Kenm5Uh2i56cSJE6VNmzbGtkZ+fn6F19DKEE3mvPbaa40Jr1qKql+1XPXtt9+W1q1bGysXjz32WER+BgBAfNl1oNjS8ZH+JdrUOTijPFGsqKjIqH4oLCw08gY8HT58WAoKCoychFDHZ8eqjwVCZ8d/dwBIRl9u2C3XvbjM1LHuX5sjlVcR6PqdlDkWbvphaklpPEx4AwDAjm1+7cfkNJGsGQ+/RCddYKE0iOjQvG6sTwMAgLD9zzknyj/n/1jpfv11WYONv3VqYvxCHS+/RCdlYAEAQLK38c6LkxUKbwQWAADEaRtvp5/H7+7eQoZc2CIuViiStioEAIBUaOPtEJFpK7ZKvCKwAAAgjiyP0xEVZhFYAAAQR3bG6YgKswgsAACII/XjcESFFSRvAgAQ5RyK5QF6LQXrXRGLUehWsGIRBTo2fdasWZIsxo4dK2eeeabp43Wyq34Gq1atiuh5AUAiVHt0nrjA6KZ517RVxlf9Xu/37l3hL3lT77/ijAZxWRGSvCsWpSUim5eK7P9VpNbxIid1FElLj9jbDRw4UPbu3es3eNi+fbvUqVMnYu8PAIh//kpIdWVC79dW3CpQ7wq3FxYXyFkn1om7HhbJGVismy0yb4RI0S9/3JfdUKTnRJHWV8TklPLy8mLyvgCA+C8hdZZtb4yasVr2HDxq+jX19bTjZrytXKQlXV
AxvX/FoEIVbXfdr4/HeCvEvS0wY8YM6datm9SoUcOYfPrll19WeM6SJUukS5cukpmZKY0bN5ahQ4fKgQMHgm5PvPzyy8ao9Vq1asngwYOlpKTEmJKqwU39+vXlH//4R4XnbdmyxRjvrsfrUJm+ffvKr7/+WuEYnah6/PHHS1ZWltx0003G0DBvL730kpx66qnGELFWrVrJs88+G+anBgCpVUK6x0JQEc8lp2lJtf2hKxV+40FdhxrpOi4OPPDAA3LPPfcYeQennHKKXHfddXLs2DHjsQ0bNhij16+++mr573//K//3f/9nBBpDhgwJ+Jr6vLlz5xoj3N966y3517/+Jb169ZKff/5ZFi1aZIxs//vf/y5fffWVcXxpaakRVPz+++/G45988ols3LjRGNvuNn36dCNoGT9+vPznP/8xRsN7Bw06Av5///d/jaDlu+++M44dPXq0TJ06NSKfHQAkmp0RKg2Nx5LT5NkK0ZwK75WKCpwiRdtcxzXtIrGmQYVe9NW4cePktNNOk/Xr1xu/7efn58sNN9wgw4YNMx5v0aKFPPnkk9K1a1eZMmWK39HiGijoioWuLLRu3dpYEfnhhx/kww8/lLS0NGnZsqURXHz22Wdy3nnnyaeffiqrV682Rpbrqoh67bXXjHNZsWKFnHPOOTJp0iRjlUJv6uGHH5b58+dXWLUYM2aMPP7443LVVVcZ3+v483Xr1snzzz8vAwYMiPhnCQDxrn6ESkPjseQ0eVYsNFHTzuMirG3btuV/1lUAtXPnTuPrt99+K6+++qqxPeG+9ejRwwgcNAjwp0mTJkZQ4abbFxpgaFDheZ/7fXR1QQMKd1Ch9PjatWsbj7mP0SDEU4cOHcr/rNszulKigYfn+WoAovcDAET+dFIdsTMVQl+qQZyWnCbPioVWf9h5XIRVrVq1/M+ac6E0cFD79++X2267zcir8Kb5E2Ze0/26vu5zv48d9FzViy++WCkASU+PXCUOACSSrzfvkVJ/9aMWueMTnWwab4mbyRVYaEmpVn9ooqa/liL6uB4X59q1a2dsJZx88skRfR9Ntty6datxc69a6Ptq6ayuXLiP0ZyM/v37lz9v2bJlFVZAGjZsaORm6PYNACC8XAgNFQLFIPE6Lj35AgvtU6ElpVr9Uek/S1lE13NCxPpZFBYWVmoAVbdu3QrbDGaNGDFC2rdvbyRr3nzzzVKzZk3jgq/JlU8//bRt59y9e3c5/fTTjYBAcyk0eVQrSTSX4+yzzzaOueuuu4w+Hfp9p06djETNtWvXSrNmzcpfR3NEdHUlJyfHSDotLi42Ej337Nkjw4cPt+18ASBR1TeZC6Hj0HVyqWcFSV52hlx37onSpF5Nn506403yBBZK+1T0fc1PH4sJEe1jsXDhQjnrrLMq3Kd5B1qGGUr+hVZpaOWIlpw6nU5p3rx5hWoNO+i2yHvvvSd33nmnnH/++UYuhgYGTz31VPkx+p6aK3HfffcZCZtaqTJo0CD56KOPyo/R4EfLZh999FG59957jUBIAxZ38ikApLpzTbbpHnJhC+MWqOV3vHM49aoVRUVFRcZvtvobvvZN8KQXLk1O1KoCf5UP8dh5E+Gx7b87ACRA503xvaZudN6M1+2NYNfv5F2xcNMgIg5KSgEAqafEz5AxDRo0ePBu2R3vORNWJWdgAQBAjFYlxnkFDg08Age9aRvuRN7qCIbAAgCAKA0Z69nGNZW0Q/O69p9AnKQBEFgAABCFIWPjIjk0LI4GcMZl580o55MixvjvDSAVhoxtj8TQMF2lWKitFvrFzQDOuAos3J0ajxw5EutTQRQdPHjQ+OrdJRQAkq0B1k47h4atnSXyaAuRheP9HBCbAZxxtRVSpUoVox/Cb7/9ZlxkPGdcIDlXKjSo0NklOp+EFuAAEqXCw/vxn37dF92hYR+PFln6pIkDoz+AM64CC23YpAO5tKfB5s2bY306iBINKvLy8mJ9GgBgqcLD1+P+uBtg2TI0bM0sk0FFbAZwxlVgoapVq2aMCWc7JDXoyhQrFQASrcLj1vObyguLCwLO9LB9aNixIyLLXxD59EHrz43iAM64CyyUboHQgREAEI8VHurFz80FFbY0wNL8iHdvFlk7M8h4MomLAZxxGVgAABCvFR7KzAj0Id1Olk4n1wuvAZZue8y6TeRYGEmfERzA6QuBBQAAEajcaHF8rfAaYZlO0PQju1HEB3D6QmABAEAEKjfqh/o6uvWx6JHwgoquI0W63kfnTQAAYu1PJ9WR3JrV5PcDoRcRpDlE9oTyfG1mNfc+kX3bQ35v6ThUpNsoiRUaRQAA4FEN0vXRz8IKKtw5GHe8udJ4PUtBhXbKDDWoqFFP5C9TRS55SGLJcmCxbds2ufHGG6Vu3bqSmZkpp59+uvznP/+JzNkBABDlElMzfSnMGjdnnVFlYmr7Q2d9WK76UA6RG2aI3POjSJsrJdYsbYXs2bNHOnXqJN26dZO5c+fKcccdJz/99JPUqVMncmcIAEAMS0xD5fSYDxI0iVM7Y3rP+jCr450iLS6SeGEpsJg4caI0btxYXnnllfL7mjZtGvA5xcXFxs2tqKgolPMEACCmJaa2VZkcOyKy4kWRPZtE6jQROeeWEDtjOlxBRYy3PsIKLGbPni09evSQa665RhYtWiSNGjWSwYMHyy233OL3Ofn5+TJu3Dg7zhUAgIiwdThYoOoQLSH98mkRZ6nHfX8XaW1xC6NJF5EbZ4hUqSbxxlKOxcaNG2XKlClGy+2PPvpIBg0aJEOHDpWpU6f6fc6oUaOksLCw/LZ161Y7zhsAANuEWhqaVb1Kectub46y2SLl80HcfSk8gwql36+dIVKtpkcD8AAJmtdMFRn4flwGFZZXLEpLS+Xss8+W8eNdI1rPOussWbNmjTz33HMyYMAAn8/JyMgwbgAAxCu9+GsQoLNArORZXPOnE+SVLzYZ4YAz0HyQY0dcKxWBHDlY9irer1bmgvtFzr8nJr0pIrZioZNHW7duXeG+U089VbZs2WL3eQEAEDV68dcgQFlpvn1x6zyZcmM7Yx6Ip7yc6sb95fNBNKfCe6WiEqfIGTeIZDeo3EGz7+siF4yI+6DC8oqFVoT88MMPFe778ccf5aSTTrL7vAAAiGpVSE5mNflbpyYyc9U2+f3AUdNj0DUo0QBDE0B37jtsbKtUmg+iiZpmZNQUGbbGVSWiCZ06lVQHiCVAQBFSYHH33XdLx44dja2Qvn37yvLly+WFF14wbgAAJGr/Ci019awK0c6bZzbOkQXf/xZ8m6NsxaO8pLS0RGTzkoqBgVZ/mKHHaRDRtIskKofT6bRUtvv+++8bCZnav0JLTYcPHx6wKsSblpvm5OQYiZzZ2dmhnDMAALY2xfK+ELqDh1vPbyqzv91eIehoEGgM+rrZrkZXnj0pdGz5Jf8QefemwNshjnSRB3bEbVKm2eu35cAiWicGAECktz86T1zgt3+Fe7tj0b3d5OvNe/xvc7hXKRY/JrJwvJ9XEpGWl4r88GHgGR9x1pMilOs3Q8gAACkpWFMsd+dMDSr8ds50B
xTLnhU5vDfAKzlEtn8r0mGI61jPlQtdqehwR1wHFVYQWAAAUpLZplh+j9NtjzlDRQ7tMfEqTpGibSKn9BS5aEzlzptxuv0RCgILAEBKMtsUq9JxAbc9gtCETg0idIUiSRFYAABSkpmmWHVrVpM/nVSn4irF3PtCH21e63hJdpbHpgMAkCpNsXYfOCLdHpkvyxfMEpk3SmR6vxCDCoer0ZWWniY5AgsAQMrSklFfnTPdLk37SmYX/03OXTzAlXQZ1ptNSKhGV6FiKwQAIKkeXBw75pQ7p31TYUtkZPqbcluV98Vhpce3L7pSoUFF6yskFRBYAAAk1ZtkDZn2Tfn3VeSY5Fd5Uf6S/nn4L35BYgwOsxOBBQAgpZtkaTtvlSalMqnK03J5+jLx7n9lWWauSO/JKbNK4YnAAgAgqd4kq2faMvln1SmS6Qg8fCyozDoi5w1KuVUKTwQWAICUpc2vbMulSMFtD18ILAAAKavV7gVyRZX3w3uRFEvODIbAAgCQmkpL5JSvx4S+UqFbHq16uXpTpPgqhScCCwBAatq8VBwHd4f23DifRBpLBBYAgNSkczusqlFP5LLHRdpcGYkzSgoEFgCA1GR1bsdpV4lc/RLbHkEQWAAAkqovhZaQarWHTiXVQWM6E8QnzY3IbihS9EuQV3WIdLyTrQ+TCCwAAEnTQVObXWlfCjedXqqDxrRtdyW68tBzosj0/iL+5ps26SJy4wzXqHOYwhAyAEBSBBWD3lgpvxYelPZp6+SKtKXG152FB4379XGftES072uulQvvXIprpooMfJ+gwiJWLAAASdGW+5K05TKm6mvS0PF7+WO/OHPlwaP9Zdyc6nJx6zzf2yIaXGjZ6OalroROzb2ghDRkBBYAgISmORVt9y2WKVUnVXosT36XZ6tOkkH79LgzpUPzur5fRIOIpl0if7IpgK0QAEBC21l0wFipUN4LEu7vx1R93TgOkUdgAQBIaCcfXG1sf/gr/tD7Gzp2G8ch8ggsAAAJ7dSsg7Yeh/AQWAAAElpaVp6txyE8BBYAgMRW1ujKqY2sfDDu1wmkehwijsACAJDYyhpdaVjhHVzo98Y9Otac8tGoILAAACS+skZXjuyKHTYd2vhKG2Dp44gK+lgAAJIDja7iAoEFACB+lJaEFxjQ6CrmCCwAAPFh3WyReSMqThvVrQwdFMZWRsIgxwIAEB9BhU4Z9R5hXrTddb8+joRAYAEAiP32h65U+BxdXnbfvJGu4xD3CCwAALGlORXeKxUVOEWKtrmOQ9wjsAAAxJYmatp5HGKKwAIAEFta/WHncYgpqkIAADEtIS1p3EF2SV05zrnb54TSUqfITkddOa5xB6EjRfwjsAAAxLSEdPnmQnn1SD+ZUnWSEUR4Bhf6vRpzpJ8M3FwoHZrXjfZPAIvYCgEAxLSEdOe+w/JR6bky6Ogw2SG5FQ7dIXWN+/VxPQ7xjxULAEAMSkgdrhLSVr2kflZ1414NHj4pPlvOTfte6ste2Sm1ZXlpKykt+x3YfRziG4EFACCmJaTnNu0sDXKqy47Cw0YQsay0dYUjdWckL6e6nNu04moG4hNbIQCAmJaQpqc5ZExvVzDhnbvp/l4f1+MQ/wgsAAAxLyHt2aaBTLmxnbEy4Um/1/v1cSQGtkIAAPbTklKt/tBETZ95Fg7X43pcGQ0eLm6dJ8sLfjcSNTWnQrc/WKlILAQWAAD7aZ8KLSnV6g9jQ8MzuCgLFHpOqDQSXYMISkoTG1shAIDI0D4VfV8TZ3bFbQynrlT0fY1R6EmKFQsAQMTMKz1HHjo8WRof+ba8hHTr4TNkdOnp0jPWJ4eIILAAAETEvDXbZdAbK41NkG3yRwmpo+iocT9JmcmJrRAAgO1KSp0ybs46v+2xlD6uxyGFA4uxY8eKw+GocGvVqlXkzg4AkJC0smN7of8W3BpO6ON6HFJ8K+S0006T+fPn//ECVdhNAYBUmURqltm5Hsz/SD6WowINJPLy8kwfX1xcbNzcioqKrL4lACBOJpGaZXauB/M/ko/lHIuffvpJGjZsKM2aNZMbbrhBtmzZEvD4/Px8ycnJKb81btw4nPMFAMRwEqlZ2thK53/4a22l9+vjzP9IPg6n02k6c2bu3Lmyf/9+admypWzfvl3GjRsn27ZtkzVr1khWVpbpFQsNLgoLCyU7O9uenwIAEPrWx6YlIm/3Fzm0189BZV0yh622tC3irgoR3+2xqApJMHr91gWCYNdvS4GFt71798pJJ50kTzzxhNx00022nhgAIMIBxeLHRL56NkBA4WXA+yJNu1h6Gw0utPrDM5FTVyp0qBhBRWIxe/0OK/Oydu3acsopp8j69evDeRkAQDStnSUye4hI8b7ITCz1wPyP1BNWYKHbIhs2bJB+/frZd0YAgMj5eLTI0icjO7HUC/M/Uoul5M177rlHFi1aJJs2bZKlS5fKn//8Z0lPT5frrrsucmcIALDHmlkhBhWaY9GowiRSwJYVi59//tkIInbv3i3HHXecdO7cWZYtW2b8GQAQ5zkVHw4P4Yn+J5ECYQcW06ZNs3I4ACBeaNOrg7stP00nkTo0qGASKUyibSYAJNvKRMHnIpuXuGo8tYqjSWdLiZc6vqNQasngo0Nly+GzmEQKSwgsACBZaBOrOUNFDu35477PHxXJzBU573ZTL+FuQDDy6M3yZWkbJpHCMqabAkAyrFIsnCgyvV/FoMLt0O8iC8eLZNYJ+lJ7pJYMOjpMPio91/ieSaSwihULAEj0VYoP7xXZv8PEwY6AKxWzS9rL3ceGSKnX75yek0gpG0UwrFgAQELP9+hnMqgoW7m44H45lFlxkOQuZ7YMOjpU7jo2tFJQ4YlJpDCDFQsASNTtD82nsOjbg7ly1Z7H5Jy076W+7JWdUluWl7YKGFC4MYkUZhBYAEAi0soPX/kUQTz/zUEpkTRZVtra9HN0AyWPSaQwia0QAEhEWk5qUXGNBjJvXzNLz3FnZejQMOZ7wAwCCwBIRCEUaHzbZqSpLQ9PulJBqSmsILAAgERkZXy59rHo+7qUtOxt6S1G9zpVloy4kKAClpBjAQCJSLtpasCglR7+pFcXuW6aSLPzjTkf55Y6pUFOdaN01Ix6WRlsf8AyViwAIB5bcq9+x/VVv/dFB4L1nhz4ta5+UeTkbuXDwzRIuOIM86sPVIEgFKxYAEA89aWYN0Kk6Jc/7stuKNJzou8hYHpf39crPUeTNDWfoiSjk7FK4V51mLdmu7ywuCDoaVAFgnA4nE53Z/joKCoqkpycHCksLJTs7OxovjUAxHmzq/4+sjLLtiL6vuZ/wqiuamxeKt9+971RTqqVH+4kTd360IqOi1vnSeeJC4Jug7g3PkjYRKjXb7ZCACDWNDDQVQefpR5l980b6XdbRPtSTN6QJ30WN5QP951cofJDA4nb31gpTy9Ybyq3IrdmNYIKhIWtEACItc1LK25/VOIUKdrmOs6rGkS3N8bOXis7iooDvsXzizeYOpW/9zqVoAJhIbAAgFjb/2tIx2lQoSPNzexnHzziJwnUS15OprlzAfxgKwQA
Yq3W8ZaP0xHmOsrcSpKcI8hjmo9BwibCRWABALF2UkdX9YffS79DJLuR67gyOsLcbD8KN3cQ4v0utO2GnQgsACDWtM+ElpQGuuz3nFDejyKcEeY3dWpilJJ6om037ESOBQBEglZwbFoiUrBIZO/PIrVPEGlyviv50iNAqNiT4jU/fSwmVCo1DbV5VffWeXJ/r9bGiocGJ/o6uv3BSgXsQmABAHZbO0tk9hCR4n0V7//8cZHMOiK9n/Tf8KpVL1f1hyZqak6Fbn/4CET+dFId0Vig1GSShWfTKw0iOjSvG+pPBwREYAEAdvp4tMjSJ/0/fmiPyPR+ro6ZvoILDSJMDBj7evMeS0GFIocC0UCOBQDYZc2swEGFp7kj/M8BMcFKjgU5FIgmViwAwA4aJHw43Pzx+37x2fBKy0jN5D+YzbHQ0ecDOzVlpQJRQ2ABAKEqm9Fh5EPo7eBua8/30fBKe1N4lpG6Z314rzZowKGP7Sg87LOXhTungqAC0UZgAQB2TSK1yqPhlb8umho46P3eWxkaLGjAoY9p2OD5PHIqEEvkWABAqJNIwwkqshqWN7wK1EXTfZ8+rsd50kBDAw76UiCesGIBALZNIrXg0onlZaTBumjqO+njepx3magGDzoSnb4UiBcEFgBg6yTSIDJzRXpPrlBqarbCw99x9KVAPCGwAIBITCL1VCVT5OSLRc65yWfnTbMVHqF22wSiicACAIJVfHh2wDQ7ibTHeCmtWV++21dD1tc4Xepn1/S7RWG2woPJo0gEBBYAYKbiw5jZMdHVclv/XLTdT56FTiJtKPNq9ZFx7/9QljuxOmDpKBUeSCYOp9MZZgaSNUVFRZKTkyOFhYWSnZ0dzbcGAJMVH/38P66tuJVWhRgqhwHfdJgsV31Wr1LY4Q4L/FVsWOljAUSb2es3gQUAeG5/PHqyyKHfAydf3rte5PsPfKxqNJKSHvnSeXYtv1Ue7m2NJSMu9LkCYbbzJhBtZq/fbIUAgJuOOQ8UVCh9XI/zM4l0ecFe2V64LKTSUUWFBxIdgQUAuBV8bv64Zl19TiKdv26H7UPEgERCYAEgNfmq+jC74+DnOM2R+NcXm0y9BKWjSFYEFgBSj7+qj7PcCZlBnNS50l3uttzBUDqKZEdgASA153x412xo+eiiCSLVaokc2R84edNr+8NMW243fVdKR5HMGEIGIHUEnPOh9zlE0qsFfg1tx+3VOdNKzsTfOjWhdBRJjcACQOoIOufD6ar6uOB+kawGlaeRag8LjxkfoeRM6MAwIJmxFQIgdZIzzc75qNtc5O61vlt6+0FbbsCFwAJA6iRnthto7vkaSPgoJQ0kUFtuKfv+f85pbPr1gETFVgiA5FqlWDjR1ZLbe8tDkzMX5otk1vFfL2rM+WjkWp0IgeZOaLtuXZnw5Z/zf5LOExcYZalAsiKwAJA8qxT/PE1k4Xg/B7jXENzrCd7BRdn3PScE3PIwE1xou+67u5/i83HdKtFVDYILJCsCCwCJvUKxcZHI//V3rVLsC3ax9kjOzPZKztStkr6v+U3OtGraii3+zsCgPS+09wWQbMixAJC4KxSzh4oc3mP9uZqcOWyNpeRMK4L1tAg2LwRI2RWLCRMmiMPhkGHDhtl3RgAQzNpZrhWKUIIK7+TM0//i+mpTUGGlpwXzQpCMQl6xWLFihTz//PPStm1be88IAAJZM0vkHZPVHT6TMxuGnJxpltmeFswLQTIKacVi//79csMNN8iLL74odepohjUARGn7450BfjpnmhRmcqYnzZH4csNueW/VNuOrO2diz4EjEqhjtz6kPS/oaYFkFNKKxR133CG9evWS7t27y8MPPxzw2OLiYuPmVlRUFMpbAkh15e24Q6RlpBpUmEzO1CBBcyB0u0JXFjQI8JzvoVUdmoDpmUuhwcIVZzSQFxYXBA19mBeCZGU5sJg2bZqsXLnS2AoxIz8/X8aNGxfKuQFIdZ7dM/UWsB13AFoFcv49plcqNGgYO3ut7Cj645eivOwMGXvFaUY5qT6uJaPewYMGGc8vLgj42hpLPH3dWcwLQdKyFFhs3bpV7rrrLvnkk0+kenVze4OjRo2S4cOHV1ixaNyY7nMAQuieGeFVCqVBw+1vrKx0vwYZev+z158lD33wXcibMbpbUqdmRojPBpIssPj6669l586d0q5du/L7SkpKZPHixfL0008bWx7p6RV/I8jIyDBuABD2aHMruo4U6XqfpXwK3f4YOWN1wGPuffe/cqC4JPTzohoESc5SYHHRRRfJ6tUV/0/317/+VVq1aiUjRoyoFFQAgOVtD21yNW9keEFFx6Ei3UZZftqyDbtl78GjAY8JN6hQVIMgmVkKLLKysqRNmzYV7qtZs6bUrVu30v0AENVtD5WRJXLF0yKnXRnS07/cuEsiiQmnSAV03gSQ+NseOljsvEGWEjR9M1elUSsj3Vi5CHTG3hNO3a9MNQiSXdiBxcKFC+05EwCpt/VR8LnInKGhBRU9xrs6aNrYjlvbaz/92fqgx93cuZlM/vQnv8HDrec3ldnfbq9QiqorFRpUUA2CZMeKBYAE2/oo65553u22tuFW7ZvVldo1qgbMs6hTo6rceVELadUgq1IfC8/g4b6epwbsgwEkKwILAAm09WHPaHN/9MI/4arTfZabuuVfdbpxnAYPF7fO8xs86FcGjCEVEVgAiN7Wx6YlInPuDD2fQlcqLPalsEoDhudubCdjZ6+THUUVu2p6b2UQPACVEVgAiOOtD4dIjboiPfNFshrYOto8kGCrEQD8I7AAEKdbH2UX8cv/GdEVCn9YjQBCQ2ABIAqDw5xR3/YINkQMQGQQWACIHO2kaXX7Q3tSXDNVpEnnkLc9/E0epdwTiLy0KLwHgFSlE0lN09UEh0jvJ0WadQ0rqNDJo55BhdpReNi4Xx8HEDkEFgAiR5tXWdn66PtaWPkUuv2hKxW+Nl7c9+njehyAyGArBEDkaBWHBgxFukrg52KemSvyl1dEmnYJu+Jj2cbdlVYqPOkZ6OOae0FiJhAZrFgAiBwNFHpOLPvG4WfrY7JI8wvCDip0i+OOf/tvbOWJseVA5BBYAIgs3drQLY7sBrZvfXjnVew9FHjkuZtWieh2yJcbdst7q7YZX9keAezBVgiAyNPgoVUvV5WIJnTaODgsUF6Fv7Hlew4US+eJC6gaASKAwAJAdGgQoXkUNtKg4uUlBQHzKrxdcUYDuePNbyoFIu6qkSk3tiO4AMLAVgiAhKTbH+0e+lj+8eF3po7XqaXPXN/OGGdO1QgQOaxYAIhb/rpnalARaAKpL89c107S0hxUjQARRmABIC6DCM2DeOiD7yrlQYzudarcP3ON6dd151W0b15X3v+vuS6gVI0AoSOwABBzvlpw+6J5EIPf/Mby62tSpq50aMBihtnjAFRGYAEgptylomayGqxmPmhexYSrTi9PxtStFF310ADFGWB1Q48DEBqSNwHEjJVS0VBoXoVnhYeuWujqhb92XZ6rGwBCQ2ABpPJI84LPRVa/4/qq30eZ5lRYKRW1ok6NqkZehTcNNLS
kVFcmPOn3lJoC4WMrBEhF62aLzBtRcaS5dsLU9ts2dMI0K5JJkv+4so3flQcNHi5uneez4gRAeAgsgFQMKqb3r5yxoIPC9H6b2mxHMklSL/+Btk9uO7+pXNa2YcDX0CCCklLAfgQWQCrQbQ5tp71vu8i8UX4uy3qfQ2TeSFf7bRvabQcTLJnSH3/H1sqoIo9c3VYua8t2BhAr5FgAyW7NLJHHWohMvVxkxi0iB3cFONgpUrTNFYREQaBkSl+C7VTUykiXHm3y7Dk5ACFhxQJIZh+PFln6pPXn6aCwKHEnU3r3sXA1w2otdWpWM/Igdu1zNcwKZEdRMV0zgRgjsACS1dpZoQUVSqePRpGZZEodb24GXTOB2CKwAJI1p+KD/xfCEx2u6hAdaR5lwZIp6ZoJJAZyLIBkpDkSAXMpfClbHeg5ISqJm6EmevpLs9D79XG6ZgKxRWABJGOjq1ByJHSlIoqlplbRNRNIDGyFAMnY6KrdQHPPz8gW6fW4SFYD1/ZHBFcqvKeX/umkOvL15j2WGlT5S/TUrpkaVNA1E4g9h9PpjFSbfp+KiookJydHCgsLJTs7O5pvDaRGoyt3+6jMXJFDvwd+jb9MFWlzpcRieqnGEKUep97AQnDgHaTQNROIn+s3KxZAom59zBkauNFVsB6VHYdGLajwNb3UM6hQ2iRLjzMzr4OumUD8IscCSLRVikltRF7vI3JoT4ADna7VigtGubZGPNWoJ3LNVJFLHoqr6aXuY/R4fR6AxMSKBZDwWx8B1G0uMmyNq0pEEzq1P0WEcynCmV6qP5keT5MrIHERWACJsv2hSZqWJmqUNbrSIKJpF4mFUJtV0eQKSFwEFkAi0BUHz8qPOG50ZUezKppcAYmLHAsgEVjqSxE/ja6CNbXyRpMrIPERWACJwMrsjjhqdGVleilNroDkQGABJALd0jCqOwJccDPriPSfLTJsdVwEFd5NrbSJlSfv2EEfN1NqCiC+kWMBJALd0ug5sawqxLs3RdkVuveTIs26SjzyNb00lM6bAOIfnTeBhG/h3ciVTxFHqxQAkg+dN4F4Lh0Nta+EBg+tesWsLwUABENgAcTD0DDd5jC74hDDvhQAEAzJm0C0O2d696Mo2u66Xx8HgARHYAHEvHNm2X3zRrqOA4AExlYIEI18ioJFQTpnOkWKtrmOtXmbgxHjAKKJwAKIZj6FrR02zY0s12mhnoPAtLOlNqGiXwSASGArBLB7haLgc5F5o0Sm97M438Nih00TQcWgN1ZWmi66o/Cwcb8+DgAxDSymTJkibdu2NepX9dahQweZO3eu7ScFJOwKxaQ2IlMvF1n2rMUn69CwRrYNDdPtD12pCJDRYTyuxwFAzLZCTjjhBJkwYYK0aNFCtK/W1KlTpU+fPvLNN9/IaaedZuuJAQm1SrH4MZGF40N8AfuHhmlOhfdKhScNJ/TxV78okHpZGeReAIifzpu5ubny6KOPyk033WTqeDpvIulWKebeJ7IvjG2FCHTOfG/VNrlr2ipLzyH3AkBMO2+WlJTI22+/LQcOHDC2RPwpLi42bp4nBiRVXwqfGw4mdLnXNdsjAp0zdQXCKnfuBYPAAEQ1eXP16tVSq1YtycjIkNtvv11mzpwprVu7xiL7kp+fb0Q47lvjxo3DOmEgLrY+Ni4SmXNniEFFWT5Ft1Gu0tIItOPWbQ1dgbCysUHuBYCYBBYtW7aUVatWyVdffSWDBg2SAQMGyLp16/weP2rUKGPZxH3bunVruOcMxD5B87UrRA7tjYt8Cl80V0K3NTze0RR37oXmaABATHIsunfvLs2bN5fnn3/e1PHkWCBltz5iMInUVx8LMyb/z5nS58xGETsvAIknatNNS0tLK+RQAKnXktuE9oNFWl5mSz6FlU6amitxceu88uN37SuWhz74LiI5GgBgObDQbY1LL71UTjzxRNm3b5+8+eabsnDhQvnoo4/4NJHctNW21WZXEVihCKWTpgYdHZrXLQ9KXlpSYCRq+gqRNDzJy3EFKwAQ8cBi586d0r9/f9m+fbuxHKLNsjSouPjii0N6cyDuZ3xoi23thhlKOekF94ucf49tuRTuTprOMKo53LkXerwGEZ6v5V7z0MfpZwEgZjkWVpFjgbh27IjInGEi62aKHD34x/016ooc3B2zPApdaeg8cYHfXAn3SsOSEReaCgqYIQIgbnMsgKTx8WiRpU/5zqMwE1Rk5or85ZWIlJCa7aSpx7m3PQLxzr2g8yYAuxBYILXplsemJSILHhL5eYXJJ/nZROg9WaT5BZE4S+PiH85x/hI+zQQhAGAFgQVSu3x0zlCRQ3usPc/YFtn1x/fZDSNeQmq2SsPXcWx7AIgmAgukcE+Kfsa6g+XF/575IlkN/kjstFhCaqVc1LuTptVqDjsSPgHACgILpOb2x5yhoQUVSoMKzaMIIYjYc+CIPPSB9dWDQNUcUvb9ZW1cORPuQCXY6HR9HX1ccy3IrQBgF6pCkHp0zoe25A6FVnwMW+13hcIzkNi064C8tXyL7CgK3EDOfUk3s3rga1tDYwLP0R7uQCUns5pc9+KyoD/SW7e0J9cCQFBUhQD++lL89HHorxNgxkeo7bOtrB54VnN8sm6HvPzFpgpBhec2x187NbE1MRQAzCCwQPIHFIsfE/nq2RCHhpWpVkvkyil+EzT95TKYZaVcVAMP3e4YPn2V39fS0OS9VeY6hdK+G4CdCCyQ3AHF0skiRw6E/DLGRfq0q0Sufing9oe/XAarzK4emOlrsfvAEcmtWVX2HDhK+24A8Ts2HYh7a2aJTDhRZOH4gEGFv0BAs470NsdxgZTev1PkmlcCVn0Eu8hbYXb1wGwA8ueyCaXemyu07wYQKQQWSL7ume8MEDmyP+ihgS6nzx+7XKpe/ZykV8uISo6Coyzp0uzqgdkApHvrPCMpVFcmPOn3lJoCiAS2QpA81s4SWfpkWC+xy5klj1e5TbpcfbNRVfHeqm1Be02Em6MQyuqBlb4W+pq07wYQLQQWSJ6cig/+X0hPLblkvGw4WEN2OmtLepNO0uXQMUu9JoJd5IPJC6ELptUppbTvBhAt9LFAcij4XGTq5WH3pfBX3RGo14Qmbz69YL38c/6PQd9OA5DRvU6VOjUzbFk9oF03gGihjwVSi/anCLMvRSidKoP1rtCL/P+cc6I0qVcjIlsQTCkFEG8ILJAcdGaHFTrivPdkKWnVW5Zv2G1clHftKzY1mnzZht2SluaQ+et2yL++2OT3+Lu7t5AhF7aI+EWebQ4A8YTAAslBB4HplNGiIE2hqtYU6XSXyPn3yLx1O2XcxAWWS0XveHOl7D10NOAxGkpMW7HVCCwAIJVQborkoNsZPScGLiLVRlejtopcMMIIKjSXIpT+E8GCCu9OmgCQSggskDy03Xbf11wrF55q1BO5Zmp5oys7O2UGwxwOAKmGrRAkxuAwzaHQ7Y4AHTDLg4tWvQI+z85OmcEwhwNAqiGwQOIMDtOVCN3u8DMIrJwGEU27+B1n/tOvwbtyhos5HABSFYEF4su62eKcc5c4DvnITSjaLj
K9v2u7I1hwYcM481AxhwNAKiOwQHwFFRo4+B8P5rpszxvp2u4IMG3UvTpR8Nt+mfTpeltOr2ZGuhwsLgmamxFKJ00ASBYEFogPpSVyaM69Ut3pFEfAX/KdIkXbXDkUXtsdkV6duKVzM5n86U8+W2jr93/r1MRoVkWDKgCpjMAiDnn+xp3UnRQ9kjNL9v0qmYd2BB45GqTTpr923HaoU6Oq3HlRC2nVIKtS4MIKBQD8gcAizqTM7Id1s0XmjShvaBWk1iNop81Il5DmX3W6EdzRQhsAAiOwiCP+fuPWqZl6v68BWAkbVATMpfDPyLLQwWFaQuohUiWkvoI6WmgDgH8EFnEilAFYCdmPouZxInPv8xtU6KxdfzkWxhxeR8XBYZFqRFU7s6o8c0M7ad+sbuJ93gAQQwQWcSLYb9yeLaIT6rdlrfSYO0Ic+4LM8CgTKHFzr6OWZF/zrKT7KDW1qxGV++0nXH26dDq5ni2vCQCphJbeccLsb9wJ1SLaKB/tJ2IyqPBnj7OWPH70L/LV1V9J+ml9fB6jeQ66bWGWLkLc0qVJpedoImbSbDkBQAywYhEnzP7GnRAtonXro+BzOfbubZIeYGsjkAeP3ii7nLVlp9SWrbXOkNHXnB7wYq/bFaN7tZbBb6409fpPX9dOLmvbQEZe2ppETACwEYFFnHD/xq2Jms5EbhHtUe1h/OWyeI0udYrskLpyQo9hckZ2DUsX+zo1q5l6j7u7n2IEFYpETACwF1shcUIvcFp9oBxx1CJak0q/3LBb3lu1zfiq3wet9igrIbXK/dLjjvaTutk1pM+ZjYyLvtmf2ew2UZN6NUI6PwBAcKxYxBFd6tf9fTMNmOxuouXr9T5Zt8N8Tw3d/tCVijA6SehKhQYVH5WeKwND2PJJqu0kAEhQBBZRphfwZRt3G7/960W4Q7N60t7jt3IzDZjsbqLl6/Vq16gqew8erXSs354aWk4awkrFQ0evl9+cuUYuxfLSVuKUNONn0Z/ZavCUNNtJAJDAHE6n0R0gaoqKiiQnJ0cKCwslOztbUolewEfOWF3pgq0X8QlXBU5ODNZEy325tVrREEobbPcFesmIC/+40K9+R+Tdm0y/hv6t2yO15Ozi56S0bEfO82dQoQRP7p/HeA+vc3a/NhUfABC56zc5FlGiF7zb31jpcxVA79PH9JhAOQ3BmmjpbezstYHzIGxog+3ZU8Nfi+2Az3e6XuPhtNvLgwrPUk+lwYF3Xw/3aol+TsG2k/S1PFFGCgDRwVZIFOgFfOzsdUGPG/Huf43jdhT5/i3dTNvqHUXF8vSC9XJX9xbl763P09f8fX+x5NasJnk5mcZ2QLhtsCskS2qL7eyGIkV60Q8cqvzqyJXtHcbKoxf3l2u8tjpU54kLwupAyjwPAIgdAosocF/Ygyk8dMy4+ctpKD5Waur9/jn/R2mZV8v4s78R4hqwXNYmT8JRIQlSW2z3nGhUhTjFIQ6P0MD9pw3N+0txs57S6rweklfF9VfPu9RTV2ns6EBKGSkAxAaBRRSE0y3T87f0x645w/TzNJej8OBRv2sHenH+1xebgr5OmpTKuWnfS33ZWyHB0mcSZOsr5JsOk6Xhl+PkeNHkVJdfpa5s7zBGzuoxIDU7kAJACiGwiIJwyxvdv6XrH3Slwcz2ha9cDl90d8Cd8+CtR9pyGVP1NWno+COX4hdnrjx4tL9c2fv2SlsLRuLkZ/XEIZMrBCMrSltJ6WdpMqXR9qA5DpSMAkBiI3kzCvQ3+7zs8C+Euw4UlzfRsovmebpXRTz1TFsuU6pOkjyPoELp91OqTZaeaSv8JoJqQuay0tYyu7Sj8bWk7K+ZPh4ssdRdMuovG0Lvd5ejAgDiT0oGFpa6SdpAf7Mfe0X4AYH+lq5JiTUzKo4MD9ffOjUxtjZ026N92jrpk7ZExld72Zjx4f0XRL83LvrzRrqaYlmczvrqFwUBP+947UAKADAn5bZC7G4uZZa+9nM3tvPdxyKzihwpccrBI39cqL1prwt3JceBYv/HhUKDlV7pX0nT5WMkV4pMPMMpUrTN1RSraRdLOQ8PffCdvLSkIODnbaUDKQAgvqRUYOGvGZTfbpI2c5dBenfePKdprpz54McBn3u0rCLE7qRFDarqL3tYmv3oWqGwZP+vIeU8mPm8KRkFgMSUMoFFsOZSZvoj2EFfu9PJ9Yyb2xc/7Qq4WqEOHCmRZRt225q0qD/l02dskWbLXw7tBTyaYgVrpx3K503JKAAknpTJsTCbA1Chm2SUfLlxl+njgiU3mqWvMeWGM6Tttw8ar2VttcIhkt3I1RTLRG5EvH3eAIDISZnAIr77I5i9qjsqXMBDlVuzqiy6t5v0rFUgVYt/D+1ce05wNcUy0U47EPpRAEBySZnAIlb9EcxUoJhd7ncf576Aa4Bglrvi44q0pXLKoW/l64JdFXIkTNO23X1fM5ph+aLnpsPJRvc61dTL0Y8CAFI4xyI/P19mzJgh33//vWRmZkrHjh1l4sSJ0rJlS4l30Rqp7Tnqe9Oug/LW8i1+Z3+4aYdM3YoINGe2To2q0r7ZHwGIPv/CVsdL+/xP5fcDRwKek69GV4fefVGkvblppEVSQ2r9eZKkaVCh2x9eKxXedFVlYKemRvUHI8wBILVYWrFYtGiR3HHHHbJs2TL55JNP5OjRo3LJJZfIgQMHJN7Z0R8h2OqDVp3oAK3rXlwmd01bZczs8J4R4j2hU7/e8ebKgEGFyr/q9ErnVq1Kmoz/cxtXjoSfVYqh6e/Kc1UnSQOpuOVR/fBOkYXjRTJz/SZbGh05nSIb2o+XtDOudZWWBgkq3OhHAQCpyeF0Bruk+ffbb79J/fr1jYDj/PPPt3Wee7z1sQj2PH+lrBLgt3XNc+j66GcBk0r1uvv0dWfJZW0bWjo3XaXIr/qi5DoCBX0Okcw6Iof2iNMYHVaR/iwFp9wkza5/wsRPFV99QwAA9jJ7/Q6r3FRfXOXm+l/OLi4uNm6eJxZLofRHCNb/4pnr28lDH/guZQ1UEfH6l5uCzv3QRZE6NTMs/Uzp38+Ry76bZCIl1Cly6HeRC+4Xx8pXRYp+KX/kSEZdSe/9uDRr82eTP5W5c6MfBQAkt5ADi9LSUhk2bJh06tRJ2rRpEzAvY9y4cRJPrPRHMNP/YvR7a2R3kDwHXzb/ftC2yol0KZUOaetE0raLbJ4oTof5WhOp21xk2BpXJ01N6Kx1vFQzkUthFv0oACB1hBxYaK7FmjVrZMmSJQGPGzVqlAwfPrzCikXjxo3FTp4Jk3b/Rmym/0UoQYU6KbeGPZUT62aLzBtRYcXBYbXRlQYRZe25AQCIamAxZMgQef/992Xx4sVywgknBDw2IyPDuEVKpPfwI9FnwZ1j0a9Dk/AqJ3QI2KJHRBZNCP1kvBpdAQAQtaoQzfPUoGLmzJmyYMECadq0qcSSO/fBe0XBu/IiHGb7LGhPCSurBBr4aFVHyJUTa2eJTDgxvKDCT6MrAACiEljo9scbb7whb775pmRlZcmOHTuM26FDhyTaguU+KH083JHowVpo6/36+MN9X
HkmwYILjRFuPb9p+WqKv26V+r3fIV0fjxZ5e4DIkf2h/VDGiaaL/GWq30ZXAABEvNzU4WegxCuvvCIDBw6Marmp9pHQfhHBvHVL+6CJg8FyNNwrI8rzw3If4Q4AfG3LePN+jtlzKLdmlsg7AyRs10wVOe3K8F8HAJASiiJRbhpGy4u4nf1hJkfDvargfVyej+OMsegbdhtNr/YeOmp6sqepygnNqfjwj0TYkGTmivSezEoFACAiqqTy7I9g/Sk8VxXM9mPQ79PSHD6DCl+TPS2VYWo56MHdYklmXZGz/+oKZ7Tqo0lncioAABGTsIFFuLM/zPSnCGlVIZKTVEMZGtZ7EqsTAICoSdjppuHOojDTn8K9qhDV1RTd7ti4SOTTh0UWPCyyYaHrPne/CbNIzgQAxEDCrlhYyX2I6qpCOKsp2uhqzlBjdscfHnXN8+j9pEirXq6x5R6NsPz6y8skZwIAoi6hA4twZlHYkaMRbDVF8zQcfipJKq2mBKr20EBjej+Rvq+L9JwoMr2/16t6qFZL5MoprFQAAGIiYbdCPLlzH/qc2cj4aqadt9n+FH67XgZhqT+FNrt6x0S57twRrlWLvq+5Vi48VcsSOX+kyMgtBBUAgJhJ+BWLUIW0qhCJ1RTd/tBmV2bs+8VVGaKBgwYYHkPDjLbcVHsAAGIsZQOLcHM0zApYSaJJmTo8LJTKEIaGAQDiUEoHFuHkaIRMg4lNS0QKPhcp3GIuEdOTlcoQAACiLOUDCyv9KcJmVH3cJXLIeglreR4Fk0gBAHGMwCLSdIVCcyF++FBk2bPhvdYVT5JHAQCIawQWkV6hmHufyL7wx7dLy8tE2lxlx1kBABAxBBaRDCq094QdOtwp0uNhe14LAIAIIrCIxNaHJmbOuj2816lWU6R1H5HLJ4tUqWbX2QEAEFEEFnavUmj5qNVKD09d7hVp1pW+FACAhERgYevWR4BW22ZkNRTpNoqAAgCQsJKipXfMlTe6CiOoUJdOJKgAACQ0Ags7aDlpONsfmbmuAWPM+AAAJDi2QuzgbrNtRZVMkQ6DRZqc72rNzUoFACAJEFjYIZQ221e9wAoFACDpEFiE0kXTe6KoftUx5kXbg+dZZDcS6TmBoAIAkJQILMIpJdVgoudEV5CgX42qEO8h7GXaD3Z1z6SMFACQxEjetFJK6p2gqSsUer8+rsFF39dEshtUXqHQxMye+eRSAACSHisWYZWS6n0OkXkjRVr1cgUX+tXXdgkAACmAwCLsUlKnSNE213HuFQn9CgBACmIrxK5S0lBKTgEASDKsWASq9rBSShpKySkAAEmGwCJYtUfQUlKH63E9DgCAFJeW0qPN540Smd4vcLWHrlxokGHQUlJPZd9rXwoSNAEASMHAQoOFSW1Epl4usuxZPweVrUxotYcGIX5LSRu67qfZFQAAKbgVYmm0uVe1B6WkAAAElTqBRaijzT2rPSglBQAgoNTZCgl1tDnVHgAAmJY6KxaW+0xQ7QEAgFWps2JhaeWBag8AAEKROoGFux9FpZJRH6j2AAAghbdCAnXOdHP3o2C0OQAAEVMl6TtnenL3o6h0fCPXtgcrFAAAhMXhdDot1l+Gp6ioSHJycqSwsFCys7Mj1JeibLvD33aGmRUOAABg+fpdJTn7Uuh9DlfnTG1q5WtbhH4UAADYLi15+1J4dM4EAABRkZb0fSks968AAACpF1iY7UtB50wAAKImLXn7UmjnzEZ0zgQAIIoSN7Bw96UweAcXdM4EACAWEjew8OxLkd2g4v10zgQAICYSt9zUTYMHLSmlLwUAADGX+IGFoi8FAABxIbG3QgAAQGIHFosXL5bevXtLw4YNxeFwyKxZsyJzZgAAIPkDiwMHDsgZZ5whzzzzTGTOCAAApE6OxaWXXmrcAAAAop68WVxcbNw8p6MBAIDkFPHkzfz8fGPMqvvWuHHjSL8lAABI1sBi1KhRxux2923r1q2RfksAAJCsWyEZGRnGDQAAJD/6WAAAgNitWOzfv1/Wr19f/n1BQYGsWrVKcnNz5cQTTwz6fKfTaXwliRMAgMThvm67r+N+OS367LPP9BUr3QYMGGDq+Vu3bvX5fG7cuHHjxo2bxP1Nr+OBOPR/JIpKS0vll19+kaysLKNzZziRk1aYaDJodna2recI3/jMo4/PPPr4zKOPzzwxPnMNF/bt22d03k5LS4ufIWR6MieccIJtr6cfCH8Ro4vPPPr4zKOPzzz6+Mzj/zPXthHBkLwJAABsQ2ABAABsk7CBhfbGGDNmDD0yoojPPPr4zKOPzzz6+MyT6zOPevImAABIXgm7YgEAAOIPgQUAALANgQUAALANgQUAALANgQUAAEiNwOKZZ56RJk2aSPXq1eW8886T5cuXBzz+7bffllatWhnHn3766fLhhx9G7VyThZXP/MUXX5QuXbpInTp1jFv37t2D/jdC+H/P3aZNm2a0xb/yyisjfo6p/pnv3btX7rjjDmnQoIFRnnfKKafw70sEP+9JkyZJy5YtJTMz02g7fffdd8vhw4ejdr6JbvHixdK7d2+j9bb+GzFr1qygz1m4cKG0a9fO+Pt98skny6uvvhr6CTjj1LRp05zVqlVzvvzyy861a9c6b7nlFmft2rWdv/76q8/jv/jiC2d6errzkUceca5bt87597//3Vm1alXn6tWro37uicrqZ3799dc7n3nmGec333zj/O6775wDBw505uTkOH/++eeon3uqfOZuBQUFzkaNGjm7dOni7NOnT9TONxU/8+LiYufZZ5/tvOyyy5xLliwxPvuFCxc6V61aFfVzT4XP+9///rczIyPD+Kqf9UcffeRs0KCB8+677476uSeqDz/80PnAAw84Z8yYYQwNmzlzZsDjN27c6KxRo4Zz+PDhxvXzqaeeMq6n8+bNC+n94zawOPfcc5133HFH+fclJSXOhg0bOvPz830e37dvX2evXr0q3Hfeeec5b7vttoifa7Kw+pl7O3bsmDMrK8s5derUCJ5lcgnlM9fPuWPHjs6XXnrJmCpMYBHZz3zKlCnOZs2aOY8cORLFs0zdz1uPvfDCCyvcpxe8Tp06Rfxck5GYCCzuu+8+52mnnVbhvmuvvdbZo0ePkN4zLrdCjhw5Il9//bWxtO45vEy///LLL30+R+/3PF716NHD7/EI/zP3dvDgQTl69Kjk5uZG8EyTR6if+YMPPij169eXm266KUpnmtqf+ezZs6VDhw7GVsjxxx8vbdq0kfHjx0tJSUkUzzx1Pu+OHTsaz3Fvl2zcuNHYdrrsssuidt6p5kubr59Rn25qxq5du4z/0+r/iT3p999//73P5+zYscPn8Xo/IvOZexsxYoSxp+f9FxT2feZLliyRf/3rX7Jq1aoonWVyCeUz1wvbggUL5IYbbjAucOvXr5fBgwcbQbS2RIa9n/f1119vPK9z587GmO5jx47J7bffLvfff3+Uzjr17PBz/dTR6ocOHTJyXayIyxULJJ4JEyYYyYQzZ840ErRgv3379km/fv2MpNl69erF+nRSRmlpqbFC9MILL8if/vQnufba
a+WBBx6Q5557LtanlpQ0iVBXhJ599llZuXKlzJgxQz744AN56KGHYn1qSOQVC/1HMz09XX799dcK9+v3eXl5Pp+j91s5HuF/5m6PPfaYEVjMnz9f2rZtG+EzTd3PfMOGDbJp0yYj29vzoqeqVKkiP/zwgzRv3jwKZ55af8+1EqRq1arG89xOPfVU47c8XeqvVq1axM87lT7v0aNHGwH0zTffbHyvFX4HDhyQW2+91QjodCsF9vJ3/czOzra8WqHi8r+Q/h9VfzP49NNPK/wDqt/rXqcver/n8eqTTz7xezzC/8zVI488YvwmMW/ePDn77LOjdLap+ZlrKfXq1auNbRD37YorrpBu3boZf9ayPNj/97xTp07G9oc7iFM//vijEXAQVNj/eWuulnfw4A7qmJkZGbZfP51xXKKkJUevvvqqUf5y6623GiVKO3bsMB7v16+fc+TIkRXKTatUqeJ87LHHjNLHMWPGUG4a4c98woQJRhnZO++849y+fXv5bd++fTH8KZL7M/dGVUjkP/MtW7YY1U5Dhgxx/vDDD87333/fWb9+fefDDz8cw58ieT9v/bdbP++33nrLKIP8+OOPnc2bNzcq/2CO/husbQD0ppf5J554wvjz5s2bjcf189bP3bvc9N577zWun9pGICnLTZXW0p544onGxUtLlpYtW1b+WNeuXY1/VD1Nnz7decoppxjHa+nMBx98EIOzTmxWPvOTTjrJ+EvrfdN/GBC5v+eeCCyi85kvXbrUKF/XC6SWnv7jH/8wyn5h/+d99OhR59ixY41gonr16s7GjRs7Bw8e7NyzZ0+Mzj7xfPbZZz7/bXZ/zvpVP3fv55x55pnGfyP9O/7KK6+E/P4O/R97FlMAAECqi8scCwAAkJgILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgNjl/wOB6vqgznVLNAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "n = 100\n", + "x = np.random.rand(n, 1)\n", + "y = 2.0 + 5 * x**2 + 0.1 * np.random.randn(n, 1)\n", + "\n", + "line_model = LinearRegression().fit(x, y)\n", + "line_predict = line_model.predict(x)\n", + "#line_mse = ...\n", + "\n", + "#poly_features = ...\n", + "#poly_model = LinearRegression().fit(..., y)\n", + "#poly_predict = ...\n", + "#poly_mse = ...\n", + "\n", + "plt.scatter(x, y, label = \"Data\")\n", + "plt.scatter(x, line_predict, label = \"Line model\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "248d8931", + "metadata": {}, + "source": [ + "## Exercise 4 - The train-test split\n" + ] + }, + { + "cell_type": "markdown", + "id": "1efd3376", + "metadata": {}, + "source": [ + "Hopefully your model fit the data quite well, but to know how well the model actually generalizes to unseen data, which is most often what we care about, we need to split our data into training and testing data. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f8d75fb", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "markdown", + "id": "edb213fc", + "metadata": {}, + "source": [ + "**a)** Complete the code below so that the polynomial features and the targets y get split into training and test data.\n", + "\n", + "**b)** What is the shape of X_test?\n", + "\n", + "**c)** Fit your model to X_train\n", + "\n", + "**d)** Compute the MSE when your model predicts on the training data and on the testing data, using y_train and y_test as targets for the two cases.\n", + "\n", + "**e)** Why do we not fit the model to X_test?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a03e0388", + "metadata": {}, + "outputs": [], + "source": [ + "polynomial_features = ...\n", + "\n", + "#X_train, X_test, y_train, y_test = train_test_split(polynomial_features, y, test_size=0.2)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "22e7536e", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb index 8ce6a6c3d..403eab1f3 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", 
- "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb new file mode 100644 index 000000000..4ffd81af5 --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb @@ -0,0 +1,483 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter ${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} 
\\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "In order to arrive at the last equation, we have to approximate the unknown function $f$ with 
the output/target values $y$." + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "targets = np.random.rand(bootstraps, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." + ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function 
of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb new file mode 100644 index 000000000..7e520c96d --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb @@ -0,0 +1,197 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "338b2ee1", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. 
It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code avaliable and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. 
Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb index 6d6019289..f80e8787a 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb @@ -2,1492 +2,624 @@ "cells": [ { "cell_type": "markdown", - "id": "b2937d10", - "metadata": {}, + "id": "860d70d8", + "metadata": { + "editable": true + }, "source": [ "\n", - "" + "" ] }, { "cell_type": "markdown", - "id": "3dd00d19", - "metadata": {}, + "id": "119c0988", + "metadata": { + "editable": true + }, "source": [ - "# Exercises weeks 43 and 44 \n", - "**October 23-27, 2023**\n", - "\n", - "Date: **Deadline is Sunday November 5 at midnight**\n", + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", "\n", - "You can hand in the exercises from week 43 and week 44 as one exercise and get a total score of two additional points." + "Date: **Deadline Friday October 24 at midnight**" ] }, { "cell_type": "markdown", - "id": "82a19a1d", - "metadata": {}, + "id": "909887eb", + "metadata": { + "editable": true + }, "source": [ - "# Overarching aims of the exercises weeks 43 and 44\n", - "\n", - "The aim of the exercises this week and next week is to get started with writing a neural network code\n", - "of relevance for project 2. \n", - "\n", - "During week 41 we discussed three different types of gates, the\n", - "so-called XOR, the OR and the AND gates. In order to develop a code\n", - "for neural networks, it can be useful to set up a simpler system with\n", - "only two inputs and one output. This can make it easier to debug and\n", - "study the feed forward pass and the back propagation part. In the\n", - "exercise this and next week, we propose to study this system with just\n", - "one hidden layer and two hidden nodes. There is only one output node\n", - "and we can choose to use either a simple regression case (fitting a\n", - "line) or just a binary classification case with the cross-entropy as\n", - "cost function.\n", + "# Overarching aims of the exercises for week 43\n", "\n", - "Their inputs and outputs can be\n", - "summarized using the following tables, first for the OR gate with\n", - "inputs $x_1$ and $x_2$ and outputs $y$:\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 1
1 0 1
1 1 1
" - ] - }, - { - "cell_type": "markdown", - "id": "f74f69af", - "metadata": {}, - "source": [ - "## The AND and XOR Gates\n", + "2. ROC curve. Finally we have the\n", "\n", - "The AND gate is defined as\n", + "3. Cumulative gain curve.\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 0
1 0 0
1 1 1
\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", "\n", - "And finally we have the XOR gate\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 1
1 0 1
1 1 0
" + "The underlying mathematics is described here." ] }, { "cell_type": "markdown", - "id": "1b52d47a", - "metadata": {}, + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, "source": [ - "## Representing the Data Sets\n", + "### Confusion Matrix\n", "\n", - "Our design matrix is defined by the input values $x_1$ and $x_2$. Since we have four possible outputs, our design matrix reads" + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" ] }, { "cell_type": "markdown", - "id": "3e6910cb", - "metadata": {}, + "id": "7b090385", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{X}=\\begin{bmatrix} 0 & 0 \\\\\n", - " 0 & 1 \\\\\n", - "\t\t 1 & 0 \\\\\n", - "\t\t 1 & 1 \\end{bmatrix},\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "90a3b78a", - "metadata": {}, + "id": "1e14904b", + "metadata": { + "editable": true + }, "source": [ - "while the vector of outputs is $\\boldsymbol{y}^T=[0,1,1,0]$ for the XOR gate, $\\boldsymbol{y}^T=[0,0,0,1]$ for the AND gate and $\\boldsymbol{y}^T=[0,1,1,1]$ for the OR gate.\n", - "\n", - "Your tasks here are\n", - "\n", - "1. Set up the design matrix with the inputs as discussed above and a vector containing the output, the so-called targets. Note that the design matrix is the same for all gates. You need just to define different outputs.\n", - "\n", - "2. Construct a neural network with only one hidden layer and two hidden nodes using the Sigmoid function as activation function.\n", - "\n", - "3. Set up the output layer with only one output node and use again the Sigmoid function as activation function for the output.\n", - "\n", - "4. Initialize the weights and biases and perform a feed forward pass and compare the outputs with the targets.\n", - "\n", - "5. Set up the cost function (cross entropy for classification of binary cases).\n", - "\n", - "6. Calculate the gradients needed for the back propagation part.\n", - "\n", - "7. Use the gradients to train the network in the back propagation part. Think of using automatic differentiation.\n", - "\n", - "8. Train the network and study your results and compare with results obtained either with **scikit-learn** or **TensorFlow**.\n", - "\n", - "Everything you develop here can be used directly into the code for the project." + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. Formally:" ] }, { "cell_type": "markdown", - "id": "d6a3ab1e", - "metadata": {}, + "id": "e93ea290", + "metadata": { + "editable": true + }, "source": [ - "## Setting up the Neural Network\n", - "\n", - "We define first our design matrix and the various output vectors for the different gates." 
+ "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 1, - "id": "152123b0", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, "source": [ - "%matplotlib inline\n", - "\n", - "\"\"\"\n", - "Simple code that tests XOR, OR and AND gates with linear regression\n", - "\"\"\"\n", - "\n", - "# import necessary packages\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn import datasets\n", - "\n", - "def sigmoid(x):\n", - " return 1/(1 + np.exp(-x))\n", - "\n", - "def feed_forward(X):\n", - " # weighted sum of inputs to the hidden layer\n", - " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", - " # activation in the hidden layer\n", - " a_h = sigmoid(z_h)\n", - " \n", - " # weighted sum of inputs to the output layer\n", - " z_o = np.matmul(a_h, output_weights) + output_bias\n", - " # softmax output\n", - " # axis 0 holds each input and axis 1 the probabilities of each category\n", - " probabilities = sigmoid(z_o)\n", - " return probabilities\n", - "\n", - "# we obtain a prediction by taking the class with the highest likelihood\n", - "def predict(X):\n", - " probabilities = feed_forward(X)\n", - " return np.argmax(probabilities, axis=1)\n", - "\n", - "# ensure the same random numbers appear every time\n", - "np.random.seed(0)\n", - "\n", - "# Design matrix\n", - "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", - "\n", - "# The XOR gate\n", - "yXOR = np.array( [ 0, 1 ,1, 0])\n", - "# The OR gate\n", - "yOR = np.array( [ 0, 1 ,1, 1])\n", - "# The AND gate\n", - "yAND = np.array( [ 0, 0 ,0, 1])\n", - "\n", - "# Defining the neural network\n", - "n_inputs, n_features = X.shape\n", - "n_hidden_neurons = 2\n", - "n_categories = 2\n", - "n_features = 2\n", + "where TPR and FPR are the true and false positive rates defined below.\n", "\n", - "# we make the weights normally distributed using numpy.random.randn\n", - "\n", - "# weights and bias in the hidden layer\n", - "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", - "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", - "\n", - "# weights and bias in the output layer\n", - "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", - "output_bias = np.zeros(n_categories) + 0.01\n", - "\n", - "probabilities = feed_forward(X)\n", - "print(probabilities)\n", - "\n", - "\n", - "predictions = predict(X)\n", - "print(predictions)" + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" ] }, { "cell_type": "markdown", - "id": "73319f0a", - "metadata": {}, + "id": "a0f68f5f", + "metadata": { + "editable": true + }, "source": [ - "Not an impressive result, but this was our first forward pass with randomly assigned weights. Let us now add the full network with the back-propagation algorithm discussed above." 
+ "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "a7e0c47a", - "metadata": {}, + "id": "869669b2", + "metadata": { + "editable": true + }, "source": [ - "## The Code using Scikit-Learn" + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "dbbacc67", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, "source": [ - "# import necessary packages\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.neural_network import MLPClassifier\n", - "from sklearn.metrics import accuracy_score\n", - "import seaborn as sns\n", - "\n", - "# ensure the same random numbers appear every time\n", - "np.random.seed(0)\n", - "\n", - "# Design matrix\n", - "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", - "\n", - "# The XOR gate\n", - "yXOR = np.array( [ 0, 1 ,1, 0])\n", - "# The OR gate\n", - "yOR = np.array( [ 0, 1 ,1, 1])\n", - "# The AND gate\n", - "yAND = np.array( [ 0, 0 ,0, 1])\n", - "\n", - "# Defining the neural network\n", - "n_inputs, n_features = X.shape\n", - "n_hidden_neurons = 2\n", - "n_categories = 2\n", - "n_features = 2\n", - "\n", - "eta_vals = np.logspace(-5, 1, 7)\n", - "lmbd_vals = np.logspace(-5, 1, 7)\n", - "# store models for later use\n", - "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", - "epochs = 100\n", - "\n", - "for i, eta in enumerate(eta_vals):\n", - " for j, lmbd in enumerate(lmbd_vals):\n", - " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", - " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", - " dnn.fit(X, yXOR)\n", - " DNN_scikit[i][j] = dnn\n", - " print(\"Learning rate = \", eta)\n", - " print(\"Lambda = \", lmbd)\n", - " print(\"Accuracy score on data set: \", dnn.score(X, yXOR))\n", - " print()\n", - "\n", - "sns.set()\n", - "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", - "for i in range(len(eta_vals)):\n", - " for j in range(len(lmbd_vals)):\n", - " dnn = DNN_scikit[i][j]\n", - " test_pred = dnn.predict(X)\n", - " test_accuracy[i][j] = accuracy_score(yXOR, test_pred)\n", - "\n", - "fig, ax = plt.subplots(figsize = (10, 10))\n", - "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", - "ax.set_title(\"Test Accuracy\")\n", - "ax.set_ylabel(\"$\\eta$\")\n", - "ax.set_xlabel(\"$\\lambda$\")\n", - "plt.show()" + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1cac501a", - "metadata": {}, + "id": "2f79325c", + "metadata": { + "editable": true + }, "source": [ - "## Building a neural network 
code\n", - "\n", - "Here we present a flexible object oriented codebase\n", - "for a feed forward neural network, along with a demonstration of how\n", - "to use it. Before we get into the details of the neural network, we\n", - "will first present some implementations of various schedulers, cost\n", - "functions and activation functions that can be used together with the\n", - "neural network.\n", - "\n", - "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." ] }, { "cell_type": "markdown", - "id": "dd153528", - "metadata": {}, + "id": "0ce65a47", + "metadata": { + "editable": true + }, "source": [ - "### Learning rate methods\n", + "### ROC Curve\n", "\n", - "The code below shows object oriented implementations of the Constant,\n", - "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", - "of the classes belong to the shared abstract Scheduler class, and\n", - "share the update_change() and reset() methods allowing for any of the\n", - "schedulers to be seamlessly used during the training stage, as will\n", - "later be shown in the fit() method of the neural\n", - "network. Update_change() only has one parameter, the gradient\n", - "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", - "from the weights. The reset() function takes no parameters, and resets\n", - "the desired variables. For Constant and Momentum, reset does nothing." + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. 
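As a concrete illustration of this threshold sweep (a minimal sketch with synthetic labels and scores, not part of the original exercise text), the curve can be traced directly with numpy before reaching for a library routine such as scikit-learn's roc_curve:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
y_true = rng.integers(0, 2, size=200)                       # hypothetical binary labels
# hypothetical scores: positives tend to receive higher scores than negatives
scores = rng.normal(loc=0.35 + 0.3 * y_true, scale=0.2)

tpr_list, fpr_list = [], []
for t in np.linspace(scores.min(), scores.max(), 101):
    y_pred = (scores >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr_list.append(tp / (tp + fn))
    fpr_list.append(fp / (fp + tn))

# (fpr_list, tpr_list) are the points of the ROC curve; integrating TPR over FPR
# on the sorted points gives an estimate of the AUC discussed below.
```
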
By definition," ] }, { - "cell_type": "code", - "execution_count": 3, - "id": "f55eea63", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "\n", - "class Scheduler:\n", - " \"\"\"\n", - " Abstract class for Schedulers\n", - " \"\"\"\n", - "\n", - " def __init__(self, eta):\n", - " self.eta = eta\n", - "\n", - " # should be overwritten\n", - " def update_change(self, gradient):\n", - " raise NotImplementedError\n", - "\n", - " # overwritten if needed\n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Constant(Scheduler):\n", - " def __init__(self, eta):\n", - " super().__init__(eta)\n", - "\n", - " def update_change(self, gradient):\n", - " return self.eta * gradient\n", - " \n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Momentum(Scheduler):\n", - " def __init__(self, eta: float, momentum: float):\n", - " super().__init__(eta)\n", - " self.momentum = momentum\n", - " self.change = 0\n", - "\n", - " def update_change(self, gradient):\n", - " self.change = self.momentum * self.change + self.eta * gradient\n", - " return self.change\n", - "\n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Adagrad(Scheduler):\n", - " def __init__(self, eta):\n", - " super().__init__(eta)\n", - " self.G_t = None\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " if self.G_t is None:\n", - " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", - "\n", - " self.G_t += gradient @ gradient.T\n", - "\n", - " G_t_inverse = 1 / (\n", - " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", - " )\n", - " return self.eta * gradient * G_t_inverse\n", - "\n", - " def reset(self):\n", - " self.G_t = None\n", - "\n", - "\n", - "class AdagradMomentum(Scheduler):\n", - " def __init__(self, eta, momentum):\n", - " super().__init__(eta)\n", - " self.G_t = None\n", - " self.momentum = momentum\n", - " self.change = 0\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " if self.G_t is None:\n", - " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", - "\n", - " self.G_t += gradient @ gradient.T\n", - "\n", - " G_t_inverse = 1 / (\n", - " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", - " )\n", - " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", - " return self.change\n", - "\n", - " def reset(self):\n", - " self.G_t = None\n", - "\n", - "\n", - "class RMS_prop(Scheduler):\n", - " def __init__(self, eta, rho):\n", - " super().__init__(eta)\n", - " self.rho = rho\n", - " self.second = 0.0\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", - " return self.eta * gradient / (np.sqrt(self.second + delta))\n", - "\n", - " def reset(self):\n", - " self.second = 0.0\n", - "\n", - "\n", - "class Adam(Scheduler):\n", - " def __init__(self, eta, rho, rho2):\n", - " super().__init__(eta)\n", - " self.rho = rho\n", - " self.rho2 = rho2\n", - " self.moment = 0\n", - " self.second = 0\n", - " self.n_epochs = 1\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", - " self.second = 
self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", - "\n", - " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", - " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", - "\n", - " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", - "\n", - " def reset(self):\n", - " self.n_epochs += 1\n", - " self.moment = 0\n", - " self.second = 0" + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1a9bcb3e", - "metadata": {}, + "id": "561bfb2c", + "metadata": { + "editable": true + }, "source": [ - "### Usage of the above learning rate schedulers\n", + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", "\n", - "To initalize a scheduler, simply create the object and pass in the\n", - "necessary parameters such as the learning rate and the momentum as\n", - "shown below. As the Scheduler class is an abstract class it should not\n", - "called directly, and will raise an error upon usage." + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" ] }, { - "cell_type": "code", - "execution_count": 4, - "id": "86013cb4", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, "source": [ - "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", - "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" ] }, { "cell_type": "markdown", - "id": "535331f6", - "metadata": {}, + "id": "30080a86", + "metadata": { + "editable": true + }, "source": [ - "Here is a small example for how a segment of code using schedulers\n", - "could look. Switching out the schedulers is simple." + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." ] }, { - "cell_type": "code", - "execution_count": 5, - "id": "7e0f6b5a", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, "source": [ - "weights = np.ones((3,3))\n", - "print(f\"Before scheduler:\\n{weights=}\")\n", + "### Cumulative Gain\n", "\n", - "epochs = 10\n", - "for e in range(epochs):\n", - " gradient = np.random.rand(3, 3)\n", - " change = adam_scheduler.update_change(gradient)\n", - " weights = weights - change\n", - " adam_scheduler.reset()\n", - "\n", - "print(f\"\\nAfter scheduler:\\n{weights=}\")" + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. 
To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" ] }, { "cell_type": "markdown", - "id": "f018ae57", - "metadata": {}, + "id": "3e9132ef", + "metadata": { + "editable": true + }, "source": [ - "### Cost functions\n", - "\n", - "Here we discuss cost functions that can be used when creating the\n", - "neural network. Every cost function takes the target vector as its\n", - "parameter, and returns a function valued only at $x$ such that it may\n", - "easily be differentiated." + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 6, - "id": "c13507bf", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "\n", - "def CostOLS(target):\n", - " \n", - " def func(X):\n", - " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", - "\n", - " return func\n", + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", "\n", - "\n", - "def CostLogReg(target):\n", - "\n", - " def func(X):\n", - " \n", - " return -(1.0 / target.shape[0]) * np.sum(\n", - " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", - " )\n", - "\n", - " return func\n", - "\n", - "\n", - "def CostCrossEntropy(target):\n", - " \n", - " def func(X):\n", - " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", - "\n", - " return func" + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," ] }, { "cell_type": "markdown", - "id": "6dab17bc", - "metadata": {}, - "source": [ - "Below we give a short example of how these cost function may be used\n", - "to obtain results if you wish to test them out on your own using\n", - "AutoGrad's automatics differentiation." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "a5dbba01", - "metadata": {}, - "outputs": [], + "id": "e5525570", + "metadata": { + "editable": true + }, "source": [ - "from autograd import grad\n", - "\n", - "target = np.array([[1, 2, 3]]).T\n", - "a = np.array([[4, 5, 6]]).T\n", - "\n", - "cost_func = CostCrossEntropy\n", - "cost_func_derivative = grad(cost_func(target))\n", - "\n", - "valued_at_a = cost_func_derivative(a)\n", - "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "b55d31d4", - "metadata": {}, + "id": "18ff8dc2", + "metadata": { + "editable": true + }, "source": [ - "### Activation functions\n", - "\n", - "Finally, before we look at the neural network, we will look at the\n", - "activation functions which can be specified between the hidden layers\n", - "and as the output function. Each function can be valued for any given\n", - "vector or matrix X, and can be differentiated via derivate()." + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." ] }, { - "cell_type": "code", - "execution_count": 8, - "id": "b3e045a6", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "from autograd import elementwise_grad\n", - "\n", - "def identity(X):\n", - " return X\n", - "\n", - "\n", - "def sigmoid(X):\n", - " try:\n", - " return 1.0 / (1 + np.exp(-X))\n", - " except FloatingPointError:\n", - " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", - "\n", - "\n", - "def softmax(X):\n", - " X = X - np.max(X, axis=-1, keepdims=True)\n", - " delta = 10e-10\n", - " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", - "\n", - "\n", - "def RELU(X):\n", - " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", - "\n", - "\n", - "def LRELU(X):\n", - " delta = 10e-4\n", - " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", "\n", - "\n", - "def derivate(func):\n", - " if func.__name__ == \"RELU\":\n", - "\n", - " def func(X):\n", - " return np.where(X > 0, 1, 0)\n", - "\n", - " return func\n", - "\n", - " elif func.__name__ == \"LRELU\":\n", - "\n", - " def func(X):\n", - " delta = 10e-4\n", - " return np.where(X > 0, 1, delta)\n", - "\n", - " return func\n", - "\n", - " else:\n", - " return elementwise_grad(func)" + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" ] }, { "cell_type": "markdown", - "id": "c0189342", - "metadata": {}, + "id": "f1f14c8e", + "metadata": { + "editable": true + }, "source": [ - "Below follows a short demonstration of how to use an activation\n", - "function. The derivative of the activation function will be important\n", - "when calculating the output delta term during backpropagation. Note\n", - "that derivate() can also be used for cost functions for a more\n", - "generalized approach." 
+ "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 9, - "id": "640aa861", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, "source": [ - "z = np.array([[4, 5, 6]]).T\n", - "print(f\"Input to activation function:\\n{z}\")\n", - "\n", - "act_func = sigmoid\n", - "a = act_func(z)\n", - "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", "\n", - "act_func_derivative = derivate(act_func)\n", - "valued_at_z = act_func_derivative(a)\n", - "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" ] }, { "cell_type": "markdown", - "id": "1007ccdd", - "metadata": {}, + "id": "621a2e8b", + "metadata": { + "editable": true + }, "source": [ - "### The Neural Network\n", - "\n", - "Now that we have gotten a good understanding of the implementation of\n", - "some important components, we can take a look at an object oriented\n", - "implementation of a feed forward neural network. The feed forward\n", - "neural network has been implemented as a class named FFNN, which can\n", - "be initiated as a regressor or classifier dependant on the choice of\n", - "cost function. The FFNN can have any number of input nodes, hidden\n", - "layers with any amount of hidden nodes, and any amount of output nodes\n", - "meaning it can perform multiclass classification as well as binary\n", - "classification and regression problems. Although there is a lot of\n", - "code present, it makes for an easy to use and generalizeable interface\n", - "for creating many types of neural networks as will be demonstrated\n", - "below." + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 10, - "id": "9584a2da", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, "source": [ - "import math\n", - "import autograd.numpy as np\n", - "import sys\n", - "import warnings\n", - "from autograd import grad, elementwise_grad\n", - "from random import random, seed\n", - "from copy import deepcopy, copy\n", - "from typing import Tuple, Callable\n", - "from sklearn.utils import resample\n", - "\n", - "warnings.simplefilter(\"error\")\n", - "\n", - "\n", - "class FFNN:\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Feed Forward Neural Network with interface enabling flexible design of a\n", - " nerual networks architecture and the specification of activation function\n", - " in the hidden layers and output layer respectively. This model can be used\n", - " for both regression and classification problems, depending on the output function.\n", - "\n", - " Attributes:\n", - " ------------\n", - " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", - " number of nodes in each of the networks layers. 
The first integer in the array\n", - " defines the number of nodes in the input layer, the second integer defines number\n", - " of nodes in the first hidden layer and so on until the last number, which\n", - " specifies the number of nodes in the output layer.\n", - " II hidden_func (Callable): The activation function for the hidden layers\n", - " III output_func (Callable): The activation function for the output layer\n", - " IV cost_func (Callable): Our cost function\n", - " V seed (int): Sets random seed, makes results reproducible\n", - " \"\"\"\n", - "\n", - " def __init__(\n", - " self,\n", - " dimensions: tuple[int],\n", - " hidden_func: Callable = sigmoid,\n", - " output_func: Callable = lambda x: x,\n", - " cost_func: Callable = CostOLS,\n", - " seed: int = None,\n", - " ):\n", - " self.dimensions = dimensions\n", - " self.hidden_func = hidden_func\n", - " self.output_func = output_func\n", - " self.cost_func = cost_func\n", - " self.seed = seed\n", - " self.weights = list()\n", - " self.schedulers_weight = list()\n", - " self.schedulers_bias = list()\n", - " self.a_matrices = list()\n", - " self.z_matrices = list()\n", - " self.classification = None\n", - "\n", - " self.reset_weights()\n", - " self._set_classification()\n", - "\n", - " def fit(\n", - " self,\n", - " X: np.ndarray,\n", - " t: np.ndarray,\n", - " scheduler: Scheduler,\n", - " batches: int = 1,\n", - " epochs: int = 100,\n", - " lam: float = 0,\n", - " X_val: np.ndarray = None,\n", - " t_val: np.ndarray = None,\n", - " ):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " This function performs the training the neural network by performing the feedforward and backpropagation\n", - " algorithm to update the networks weights.\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray) : training data\n", - " II t (np.ndarray) : target data\n", - " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", - " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", - "\n", - " Optional Parameters:\n", - " ------------\n", - " V batches (int) : number of batches the datasets are split into, default equal to 1\n", - " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", - " VII lam (float) : regularization hyperparameter lambda\n", - " VIII X_val (np.ndarray) : validation set\n", - " IX t_val (np.ndarray) : validation target set\n", - "\n", - " Returns:\n", - " ------------\n", - " I scores (dict) : A dictionary containing the performance metrics of the model.\n", - " The number of the metrics depends on the parameters passed to the fit-function.\n", - "\n", - " \"\"\"\n", - "\n", - " # setup \n", - " if self.seed is not None:\n", - " np.random.seed(self.seed)\n", - "\n", - " val_set = False\n", - " if X_val is not None and t_val is not None:\n", - " val_set = True\n", - "\n", - " # creating arrays for score metrics\n", - " train_errors = np.empty(epochs)\n", - " train_errors.fill(np.nan)\n", - " val_errors = np.empty(epochs)\n", - " val_errors.fill(np.nan)\n", - "\n", - " train_accs = np.empty(epochs)\n", - " train_accs.fill(np.nan)\n", - " val_accs = np.empty(epochs)\n", - " val_accs.fill(np.nan)\n", - "\n", - " self.schedulers_weight = list()\n", - " self.schedulers_bias = list()\n", - "\n", - " batch_size = X.shape[0] // batches\n", - "\n", - " X, t = resample(X, t)\n", - "\n", - " # this function returns a function valued only at X\n", - " cost_function_train = self.cost_func(t)\n", 
- " if val_set:\n", - " cost_function_val = self.cost_func(t_val)\n", - "\n", - " # create schedulers for each weight matrix\n", - " for i in range(len(self.weights)):\n", - " self.schedulers_weight.append(copy(scheduler))\n", - " self.schedulers_bias.append(copy(scheduler))\n", - "\n", - " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", - "\n", - " try:\n", - " for e in range(epochs):\n", - " for i in range(batches):\n", - " # allows for minibatch gradient descent\n", - " if i == batches - 1:\n", - " # If the for loop has reached the last batch, take all thats left\n", - " X_batch = X[i * batch_size :, :]\n", - " t_batch = t[i * batch_size :, :]\n", - " else:\n", - " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", - " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", - "\n", - " self._feedforward(X_batch)\n", - " self._backpropagate(X_batch, t_batch, lam)\n", - "\n", - " # reset schedulers for each epoch (some schedulers pass in this call)\n", - " for scheduler in self.schedulers_weight:\n", - " scheduler.reset()\n", - "\n", - " for scheduler in self.schedulers_bias:\n", - " scheduler.reset()\n", - "\n", - " # computing performance metrics\n", - " pred_train = self.predict(X)\n", - " train_error = cost_function_train(pred_train)\n", - "\n", - " train_errors[e] = train_error\n", - " if val_set:\n", - " \n", - " pred_val = self.predict(X_val)\n", - " val_error = cost_function_val(pred_val)\n", - " val_errors[e] = val_error\n", - "\n", - " if self.classification:\n", - " train_acc = self._accuracy(self.predict(X), t)\n", - " train_accs[e] = train_acc\n", - " if val_set:\n", - " val_acc = self._accuracy(pred_val, t_val)\n", - " val_accs[e] = val_acc\n", - "\n", - " # printing progress bar\n", - " progression = e / epochs\n", - " print_length = self._progress_bar(\n", - " progression,\n", - " train_error=train_errors[e],\n", - " train_acc=train_accs[e],\n", - " val_error=val_errors[e],\n", - " val_acc=val_accs[e],\n", - " )\n", - " except KeyboardInterrupt:\n", - " # allows for stopping training at any point and seeing the result\n", - " pass\n", - "\n", - " # visualization of training progression (similiar to tensorflow progression bar)\n", - " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", - " sys.stdout.flush()\n", - " self._progress_bar(\n", - " 1,\n", - " train_error=train_errors[e],\n", - " train_acc=train_accs[e],\n", - " val_error=val_errors[e],\n", - " val_acc=val_accs[e],\n", - " )\n", - " sys.stdout.write(\"\")\n", - "\n", - " # return performance metrics for the entire run\n", - " scores = dict()\n", - "\n", - " scores[\"train_errors\"] = train_errors\n", - "\n", - " if val_set:\n", - " scores[\"val_errors\"] = val_errors\n", - "\n", - " if self.classification:\n", - " scores[\"train_accs\"] = train_accs\n", - "\n", - " if val_set:\n", - " scores[\"val_accs\"] = val_accs\n", - "\n", - " return scores\n", - "\n", - " def predict(self, X: np.ndarray, *, threshold=0.5):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Performs prediction after training of the network has been finished.\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each\n", - "\n", - " Optional Parameters:\n", - " ------------\n", - " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", - " in classification problems\n", - "\n", - " Returns:\n", - " ------------\n", - " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", - " This vector is thresholded if regression=False, meaning that classification results\n", - " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", - "\n", - " \"\"\"\n", - "\n", - " predict = self._feedforward(X)\n", - "\n", - " if self.classification:\n", - " return np.where(predict > threshold, 1, 0)\n", - " else:\n", - " return predict\n", - "\n", - " def reset_weights(self):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Resets/Reinitializes the weights in order to train the network for a new problem.\n", - "\n", - " \"\"\"\n", - " if self.seed is not None:\n", - " np.random.seed(self.seed)\n", - "\n", - " self.weights = list()\n", - " for i in range(len(self.dimensions) - 1):\n", - " weight_array = np.random.randn(\n", - " self.dimensions[i] + 1, self.dimensions[i + 1]\n", - " )\n", - " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", - "\n", - " self.weights.append(weight_array)\n", - "\n", - " def _feedforward(self, X: np.ndarray):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Calculates the activation of each layer starting at the input and ending at the output.\n", - " Each following activation is calculated from a weighted sum of each of the preceeding\n", - " activations (except in the case of the input layer).\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each\n", - "\n", - " Returns:\n", - " ------------\n", - " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", - " \"\"\"\n", - "\n", - " # reset matrices\n", - " self.a_matrices = list()\n", - " self.z_matrices = list()\n", - "\n", - " # if X is just a vector, make it into a matrix\n", - " if len(X.shape) == 1:\n", - " X = X.reshape((1, X.shape[0]))\n", - "\n", - " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", - " # to add bias to our data\n", - " bias = np.ones((X.shape[0], 1)) * 0.01\n", - " X = np.hstack([bias, X])\n", - "\n", - " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", - " # exponent indicates layer number).\n", - " a = X\n", - " self.a_matrices.append(a)\n", - " self.z_matrices.append(a)\n", - "\n", - " # The feed forward algorithm\n", - " for i in range(len(self.weights)):\n", - " if i < len(self.weights) - 1:\n", - " z = a @ self.weights[i]\n", - " self.z_matrices.append(z)\n", - " a = self.hidden_func(z)\n", - " # bias column again added to the data here\n", - " bias = np.ones((a.shape[0], 1)) * 0.01\n", - " a = np.hstack([bias, a])\n", - " self.a_matrices.append(a)\n", - " else:\n", - " try:\n", - " # a^L, the nodes in our output layers\n", - " z = a @ self.weights[i]\n", - " a = self.output_func(z)\n", - " self.a_matrices.append(a)\n", - " self.z_matrices.append(z)\n", - " except Exception as OverflowError:\n", - " print(\n", - " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", - " )\n", - "\n", - " # this will be a^L\n", - " return a\n", - "\n", - " def _backpropagate(self, X, t, lam):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Performs the backpropagation algorithm. 
In other words, this method\n", - " calculates the gradient of all the layers starting at the\n", - " output layer, and moving from right to left accumulates the gradient until\n", - " the input layer is reached. Each layers respective weights are updated while\n", - " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each.\n", - " II t (np.ndarray): The target vector, with n rows of p targets.\n", - " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", - "\n", - " Returns:\n", - " ------------\n", - " No return value.\n", - "\n", - " \"\"\"\n", - " out_derivative = derivate(self.output_func)\n", - " hidden_derivative = derivate(self.hidden_func)\n", - "\n", - " for i in range(len(self.weights) - 1, -1, -1):\n", - " # delta terms for output\n", - " if i == len(self.weights) - 1:\n", - " # for multi-class classification\n", - " if (\n", - " self.output_func.__name__ == \"softmax\"\n", - " ):\n", - " delta_matrix = self.a_matrices[i + 1] - t\n", - " # for single class classification\n", - " else:\n", - " cost_func_derivative = grad(self.cost_func(t))\n", - " delta_matrix = out_derivative(\n", - " self.z_matrices[i + 1]\n", - " ) * cost_func_derivative(self.a_matrices[i + 1])\n", - "\n", - " # delta terms for hidden layer\n", - " else:\n", - " delta_matrix = (\n", - " self.weights[i + 1][1:, :] @ delta_matrix.T\n", - " ).T * hidden_derivative(self.z_matrices[i + 1])\n", - "\n", - " # calculate gradient\n", - " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", - " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", - " 1, delta_matrix.shape[1]\n", - " )\n", - "\n", - " # regularization term\n", - " gradient_weights += self.weights[i][1:, :] * lam\n", - "\n", - " # use scheduler\n", - " update_matrix = np.vstack(\n", - " [\n", - " self.schedulers_bias[i].update_change(gradient_bias),\n", - " self.schedulers_weight[i].update_change(gradient_weights),\n", - " ]\n", - " )\n", - "\n", - " # update weights and bias\n", - " self.weights[i] -= update_matrix\n", - "\n", - " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Calculates accuracy of given prediction to target\n", - "\n", - " Parameters:\n", - " ------------\n", - " I prediction (np.ndarray): vector of predicitons output network\n", - " (1s and 0s in case of classification, and real numbers in case of regression)\n", - " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", - "\n", - " Returns:\n", - " ------------\n", - " A floating point number representing the percentage of correctly classified instances.\n", - " \"\"\"\n", - " assert prediction.size == target.size\n", - " return np.average((target == prediction))\n", - " def _set_classification(self):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Decides if FFNN acts as classifier (True) og regressor (False),\n", - " sets self.classification during init()\n", - " \"\"\"\n", - " self.classification = False\n", - " if (\n", - " self.cost_func.__name__ == \"CostLogReg\"\n", - " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", - " ):\n", - " self.classification = True\n", - "\n", - " def _progress_bar(self, progression, **kwargs):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Displays progress of 
training\n", - " \"\"\"\n", - " print_length = 40\n", - " num_equals = int(progression * print_length)\n", - " num_not = print_length - num_equals\n", - " arrow = \">\" if num_equals > 0 else \"\"\n", - " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", - " perc_print = self._format(progression * 100, decimals=5)\n", - " line = f\" {bar} {perc_print}% \"\n", - "\n", - " for key in kwargs:\n", - " if not np.isnan(kwargs[key]):\n", - " value = self._format(kwargs[key], decimals=4)\n", - " line += f\"| {key}: {value} \"\n", - " sys.stdout.write(\"\\r\" + line)\n", - " sys.stdout.flush()\n", - " return len(line)\n", - "\n", - " def _format(self, value, decimals=4):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Formats decimal numbers for progress bar\n", - " \"\"\"\n", - " if value > 0:\n", - " v = value\n", - " elif value < 0:\n", - " v = -10 * value\n", - " else:\n", - " v = 1\n", - " n = 1 + math.floor(math.log10(v))\n", - " if n >= decimals - 1:\n", - " return str(round(value))\n", - " return f\"{value:.{decimals-n-1}f}\"" + "This can be shown to equal" ] }, { "cell_type": "markdown", - "id": "9ccd1fc1", - "metadata": {}, + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, "source": [ - "Before we make a model, we will quickly generate a dataset we can use\n", - "for our linear regression problem as shown below" + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 11, - "id": "7f3a5b31", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "def SkrankeFunction(x, y):\n", - " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", - "\n", - "def create_X(x, y, n):\n", - " if len(x.shape) > 1:\n", - " x = np.ravel(x)\n", - " y = np.ravel(y)\n", - "\n", - " N = len(x)\n", - " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", - " X = np.ones((N, l))\n", - "\n", - " for i in range(1, n + 1):\n", - " q = int((i) * (i + 1) / 2)\n", - " for k in range(i + 1):\n", - " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", "\n", - " return X\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", "\n", - "step=0.5\n", - "x = np.arange(0, 1, step)\n", - "y = np.arange(0, 1, step)\n", - "x, y = np.meshgrid(x, y)\n", - "target = SkrankeFunction(x, y)\n", - "target = target.reshape(target.shape[0], 1)\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. 
This accounts for class imbalance by giving more weight to larger classes .\n", "\n", - "poly_degree=3\n", - "X = create_X(x, y, poly_degree)\n", - "\n", - "X_train, X_test, t_train, t_test = train_test_split(X, target)" + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." ] }, { "cell_type": "markdown", - "id": "1ac05bb6", - "metadata": {}, - "source": [ - "Now that we have our dataset ready for the regression, we can create\n", - "our regressor. Note that with the seed parameter, we can make sure our\n", - "results stay the same every time we run the neural network. For\n", - "inititialization, we simply specify the dimensions (we wish the amount\n", - "of input nodes to be equal to the datapoints, and the output to\n", - "predict one value)." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "0f857604", - "metadata": {}, - "outputs": [], + "id": "9274bf3f", + "metadata": { + "editable": true + }, "source": [ - "input_nodes = X_train.shape[1]\n", - "output_nodes = 1\n", + "## Exercises\n", "\n", - "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" - ] - }, - { - "cell_type": "markdown", - "id": "eeff4315", - "metadata": {}, - "source": [ - "We then fit our model with our training data using the scheduler of our choice." + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." 
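Before that placeholder snippet, here is one possible filled-in version (a sketch under stated assumptions: the unspecified dataset is replaced by the Wisconsin breast-cancer data, the mixed `mydata.data`/`cancer.target` references are unified into one `data` object, and the cumulative gain curve is computed by hand rather than through the scikit-plot package):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (ConfusionMatrixDisplay, RocCurveDisplay,
                             classification_report)

# Wisconsin breast-cancer data as a stand-in for the dataset left as a placeholder
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

logreg = LogisticRegression(solver='lbfgs', max_iter=10000)
logreg.fit(X_train, y_train)
print("Test set accuracy with Logistic Regression: {:.2f}".format(
    logreg.score(X_test, y_test)))

# Confusion matrix (row-normalized) and ROC curve via scikit-learn's plotting helpers
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize='true')
plt.show()
RocCurveDisplay.from_estimator(logreg, X_test, y_test)
plt.show()

# Cumulative gain: sort the test set by predicted probability of the positive class
proba = logreg.predict_proba(X_test)[:, 1]
order = np.argsort(proba)[::-1]
gain = np.cumsum(y_test[order]) / y_test.sum()
fraction = np.arange(1, len(y_test) + 1) / len(y_test)
plt.plot(fraction, gain, label="logistic regression")
plt.plot([0, 1], [0, 1], "k--", label="baseline (random)")
plt.xlabel("Fraction of samples targeted")
plt.ylabel("Fraction of positives captured")
plt.legend()
plt.show()

# Precision, recall and F1 per class, plus the macro and weighted averages above
print(classification_report(y_test, logreg.predict(X_test)))
```

The final classification_report call also prints the precision, recall and F$_1$ scores discussed earlier, including their macro and weighted averages, so the same run illustrates all the metrics of this exercise set.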
] }, { "cell_type": "code", - "execution_count": 13, - "id": "46246810", - "metadata": {}, + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "%matplotlib inline\n", "\n", - "scheduler = Constant(eta=1e-3)\n", - "scores = linear_regression.fit(X_train, t_train, scheduler)" + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "8c6f9954", - "metadata": {}, - "source": [ - "Due to the progress bar we can see the MSE (train_error) throughout\n", - "the FFNN's training. Note that the fit() function has some optional\n", - "parameters with defualt arguments. For example, the regularization\n", - "hyperparameter can be left ignored if not needed, and equally the FFNN\n", - "will by default run for 100 epochs. These can easily be changed, such\n", - "as for example:" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "2661939c", - "metadata": {}, - "outputs": [], + "id": "51760b3e", + "metadata": { + "editable": true + }, "source": [ - "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "### Exercise a)\n", "\n", - "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." ] }, { "cell_type": "markdown", - "id": "74c5624c", - "metadata": {}, - "source": [ - "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", - "\n", - "Let us then switch to a binary classification. We use a binary\n", - "classification dataset, and follow a similar setup to the regression\n", - "case." 
- ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "4b8eb115", - "metadata": {}, - "outputs": [], + "id": "c1d42f5f", + "metadata": { + "editable": true + }, "source": [ - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.preprocessing import MinMaxScaler\n", - "\n", - "wisconsin = load_breast_cancer()\n", - "X = wisconsin.data\n", - "target = wisconsin.target\n", - "target = target.reshape(target.shape[0], 1)\n", - "\n", - "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "### Exercise b)\n", "\n", - "scaler = MinMaxScaler()\n", - "scaler.fit(X_train)\n", - "X_train = scaler.transform(X_train)\n", - "X_val = scaler.transform(X_val)" + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" ] }, { "cell_type": "code", - "execution_count": 16, - "id": "2c0f92bd", - "metadata": {}, + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "input_nodes = X_train.shape[1]\n", - "output_nodes = 1\n", - "\n", - "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" ] }, { "cell_type": "markdown", - "id": "49201ae4", - "metadata": {}, + "id": "828ea1cd", + "metadata": { + "editable": true + }, "source": [ - "We will now make use of our validation data by passing it into our fit function as a keyword argument" + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. You can also control the number of samples, features,\n", + "informative features, redundant features, and more." ] }, { "cell_type": "code", - "execution_count": 17, - "id": "55b5e426", - "metadata": {}, + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", - "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" ] }, { "cell_type": "markdown", - "id": "b51762fb", - "metadata": {}, + "id": "0068b032", + "metadata": { + "editable": true + }, "source": [ - "Finally, we will create a neural network with 2 hidden layers with activation functions." + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." 
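As a rough, optional starting point for this exercise, the sketch below trains logistic regression on the two-class digits data and produces the confusion matrix and ROC curve. It assumes only scikit-learn and matplotlib; the `ConfusionMatrixDisplay` and `RocCurveDisplay` helpers are used here instead of the `scikitplot` calls shown in the example code above, and you would still need to add the cumulative gain curve (for instance via `scikitplot`).

```python
# Possible sketch for the binary case: two digit classes, logistic regression,
# confusion matrix and ROC curve with scikit-learn's display helpers.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay, accuracy_score

digits = load_digits(n_class=2)                  # only the digits 0 and 1
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

logreg = LogisticRegression(solver="lbfgs", max_iter=1000)
logreg.fit(X_train, y_train)
print("Test set accuracy:", accuracy_score(y_test, logreg.predict(X_test)))

ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize="true")
plt.show()
RocCurveDisplay.from_estimator(logreg, X_test, y_test)
plt.show()
```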
] }, { - "cell_type": "code", - "execution_count": 18, - "id": "6b59e27d", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, "source": [ - "input_nodes = X_train.shape[1]\n", - "hidden_nodes1 = 100\n", - "hidden_nodes2 = 30\n", - "output_nodes = 1\n", - "\n", - "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "### Exercise c) week 43\n", "\n", - "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," ] }, { "cell_type": "code", - "execution_count": 19, - "id": "72c87921", - "metadata": {}, + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", - "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" ] }, { "cell_type": "markdown", - "id": "ed40d7d2", - "metadata": {}, - "source": [ - "### Multiclass classification\n", - "\n", - "Finally, we will demonstrate the use case of multiclass classification\n", - "using our FFNN with the famous MNIST dataset, which contain images of\n", - "digits between the range of 0 to 9." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "315ef3fe", - "metadata": {}, - "outputs": [], + "id": "14cc859c", + "metadata": { + "editable": true + }, "source": [ - "from sklearn.datasets import load_digits\n", - "\n", - "def onehot(target: np.ndarray):\n", - " onehot = np.zeros((target.size, target.max() + 1))\n", - " onehot[np.arange(target.size), target] = 1\n", - " return onehot\n", - "\n", - "digits = load_digits()\n", - "\n", - "X = digits.data\n", - "target = digits.target\n", - "target = onehot(target)\n", - "\n", - "input_nodes = 64\n", - "hidden_nodes1 = 100\n", - "hidden_nodes2 = 30\n", - "output_nodes = 10\n", - "\n", - "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", - "\n", - "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", - "\n", - "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", - "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." 
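A possible sketch for the multiclass case is shown below. It again assumes only scikit-learn and matplotlib; the one-vs-rest ROC curves are drawn by binarizing the labels and looping over the classes, which replaces the `scikitplot.metrics.plot_roc` call used earlier.

```python
# Possible sketch for the multiclass Iris case: confusion matrix and
# one-vs-rest ROC curves built from the predicted class probabilities.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.preprocessing import label_binarize

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize="true")
plt.show()

# One-vs-rest ROC curves: one curve per class, using that class's probability column
y_prob = logreg.predict_proba(X_test)
y_onehot = label_binarize(y_test, classes=[0, 1, 2])
fig, ax = plt.subplots()
for i, name in enumerate(iris.target_names):
    RocCurveDisplay.from_predictions(y_onehot[:, i], y_prob[:, i], name=name, ax=ax)
plt.show()
```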
] } ], @@ -1507,7 +639,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.10" + "version": "3.9.15" } }, "nbformat": 4, diff --git a/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb new file mode 100644 index 000000000..5170af951 --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb @@ -0,0 +1,688 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b209e219", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "6fa4c4bc", + "metadata": { + "editable": true + }, + "source": [ + "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", + "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", + "\n", + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "735b16c4", + "metadata": { + "editable": true + }, + "source": [ + "## Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The\n", + "links at\n", + "\n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other sources. These sources\n", + "should always be cited correctly. 
How to cite some\n", + "of the libraries is often indicated from their corresponding GitHub\n", + "sites or websites, see for example how to cite Scikit-Learn at\n", + ".\n", + "\n", + "We enocurage you to use tools like\n", + "[ChatGPT](https://openai.com/chatgpt/) or similar in writing the report. If you use for example ChatGPT,\n", + "please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can\n", + "be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine\n", + "with its Machine Learning repository at\n", + " is an excellent site to\n", + "look up for examples and\n", + "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", + "interesting site. Feel free to explore these sites. When selecting\n", + "other data sets, make sure these are sets used for regression problems\n", + "(not classification).\n" + ] + }, + { + "cell_type": "markdown", + "id": "0b7956ca", + "metadata": { + "editable": true + }, + "source": [ + "## Regression analysis and resampling methods\n", + "\n", + "The main aim of this project is to study in more detail various\n", + "regression methods, including Ordinary Least Squares (OLS) reegression, Ridge regression and LASSO regression.\n", + "In addition to the scientific part, in this course we want also to\n", + "give you an experience in writing scientific reports.\n", + "\n", + "We will study how to fit polynomials to specific\n", + "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", + "\n", + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" + ] + }, + { + "cell_type": "markdown", + "id": "28ba3d22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9a3e10ba", + "metadata": { + "editable": true + }, + "source": [ + "Our first step will be to perform an OLS regression analysis of this\n", + "function, trying out a polynomial fit with an $x$ dependence of the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", + "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "\n", + "We will also include bootstrap as a resampling technique in order to\n", + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8aa547a5", + "metadata": { + "editable": true + }, + "source": [ + "### Part a : Ordinary Least Square (OLS) for the Runge function\n", + "\n", + "We will generate our own dataset for abovementioned function\n", + "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. 
You should explore also the addition\n", + "of an added stochastic noise to this function using the normal\n", + "distribution $N(0,1)$.\n", + "\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "\n", + "Evaluate the mean Squared error (MSE)\n" + ] + }, + { + "cell_type": "markdown", + "id": "68fbf03d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", + "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b49509bc", + "metadata": { + "editable": true + }, + "source": [ + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", + "then the score $R^2$ is defined as\n" + ] + }, + { + "cell_type": "markdown", + "id": "0fa4ffc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce462b32", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" + ] + }, + { + "cell_type": "markdown", + "id": "a5fbef36", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "a6afe9cb", + "metadata": { + "editable": true + }, + "source": [ + "Plot the resulting scores (MSE and R$^2$) as functions of the polynomial degree (here up to polymial degree 15).\n", + "Plot also the parameters $\\theta$ as you increase the order of the polynomial. Comment your results.\n", + "\n", + "Your code has to include a scaling/centering of the data (for example by\n", + "subtracting the mean value), and\n", + "a split of the data in training and test data. For the scaling you can\n", + "either write your own code or use for example the function for\n", + "splitting training data provided by the library **Scikit-Learn** (make\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "\n", + "It is normal in essentially all Machine Learning studies to split the\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", + "is no explicit recipe for how much data should be included as training\n", + "data and say test data. An accepted rule of thumb is to use\n", + "approximately $2/3$ to $4/5$ of the data as training data.\n", + "\n", + "You can easily reuse the solutions to your exercises from week 35.\n", + "See also the lecture slides from week 35 and week 36.\n", + "\n", + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" + ] + }, + { + "cell_type": "markdown", + "id": "3be10f68", + "metadata": { + "editable": true + }, + "source": [ + "### Part b: Adding Ridge regression for the Runge function\n", + "\n", + "Write your own code for the Ridge method as done in the previous\n", + "exercise. 
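To fix ideas, a minimal and entirely optional sketch of such a solution is shown below; the number of data points, the noise level, the polynomial degree and the value of $\lambda$ are illustrative choices only, and the closed-form Ridge solution is written here with a plain $\lambda I$ penalty (the convention with an extra factor $n$ that appears elsewhere in the lecture material only rescales $\lambda$).

```python
# Illustrative sketch for parts a) and b): closed-form OLS and Ridge fits of
# polynomials to noisy Runge-function data. All numbers are example choices.
import numpy as np

rng = np.random.default_rng(2025)
n, degree, lam = 200, 10, 1e-3

x = rng.uniform(-1, 1, n)
y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.normal(0, 1, n)   # Runge function + noise

# Train/test split (roughly 4/5 - 1/5)
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]

# Polynomial features [x, x^2, ..., x^degree], scaled with *training* statistics,
# and a centred target so that no explicit intercept column is needed
X_raw = np.column_stack([x**p for p in range(1, degree + 1)])
mu, sigma = X_raw[train].mean(axis=0), X_raw[train].std(axis=0)
X = (X_raw - mu) / sigma
y_c = y - y[train].mean()

theta_ols = np.linalg.pinv(X[train]) @ y_c[train]
theta_ridge = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(degree),
                              X[train].T @ y_c[train])

def mse(t, p):
    return np.mean((t - p) ** 2)

def r2(t, p):
    return 1.0 - np.sum((t - p) ** 2) / np.sum((t - np.mean(t)) ** 2)

for name, theta in (("OLS", theta_ols), ("Ridge", theta_ridge)):
    pred = X[test] @ theta
    print(f"{name:5s} test MSE = {mse(y_c[test], pred):.5f}, R2 = {r2(y_c[test], pred):.5f}")
```

Looping this over polynomial degrees and values of $\lambda$ gives the plots asked for in parts a) and b), and replacing the closed-form solutions with your own gradient descent code gives the starting point for parts c) to f).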
The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "\n", + "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "caa7909c", + "metadata": { + "editable": true + }, + "source": [ + "### Part c: Writing your own gradient descent code\n", + "\n", + "Replace now the analytical expressions for the optimal parameters\n", + "$\\boldsymbol{\\theta}$ with your own gradient descent code. In this exercise we\n", + "focus only on the simplest gradient descent approach with a fixed\n", + "learning rate (see the exercises from week 37 and the lecture notes\n", + "from week 36).\n", + "\n", + "Study and compare your results from parts a) and b) with your gradient\n", + "descent approch. Discuss in particular the role of the learning rate.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3aac4df1", + "metadata": { + "editable": true + }, + "source": [ + "### Part d: Including momentum and more advanced ways to update the learning the rate\n", + "\n", + "We keep our focus on OLS and Ridge regression and update our code for\n", + "the gradient descent method by including **momentum**, **ADAgrad**,\n", + "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", + "rate. Discuss the results and compare the different methods applied to\n", + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d0862a53", + "metadata": { + "editable": true + }, + "source": [ + "### Part e: Writing our own code for Lasso regression\n", + "\n", + "LASSO regression (see lecture slides from week 36 and week 37)\n", + "represents our first encounter with a machine learning method which\n", + "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", + "descent methods you developed in parts c) and d) to solve the LASSO\n", + "optimization problem. You can compare your results with\n", + "the functionalities of **Scikit-Learn**.\n", + "\n", + "Discuss (critically) your results for the Runge function from OLS,\n", + "Ridge and LASSO regression using the various gradient descent\n", + "approaches.\n" + ] + }, + { + "cell_type": "markdown", + "id": "9170032e", + "metadata": { + "editable": true + }, + "source": [ + "### Part f: Stochastic gradient descent\n", + "\n", + "Our last gradient step is to include stochastic gradient descent using\n", + "the same methods to update the learning rates as in parts c-e).\n", + "Compare and discuss your results with and without stochastic gradient\n", + "and give a critical assessment of the various methods.\n" + ] + }, + { + "cell_type": "markdown", + "id": "bacd1035", + "metadata": { + "editable": true + }, + "source": [ + "### Part g: Bias-variance trade-off and resampling techniques\n", + "\n", + "Our aim here is to study the bias-variance trade-off by implementing\n", + "the **bootstrap** resampling technique. 
**We will only use the simpler\n", + "ordinary least squares here**.\n", + "\n", + "With a code which does OLS and includes resampling techniques,\n", + "we will now discuss the bias-variance trade-off in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks and basically all Machine Learning algorithms.\n", + "\n", + "Before you perform an analysis of the bias-variance trade-off on your\n", + "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", + "Tibshirani, and Friedman. Figure 2.11 of this reference displays only\n", + "the test and training MSEs. The test MSE can be used to indicate\n", + "possible regions of low/high bias and variance. You will most likely\n", + "not get an equally smooth curve! You may also need to increase the\n", + "polynomial order and play around with the number of data points as\n", + "well (see also the exercise set from week 35).\n", + "\n", + "With this result we move on to the bias-variance trade-off analysis.\n", + "\n", + "Consider a\n", + "dataset $\\mathcal{L}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", + "\n", + "We assume that the true data is generated from a noisy model\n" + ] + }, + { + "cell_type": "markdown", + "id": "b871ec69", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b47c19bc", + "metadata": { + "editable": true + }, + "source": [ + "Here $\\epsilon$ is normally distributed with mean zero and standard\n", + "deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", + "\n", + "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", + "squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "6db622c2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "5a7eb70d", + "metadata": { + "editable": true + }, + "source": [ + "Here the expected value $\\mathbb{E}$ is the sample value.\n", + "\n", + "Show that you can rewrite this in terms of a term which contains the\n", + "variance of the model itself (the so-called variance term), a term\n", + "which measures the deviation from the true data and the mean value of\n", + "the model (the bias term) and finally the variance of the noise.\n", + "\n", + "That is, show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "d50292fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "50fa641f", + "metadata": { + "editable": true + }, + "source": [ + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" + ] + }, + { + "cell_type": 
"markdown", + "id": "2bd429c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "737c2819", + "metadata": { + "editable": true + }, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "41ef92ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b948dab0", + "metadata": { + "editable": true + }, + "source": [ + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", + "\n", + "The answer to this exercise should be included in the theory part of\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", + "interpretations.\n", + "\n", + "Perform then a bias-variance analysis of the Runge function by\n", + "studying the MSE value as function of the complexity of your model.\n", + "\n", + "Discuss the bias and variance trade-off as function\n", + "of your model complexity (the degree of the polynomial) and the number\n", + "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", + "You can follow the code example in the jupyter-book at .\n" + ] + }, + { + "cell_type": "markdown", + "id": "6a0548bf", + "metadata": { + "editable": true + }, + "source": [ + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "\n", + "The aim here is to implement another widely popular\n", + "resampling technique, the so-called cross-validation method.\n", + "\n", + "Implement the $k$-fold cross-validation algorithm (feel free to use\n", + "the functionality of **Scikit-Learn** or write your own code) and\n", + "evaluate again the MSE function resulting from the test folds.\n", + "\n", + "Compare the MSE you get from your cross-validation code with the one\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", + "\n", + "In addition to using the ordinary least squares method, you should\n", + "include both Ridge and Lasso regression in the final analysis.\n" + ] + }, + { + "cell_type": "markdown", + "id": "df9845cb", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", + "\n", + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" + ] + }, + { + "cell_type": "markdown", + "id": "b9e04791", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", + "\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3fab6237", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. 
These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3388eb60", + "metadata": { + "editable": true + }, + "source": [ + "## Software and needed installations\n", + "\n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", + "we recommend that you install the following Python packages via **pip** as\n", + "\n", + "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", + "\n", + "For Python3, replace **pip** with **pip3**.\n", + "\n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", + "\n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", + "for a seamless installation of additional software via for example\n", + "\n", + "1. brew install python3\n", + "\n", + "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", + "\n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", + "\n", + "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", + "\n", + "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "\n", + "Popular software packages written in Python for ML are\n", + "\n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", + "\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", + "\n", + "- [PyTorch](http://pytorch.org/) and\n", + "\n", + "- [Keras](https://keras.io/).\n", + "\n", + "These are all freely available at their respective GitHub sites. They\n", + "encompass communities of developers in the thousands or more. 
And the number\n", + "of code developers and contributors keeps increasing.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb new file mode 100644 index 000000000..9038066ab --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb @@ -0,0 +1,3484 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "311a2385", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "9e4484dc", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "a24010ae", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4a291d59", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "85c747e2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "6580dfe2", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. 
We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c2ddcfe5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "e1e8a5b2", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. \n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "c8a5100b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b026883e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "3a2f7b75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6380eed5", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "c5d3766d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d313807", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "bee64882", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ffe8d02", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "97225362", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "9fe2a0b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2e678439", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
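A quick way to convince yourself of this expression for the gradient is to compare it with a finite-difference approximation. The short, self-contained sketch below does exactly that for the cost function above, using $n=100$ data points so that the dimensions match the design matrix in the text. Note also that in the small data-generation snippet earlier, the number of samples (written there as `m`) must of course be defined before use, for example as `m = 100`.

```python
# Sketch: numerical check of the analytic gradient (2/n) X^T (X theta - y)
# for the simple linear model above, using central finite differences.
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def cost(theta):
    return np.sum((X @ theta - y) ** 2) / n

def analytic_gradient(theta):
    return (2.0 / n) * X.T @ (X @ theta - y)

theta = rng.standard_normal((2, 1))
eps = 1e-6
num_grad = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta)
    e[i] = eps
    num_grad[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)

print("analytic :", analytic_gradient(theta).ravel())
print("numerical:", num_grad.ravel())   # the two should agree to high precision
```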
+ ] + }, + { + "cell_type": "markdown", + "id": "5f45e358", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "1713ee43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "671ea0fc", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "7df56d17", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "5887c657", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a012ac0", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
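A short remark on the learning rate before the example: since the cost function is quadratic with the constant Hessian $\boldsymbol{H}=\frac{2}{n}X^TX$ derived above (positive definite here, since $X^TX$ has full rank), plain gradient descent with a fixed step size converges to the minimum as long as

$$
0 < \eta < \frac{2}{\lambda_{\mathrm{max}}(\boldsymbol{H})},
$$

where $\lambda_{\mathrm{max}}(\boldsymbol{H})$ is the largest eigenvalue of the Hessian. This is a standard result for convex quadratic problems, stated here for completeness, and it is why the code below first computes the eigenvalues of $\boldsymbol{H}$ and then sets $\eta = 1/\lambda_{\mathrm{max}}(\boldsymbol{H})$.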
+ ] + }, + { + "cell_type": "markdown", + "id": "cf1fd4f4", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4417d3aa", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "7d39d005", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "45a85d32", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d267ea", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "f8f50b02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac21d44c", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "aae5aaa1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "319922a5", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "724078a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbc443e3", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "2ea2bf50", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9f431da1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8aa155a9", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "03bd2e44", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
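For reference, since the whiteboard notes are not included in this notebook, a standard way of writing the momentum (heavy-ball) update is to keep a running velocity $v$ and iterate

$$
v_{t+1} = \gamma\, v_t + \eta\, \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - v_{t+1},
$$

with momentum parameter $0 \le \gamma < 1$ and $v_0 = 0$; setting $\gamma = 0$ recovers plain gradient descent. This is exactly what the line `new_change = step_size * gradient + momentum * change` implements in the momentum example further below.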
+ ] + }, + { + "cell_type": "markdown", + "id": "0e101e2d", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "09ecede4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3489dbbc", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "426eaa39", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6220214d", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "bf86ac65", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "4ac61edb", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n",
+ "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient."
+ ] + }, + { + "cell_type": "markdown", + "id": "0058008d", + "metadata": { + "editable": true + }, + "source": [
+ "## Convergence rates\n", + "\n",
+ "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n",
+ "2. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration."
+ ] + }, + { + "cell_type": "markdown", + "id": "f994e1e2", + "metadata": { + "editable": true + }, + "source": [
+ "## Accuracy\n", + "\n",
+ "In general, stochastic gradient descent is less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n",
+ "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n",
+ "It is however the method of choice in deep learning algorithms, where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates."
+ ] + }, + { + "cell_type": "markdown", + "id": "842a8611", + "metadata": { + "editable": true + }, + "source": [
+ "## Stochastic Gradient Descent (SGD)\n", + "\n",
+ "In stochastic gradient descent, the extreme case is the setting where\n", + "each mini-batch contains only a single example, that is, we update the parameters one data point at a time.\n", + "\n",
+ "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice, due to vectorized\n", + "code optimizations, it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n",
+ "In our notes, with SGD we mean stochastic gradient descent with mini-batches."
+ ] + }, + { + "cell_type": "markdown", + "id": "90bd121a", + "metadata": { + "editable": true + }, + "source": [
+ "## Stochastic Gradient Descent\n", + "\n",
+ "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the gradient descent method discussed above.\n", + "\n",
+ "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$,"
+ ] + }, + { + "cell_type": "markdown", + "id": "5cd81303", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "60e085a9", + "metadata": { + "editable": true + }, + "source": [
+ "## Computation of gradients\n", + "\n",
+ "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients"
+ ] + }, + { + "cell_type": "markdown", + "id": "fef0100e", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "aaba7f05", + "metadata": { + "editable": true + }, + "source": [
+ "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$."
+ ] + }, + { + "cell_type": "markdown", + "id": "038b47ae", + "metadata": { + "editable": true + }, + "source": [
+ "## SGD example\n",
+ "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose a minibatch size of $M=2$. We then have $n/M=5$ minibatches,\n", + "each containing two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=n$ you\n", + "have only a single batch containing all the data points, while on the other extreme\n", + "you may choose $M=1$, resulting in a minibatch for each datapoint, i.e.\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n",
+ "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one of the minibatches,\n", + "picked at random in each gradient descent step"
+ ] + }, + { + "cell_type": "markdown", + "id": "0ad42833", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k} \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "64b15ba2", + "metadata": { + "editable": true + }, + "source": [
+ "## The gradient step\n", + "\n",
+ "Thus a gradient descent step now looks like"
+ ] + }, + { + "cell_type": "markdown", + "id": "49c6adb0", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k} \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "82873545", + "metadata": { + "editable": true + }, + "source": [
+ "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibatches ($n/M$) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below."
+ ] + }, + { + "cell_type": "markdown", + "id": "35a8e70d", + "metadata": { + "editable": true + }, + "source": [
+ "## Simple example code"
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6aa32b90", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [
+ "import numpy as np \n", + "\n",
+ "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n",
+ "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + "    for i in range(m):\n", + "        k = np.random.randint(m) #Pick the k-th minibatch at random\n", + "        #Compute the gradient using the data in minibatch Bk\n", + "        #Compute new suggestion for theta\n", + "        j += 1"
+ ] + }, + { + "cell_type": "markdown", + "id": "6e20f534", + "metadata": { + "editable": true + }, + "source": [
+ "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness, which decreases the chance\n", + "that our optimization scheme gets stuck in a local minimum. Second, if\n", + "the size of the minibatches is small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k$-th minibatch and not\n", + "all $n$ datapoints."
+ ] + }, + { + "cell_type": "markdown", + "id": "71745d3e", + "metadata": { + "editable": true + }, + "source": [
+ "## When do we stop?\n", + "\n",
+ "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "bad95be2", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "40b4d87e", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "1208bbec", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b83b5ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
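+ "\n",
+ "A side remark before the code (a sketch of one possible answer, under our own conventions, to the question asked in the code comments below about setting up the batches): minibatches can be drawn without replacement by shuffling the data indices once per epoch and slicing the permutation into $n/M$ blocks.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "n, M = 100, 5\n",
+ "m = n // M                               # number of minibatches\n",
+ "rng = np.random.default_rng()\n",
+ "for epoch in range(10):\n",
+ "    indices = rng.permutation(n)         # reshuffle the data each epoch\n",
+ "    for batch in indices.reshape(m, M):  # m minibatches of size M, no replacement\n",
+ "        pass  # compute the gradient on X[batch], y[batch] and update theta here\n",
+ "```"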
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1f669db6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e9ed564", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "9c0ac318", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. 
Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "d8f518c4", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." + ] + }, + { + "cell_type": "markdown", + "id": "3dcb89bd", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "8f258bc2", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the AdaGrad algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2a3715f8", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "a1d9578a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6b5bc5e", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "44b313c8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "b56c85b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5bcc6bd2", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "41fc9f01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8151719b", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "bb75b0ad", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3c71fd46", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "1d835a18", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "77dcc8c3", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "21161d57", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e87e09a9", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "1a98c681", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the RMSProp algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "8b337277", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "af77b83f", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "bc924f77", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "86e5ab5e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "949f359d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ba26be3", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "4fb9b2a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8711e597", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "49e6e73d", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "ca5bb491", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e19d7bf", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "f79d952e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "13e9862f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5693500e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "65a5e1e7", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "27686255", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "f3dfc1e2", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the ADAM algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "045d399c", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "4e75ee41", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "ddbb28ab", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "dae38b6c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ca5a343a", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "08d97c1e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "727d8fc3", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4e41c003", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "fe00db52", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "8f22105b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "8956bf7a", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "044275ef", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "353b50b3", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "fdc8debd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "b738f1b8", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
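+ "\n",
+ "One detail worth flagging (our observation, not a statement from the text): in the loop above, `Giter` is reset to zero at the start of every epoch, while the AdaGrad algorithm box accumulates the squared gradients over the entire run. A minimal sketch of the fully accumulating variant, reusing the variables defined in the code cell above, could look like this:\n",
+ "\n",
+ "```python\n",
+ "Giter = np.zeros_like(theta)  # accumulated squared gradients, kept across epochs\n",
+ "for epoch in range(n_epochs):\n",
+ "    for i in range(m):\n",
+ "        random_index = M*np.random.randint(m)\n",
+ "        xi = X[random_index:random_index+M]\n",
+ "        yi = y[random_index:random_index+M]\n",
+ "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+ "        Giter += gradients*gradients                      # r_t = r_{t-1} + g_t o g_t\n",
+ "        theta -= eta*gradients/(delta + np.sqrt(Giter))\n",
+ "```"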
+ ] + }, + { + "cell_type": "markdown", + "id": "65ce93ba", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "604d7286", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "e663a714", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "749fa687", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980\n", + "theta1 = 0.9\n", + "theta2 = 0.999\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-7\n", + "iter = 0\n", + "for epoch in range(n_epochs):\n", + " first_moment = 0.0\n", + " second_moment = 0.0\n", + " iter += 1\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " # Computing moments first\n", + " first_moment = theta1*first_moment + (1-theta1)*gradients\n", + " second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n", + " first_term = first_moment/(1.0-theta1**iter)\n", + " second_term = second_moment/(1.0-theta2**iter)\n", + "\t# Scaling with rho the new and the previous results\n", + " update = eta*first_term/(np.sqrt(second_term)+delta)\n", + " theta -= update\n", + "print(\"theta from own ADAM\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "8801fcd5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n", + "\n", + "2. Work on project 1\n", + "\n", + "\n", + "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended." + ] + }, + { + "cell_type": "markdown", + "id": "8ea68725", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on different scaling methods\n", + "\n", + "Before fitting a regression model, it is good practice to normalize or\n", + "standardize the features. This ensures all features are on a\n", + "comparable scale, which is especially important when using\n", + "regularization. In the exercises this week we will perform standardization, scaling each\n", + "feature to have mean 0 and standard deviation 1.\n", + "\n", + "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n", + "Then we subtract the mean and divide by the standard deviation for each feature.\n", + "\n", + "In the example here we\n", + "we will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n", + "(and each feature) means the model does not require a separate intercept\n", + "term, the data is shifted such that the intercept is effectively 0\n", + ". (In practice, one could include an intercept in the model and not\n", + "penalize it, but here we simplify by centering.)\n", + "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
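+ "\n",
+ "One possible setup (the polynomial and the noise level below are our own choices, just so that the next cell has data to work on) could be:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "np.random.seed(2025)\n",
+ "n = 100\n",
+ "x = np.linspace(-1, 1, n).reshape(-1, 1)\n",
+ "y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n, 1)\n",
+ "\n",
+ "# design matrix without an intercept column, since we center the data below\n",
+ "X = np.c_[x, x**2]\n",
+ "```"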
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "04811786",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Standardize features (zero mean, unit variance for each feature)\n",
+ "X_mean = X.mean(axis=0)\n",
+ "X_std = X.std(axis=0)\n",
+ "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n",
+ "X_norm = (X - X_mean) / X_std\n",
+ "\n",
+ "# Center the target to zero mean (optional, to simplify intercept handling)\n",
+ "y_mean = ?\n",
+ "y_centered = ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0e7cc2c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Do we need to center the values of $y$?\n",
+ "\n",
+ "After this preprocessing, each column of $\boldsymbol{X}_{\mathrm{norm}}$ has mean zero and standard deviation $1$,\n",
+ "and $\boldsymbol{y}_{\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n",
+ "nicer and ensures that the regularization penalty $\lambda \sum_j\n",
+ "\theta_j^2$ in Ridge regression treats each coefficient fairly (since the features are on the\n",
+ "same scale)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f8a8132d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Functionality in Scikit-Learn\n",
+ "\n",
+ "**Scikit-Learn** has several functions which allow us to rescale the\n",
+ "data, normally resulting in much better results in terms of various\n",
+ "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n",
+ "ensures that, for each feature/predictor we study, the mean value is\n",
+ "zero and the variance is one (every column in the design/feature\n",
+ "matrix). This scaling has the drawback that it does not ensure that\n",
+ "we have a particular maximum or minimum in our data set. Another\n",
+ "function included in **Scikit-Learn** is the **MinMaxScaler**, which\n",
+ "ensures that all features are exactly between $0$ and $1$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03eca41f",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## More preprocessing\n",
+ "\n",
+ "The **Normalizer** scales each data\n",
+ "point such that the feature vector has a Euclidean length of one. In other words, it\n",
+ "projects a data point onto the circle (or sphere in the case of higher dimensions) with a\n",
+ "radius of 1. This means every data point is scaled by a different number (by the\n",
+ "inverse of its length).\n",
+ "This normalization is often used when only the direction (or angle) of the data matters,\n",
+ "not the length of the feature vector.\n",
+ "\n",
+ "The **RobustScaler** works similarly to the StandardScaler in that it\n",
+ "ensures statistical properties for each feature that guarantee that\n",
+ "they are on the same scale. However, the RobustScaler uses the median\n",
+ "and quartiles, instead of mean and variance. This makes the\n",
+ "RobustScaler ignore data points that are very different from the rest\n",
+ "(like measurement errors). These odd data points are also called\n",
+ "outliers, and might often lead to trouble for other scaling\n",
+ "techniques."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "710e8f88",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Frequently used scaling functions\n",
+ "\n",
+ "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is, however, easy to write your own.
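As a quick illustration of the Scikit-Learn functionality discussed above, the following sketch (an addition, assuming the design matrix $\boldsymbol{X}$ from the earlier cells) applies **StandardScaler**, **MinMaxScaler** and **RobustScaler**, and checks that **StandardScaler** reproduces the manual standardization we are about to write ourselves.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Assumes the design matrix X from the cells above is already defined
X_standardized = StandardScaler().fit_transform(X)   # zero mean, unit variance per column
X_minmax = MinMaxScaler().fit_transform(X)           # every column mapped to [0, 1]
X_robust = RobustScaler().fit_transform(X)           # median/IQR based, less sensitive to outliers

# StandardScaler should agree with the manual (X - mean) / std used above
X_std = X.std(axis=0)
X_std[X_std == 0] = 1                                # same safeguard for constant features
X_manual = (X - X.mean(axis=0)) / X_std
print(np.allclose(X_standardized, X_manual))         # True, up to floating-point precision
```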
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "5d3df9bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "be0fd5f1", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "2a0924bb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "d116f448", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "41caea07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1fa96f7c", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "70038d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "852a77d0", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "fc4afaaf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "94b18ced", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "d7a95314", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eaf6a485", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3d9442a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e4aeef17", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "4ce9dee9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "752ce099", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "7cad5229", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46f1aaf9", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "7d25a9fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57b4c7d9", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "fb833214", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5fa29cd3", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "6c0e668d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d928664", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "65434b84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "127c9817", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "46f45c10", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4fbaa69a", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "25f1abd4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fd5ef9e", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "f1cb8e35", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c0c5100a", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "c80e55cb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47f5c5e", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e093186c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "de555fff", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "72178d39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8e5d822b", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "e9218f82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2223d1b1", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "e5474a5b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "691295ed", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e243cef5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ef2eaa7a", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "546e3504", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f6787352", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree new file mode 100644 index 000000000..d34aaa2a0 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree new file mode 100644 index 000000000..4ab79f481 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree new file mode 100644 index 000000000..c6faf048b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree new file mode 100644 index 000000000..80e654d7f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree new file mode 100644 index 000000000..477788e73 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree new file mode 100644 index 000000000..21a71eca2 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree new file mode 100644 index 
000000000..be2cbefec Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree new file mode 100644 index 000000000..154b70b88 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree new file mode 100644 index 000000000..956f542aa Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree new file mode 100644 index 000000000..48c14d047 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree new file mode 100644 index 000000000..a50c92fae Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree new file mode 100644 index 000000000..be662a48b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree new file mode 100644 index 000000000..709722fbc Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree new file mode 100644 index 000000000..01a2cdd03 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree new file mode 100644 index 000000000..c7d11aa68 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree new file mode 100644 index 000000000..8c79795a1 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree new file mode 100644 index 000000000..45e7ab3c7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree new file mode 100644 index 000000000..c56b43479 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree new file mode 100644 index 000000000..cf4965621 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree new file mode 100644 index 000000000..c59e69a76 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree new file mode 100644 index 000000000..cecfe527d Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree new file mode 100644 index 000000000..2490a33e5 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree new file mode 100644 index 000000000..1241e9190 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree new file mode 100644 index 000000000..dff26e46b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree new file mode 100644 index 000000000..9a11b76dd Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree new file mode 100644 index 000000000..28c738f27 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..e12b8a69e Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree new file mode 100644 index 000000000..2c161d7bb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree new file mode 100644 index 000000000..b8db93782 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree new file mode 100644 index 000000000..30aa1080d Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree new file mode 100644 index 000000000..237a4fd91 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree new file mode 100644 index 000000000..f87551977 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree new file mode 100644 index 000000000..0cbde70f4 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree new file mode 100644 index 000000000..e3dc468e7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree new file mode 100644 index 000000000..37d431da9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..33dd0fdfc Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..3b98e4a91 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree new file mode 100644 index 000000000..6240076ea Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree new file mode 100644 index 000000000..b686f1bf6 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..6ae9ca3f7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree new file mode 100644 index 000000000..8e0dc1337 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/environment.pickle b/doc/LectureNotes/_build/.doctrees/environment.pickle index 98ef94a83..f9bc94dbb 100644 Binary files a/doc/LectureNotes/_build/.doctrees/environment.pickle and b/doc/LectureNotes/_build/.doctrees/environment.pickle differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree index e4d37262e..3e8c9fa18 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree index f8892c80f..9aab23cea 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree index bd5448904..2824a9bf4 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree new file mode 100644 index 000000000..9961e3be8 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree new file mode 100644 index 000000000..cc6fae4d9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree new file mode 100644 index 000000000..abdaa4371 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree new file mode 100644 index 000000000..5d34d9619 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree differ diff --git 
a/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree new file mode 100644 index 000000000..793ebcfad Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree new file mode 100644 index 000000000..41ce5a839 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/intro.doctree b/doc/LectureNotes/_build/.doctrees/intro.doctree index 79d28ce2c..f7f4a762d 100644 Binary files a/doc/LectureNotes/_build/.doctrees/intro.doctree and b/doc/LectureNotes/_build/.doctrees/intro.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/project1.doctree b/doc/LectureNotes/_build/.doctrees/project1.doctree index a9a908212..03d34307b 100644 Binary files a/doc/LectureNotes/_build/.doctrees/project1.doctree and b/doc/LectureNotes/_build/.doctrees/project1.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/project2.doctree b/doc/LectureNotes/_build/.doctrees/project2.doctree new file mode 100644 index 000000000..23faf0efe Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/project2.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week37.doctree b/doc/LectureNotes/_build/.doctrees/week37.doctree new file mode 100644 index 000000000..d6aa660bb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week37.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week38.doctree b/doc/LectureNotes/_build/.doctrees/week38.doctree new file mode 100644 index 000000000..7cd935a7f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week38.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week39.doctree b/doc/LectureNotes/_build/.doctrees/week39.doctree new file mode 100644 index 000000000..d09449378 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week39.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week40.doctree b/doc/LectureNotes/_build/.doctrees/week40.doctree new file mode 100644 index 000000000..da8a5efe0 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week40.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week41.doctree b/doc/LectureNotes/_build/.doctrees/week41.doctree new file mode 100644 index 000000000..3b83ec1c9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week41.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week42.doctree b/doc/LectureNotes/_build/.doctrees/week42.doctree new file mode 100644 index 000000000..a93db2b6f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week42.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week43.doctree b/doc/LectureNotes/_build/.doctrees/week43.doctree new file mode 100644 index 000000000..66ccdcbeb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week43.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week44.doctree b/doc/LectureNotes/_build/.doctrees/week44.doctree new file mode 100644 index 000000000..cc92706c4 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week44.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week45.doctree b/doc/LectureNotes/_build/.doctrees/week45.doctree new file mode 100644 index 000000000..b114b1f95 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week45.doctree differ 
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html new file mode 100644 index 000000000..62d549a2d --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html @@ -0,0 +1,587 @@ + + + + + + + + + + + A11Y Dark — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y Dark

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y Dark#

+

This is the Pygments implementation of a11y-dark from Eric Bailey’s +accessible themes for syntax +highlighting

+

Screenshot of the a11y-dark theme in a bash script

+
+

Colors#

+

Background color: #2b2b2b #2b2b2b

+

Highlight color: #ffd9002e #ffd9002e

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#d4d0ab

#d4d0ab

9.0 : 1

AAA

AAA

#ffa07a

#ffa07a

7.1 : 1

AAA

AAA

#f5ab35

#f5ab35

7.3 : 1

AAA

AAA

#ffd700

#ffd700

10.1 : 1

AAA

AAA

#abe338

#abe338

9.3 : 1

AAA

AAA

#00e0e0

#00e0e0

8.6 : 1

AAA

AAA

#dcc6e0

#dcc6e0

8.9 : 1

AAA

AAA

#f8f8f2

#f8f8f2

13.3 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html new file mode 100644 index 000000000..5b04aa896 --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html @@ -0,0 +1,573 @@ + + + + + + + + + + + A11Y High Contrast Dark — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y High Contrast Dark

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y High Contrast Dark#

+

This style mimics the a11 light theme from eric bailey’s accessible themes.

+

Screenshot of the a11y-high-contrast-dark theme in a bash script

+
+

Colors#

+

Background color: #2b2b2b #2b2b2b

+

Highlight color: #ffd9002e #ffd9002e

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#ffd900

#ffd900

10.2 : 1

AAA

AAA

#ffa07a

#ffa07a

7.1 : 1

AAA

AAA

#abe338

#abe338

9.3 : 1

AAA

AAA

#00e0e0

#00e0e0

8.6 : 1

AAA

AAA

#dcc6e0

#dcc6e0

8.9 : 1

AAA

AAA

#f8f8f2

#f8f8f2

13.3 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html new file mode 100644 index 000000000..d6b6b1c7b --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html @@ -0,0 +1,585 @@ + + + + + + + + + + + A11Y High Contrast Light — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y High Contrast Light

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y High Contrast Light#

+

This style mimics the a11y-light theme (but with more contrast) from eric bailey’s accessible themes.

+

Screenshot of the a11y-high-contrast-light theme in a bash script

+
+

Colors#

+

Background color: #fefefe #fefefe

+

Highlight color: #fae4c2 #fae4c2

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#515151

#515151

7.9 : 1

AAA

AAA

#a12236

#a12236

7.4 : 1

AAA

AAA

#7f4707

#7f4707

7.4 : 1

AAA

AAA

#912583

#912583

7.4 : 1

AAA

AAA

#00622f

#00622f

7.5 : 1

AAA

AAA

#005b82

#005b82

7.4 : 1

AAA

AAA

#6730c5

#6730c5

7.4 : 1

AAA

AAA

#080808

#080808

19.9 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html
new file mode 100644
index 000000000..ff367bdb9
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html
@@ -0,0 +1,579 @@
A11Y Light — Applied Data Analysis and Machine Learning

This style is inspired by the a11y-light theme from eric bailey’s accessible themes.
[Screenshot of the a11y-light theme in a bash script]

Colors
Background color: #f2f2f2
Highlight color: #fdf2e2

WCAG compliance
Color      Ratio       Normal text  Large text
#515151    7.1 : 1     AAA          AAA
#d71835    4.6 : 1     AA           AAA
#7f4707    6.7 : 1     AA           AAA
#116633    6.3 : 1     AA           AAA
#00749c    4.7 : 1     AA           AAA
#8045e5    4.8 : 1     AA           AAA
#1e1e1e    14.9 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html
new file mode 100644
index 000000000..daad1ecd8
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html
@@ -0,0 +1,579 @@
Blinds Dark — Applied Data Analysis and Machine Learning

This style mimics the blinds dark theme from vscode themes.
[Screenshot of the blinds-dark theme in a bash script]

Colors
Background color: #242424
Highlight color: #66666691

WCAG compliance
Color      Ratio       Normal text  Large text
#8c8c8c    4.6 : 1     AA           AAA
#ee6677    5.0 : 1     AA           AAA
#ccbb44    8.0 : 1     AAA          AAA
#66ccee    8.5 : 1     AAA          AAA
#5391cf    4.7 : 1     AA           AAA
#d166a3    4.5 : 1     AA           AAA
#bbbbbb    8.1 : 1     AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html
new file mode 100644
index 000000000..f36e2e54d
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html
@@ -0,0 +1,579 @@
Blinds Light — Applied Data Analysis and Machine Learning

This style mimics the blinds light theme from vscode themes.
[Screenshot of the blinds-light theme in a bash script]

Colors
Background color: #fcfcfc
Highlight color: #add6ff

WCAG compliance
Color      Ratio       Normal text  Large text
#737373    4.6 : 1     AA           AAA
#bf5400    4.6 : 1     AA           AAA
#996b00    4.6 : 1     AA           AAA
#008561    4.5 : 1     AA           AAA
#0072b2    5.1 : 1     AA           AAA
#cc398b    4.5 : 1     AA           AAA
#000000    20.5 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html
new file mode 100644
index 000000000..2d6c83ac7
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html
@@ -0,0 +1,579 @@
Github Dark — Applied Data Analysis and Machine Learning

This style mimics the github dark default theme from vs code themes.
[Screenshot of the github-dark theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #6e7681

WCAG compliance
Color      Ratio       Normal text  Large text
#8b949e    6.2 : 1     AA           AAA
#ff7b72    7.5 : 1     AAA          AAA
#ffa657    9.8 : 1     AAA          AAA
#7ee787    12.3 : 1    AAA          AAA
#79c0ff    9.7 : 1     AAA          AAA
#d2a8ff    9.7 : 1     AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html
new file mode 100644
index 000000000..b913171f5
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html
@@ -0,0 +1,579 @@
Github Dark Colorblind — Applied Data Analysis and Machine Learning

This style mimics the github dark colorblind theme from vscode.
[Screenshot of the github-dark-colorblind theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #58a6ff70

WCAG compliance
Color      Ratio       Normal text  Large text
#b1bac4    9.6 : 1     AAA          AAA
#ec8e2c    7.6 : 1     AAA          AAA
#fdac54    10.1 : 1    AAA          AAA
#a5d6ff    12.3 : 1    AAA          AAA
#79c0ff    9.7 : 1     AAA          AAA
#d2a8ff    9.7 : 1     AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html
new file mode 100644
index 000000000..982873a19
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html
@@ -0,0 +1,579 @@
Github Dark High Contrast — Applied Data Analysis and Machine Learning

This style mimics the github dark high contrast theme from vs code themes.
[Screenshot of the github-dark-high-contrast theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #58a6ff70

WCAG compliance
Color      Ratio       Normal text  Large text
#d9dee3    14.0 : 1    AAA          AAA
#ff9492    8.9 : 1     AAA          AAA
#ffb757    11.0 : 1    AAA          AAA
#72f088    13.1 : 1    AAA          AAA
#91cbff    11.0 : 1    AAA          AAA
#dbb7ff    11.0 : 1    AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html
new file mode 100644
index 000000000..1200efee2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html
@@ -0,0 +1,579 @@
Github Light — Applied Data Analysis and Machine Learning

This style mimics the github light theme from vscode themes.
[Screenshot of the github-light theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#6e7781    4.5 : 1     AA           AAA
#cf222e    5.4 : 1     AA           AAA
#953800    7.4 : 1     AAA          AAA
#116329    7.4 : 1     AAA          AAA
#0550ae    7.6 : 1     AAA          AAA
#8250df    5.0 : 1     AA           AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html
new file mode 100644
index 000000000..4d5c0b528
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html
@@ -0,0 +1,573 @@
Github Light Colorblind — Applied Data Analysis and Machine Learning

This style mimics the github light colorblind theme from vscode themes.
[Screenshot of the github-light-colorblind theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#6e7781    4.5 : 1     AA           AAA
#b35900    4.8 : 1     AA           AAA
#8a4600    7.1 : 1     AAA          AAA
#0550ae    7.6 : 1     AAA          AAA
#8250df    5.0 : 1     AA           AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html
new file mode 100644
index 000000000..a48958f20
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html
@@ -0,0 +1,579 @@
Github Light High Contrast — Applied Data Analysis and Machine Learning

This style mimics the github light high contrast theme from vscode themes.
[Screenshot of the github-light-high-contrast theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#66707b    5.0 : 1     AA           AAA
#a0111f    8.1 : 1     AAA          AAA
#702c00    10.2 : 1    AAA          AAA
#024c1a    10.2 : 1    AAA          AAA
#023b95    10.2 : 1    AAA          AAA
#622cbc    8.1 : 1     AAA          AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html
new file mode 100644
index 000000000..a86c6b072
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html
@@ -0,0 +1,579 @@
Gotthard Dark — Applied Data Analysis and Machine Learning

This style mimics the gotthard dark theme from vscode.
[Screenshot of the gotthard-dark theme in a bash script]

Colors
Background color: #000000
Highlight color: #4c4b4be8

WCAG compliance
Color      Ratio       Normal text  Large text
#f5f5f5    19.3 : 1    AAA          AAA
#ab6369    4.7 : 1     AA           AAA
#b89784    7.8 : 1     AAA          AAA
#caab6d    9.6 : 1     AAA          AAA
#81b19b    8.7 : 1     AAA          AAA
#6f98b3    6.8 : 1     AA           AAA
#b19db4    8.4 : 1     AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html
new file mode 100644
index 000000000..f6919644d
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html
@@ -0,0 +1,579 @@
Gotthard Light — Applied Data Analysis and Machine Learning

This style mimics the gotthard light theme from vscode.
[Screenshot of the gotthard-light theme in a bash script]

Colors
Background color: #F5F5F5
Highlight color: #E1E1E1

WCAG compliance
Color      Ratio       Normal text  Large text
#141414    16.9 : 1    AAA          AAA
#9f4e55    5.2 : 1     AA           AAA
#a25e53    4.5 : 1     AA           AAA
#98661b    4.5 : 1     AA           AAA
#437a6b    4.5 : 1     AA           AAA
#3d73a9    4.6 : 1     AA           AAA
#974eb7    4.7 : 1     AA           AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html
new file mode 100644
index 000000000..e2d7a65c3
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html
@@ -0,0 +1,579 @@
Greative — Applied Data Analysis and Machine Learning

This style mimics the greative theme from vscode themes.
[Screenshot of the greative theme in a bash script]

Colors
Background color: #010726
Highlight color: #473d18

WCAG compliance
Color      Ratio       Normal text  Large text
#797979    4.6 : 1     AA           AAA
#f78c6c    8.4 : 1     AAA          AAA
#9e8741    5.7 : 1     AA           AAA
#c5e478    13.9 : 1    AAA          AAA
#a2bffc    10.8 : 1    AAA          AAA
#5ca7e4    7.6 : 1     AAA          AAA
#9e86c8    6.3 : 1     AA           AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html
new file mode 100644
index 000000000..dcd757c82
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html
@@ -0,0 +1,591 @@
Pitaya Smoothie — Applied Data Analysis and Machine Learning

This style mimics the a11 light theme from eric bailey’s accessible themes.
[Screenshot of the pitaya-smoothie theme in a bash script]

Colors
Background color: #181036
Highlight color: #2A1968

WCAG compliance
Color      Ratio       Normal text  Large text
#8786ac    5.2 : 1     AA           AAA
#f26196    5.9 : 1     AA           AAA
#f5a394    9.0 : 1     AAA          AAA
#fad000    12.1 : 1    AAA          AAA
#18c1c4    8.1 : 1     AAA          AAA
#66e9ec    12.4 : 1    AAA          AAA
#7998f2    6.5 : 1     AA           AAA
#c4a2f5    8.4 : 1     AAA          AAA
#fefeff    17.9 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html
new file mode 100644
index 000000000..2afc98044
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html
@@ -0,0 +1,542 @@
<no title> — Applied Data Analysis and Machine Learning

Copyright (c) 2020 Jeff Forcier.

Based on original work copyright (c) 2011 Kenneth Reitz and copyright (c) 2010 Armin Ronacher.

Some rights reserved.

Redistribution and use in source and binary forms of the theme, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html
new file mode 100644
index 000000000..c3d0376f1
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html
@@ -0,0 +1,546 @@
<no title> — Applied Data Analysis and Machine Learning

Extensions allow extending the debugger without modifying the debugger code. This is implemented with explicit namespace packages.

To implement your own extension:

  1. Ensure that the root folder of your extension is in sys.path (add it to PYTHONPATH).
  2. Ensure that your module follows the directory structure below.
  3. The __init__.py files inside the pydevd_plugin and extension folder must contain the preamble below, and nothing else.

     Preamble:

         try:
             __import__('pkg_resources').declare_namespace(__name__)
         except ImportError:
             import pkgutil
             __path__ = pkgutil.extend_path(__path__, __name__)

  4. Your plugin name inside the extensions folder must start with "pydevd_plugin".
  5. Implement one or more of the abstract base classes defined in _pydevd_bundle.pydevd_extension_api. This can be done by either inheriting from them or registering with the abstract base class.

Directory structure:

    |--  root_directory -> must be on python path
    |    |-- pydevd_plugins
    |    |   |-- __init__.py -> must contain preamble
    |    |   |-- extensions
    |    |   |   |-- __init__.py -> must contain preamble
    |    |   |   |-- pydevd_plugin_plugin_name.py
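As a concrete illustration of the last step, a provider module could look roughly like the sketch below. The class and method names (StrPresentationProvider, can_provide, get_str) are recalled from _pydevd_bundle.pydevd_extension_api and should be verified against the installed debugpy/pydevd version; the whole file is a hypothetical example, not part of the committed page.

```python
# Hypothetical pydevd_plugins/extensions/pydevd_plugin_fraction_str.py
# Assumes a StrPresentationProvider ABC with can_provide()/get_str() in
# _pydevd_bundle.pydevd_extension_api; check your debugpy/pydevd version.
from fractions import Fraction

from _pydevd_bundle.pydevd_extension_api import StrPresentationProvider


class FractionStrProvider(StrPresentationProvider):
    """Show Fraction objects as 'p/q (~decimal)' in the debugger's variables view."""

    def can_provide(self, type_object, type_name):
        return type_object is Fraction

    def get_str(self, val):
        return "%s (~%.4f)" % (val, float(val))
```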
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html
new file mode 100644
index 000000000..384a01789
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html
@@ -0,0 +1,540 @@
<no title> — Applied Data Analysis and Machine Learning

BSD 3-Clause License

Copyright (c) 2013-2024, Kim Davies and contributors. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html
new file mode 100644
index 000000000..2773d816b
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html
@@ -0,0 +1,504 @@
The MIT License (MIT) — Applied Data Analysis and Machine Learning

Copyright © 2016 Yoshiki Shibukawa

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html
new file mode 100644
index 000000000..4b1989c67
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html
@@ -0,0 +1,548 @@
The IPython licensing terms — Applied Data Analysis and Machine Learning

IPython is licensed under the terms of the Modified BSD License (also known as New or Revised or 3-Clause BSD). See the LICENSE file.

About the IPython Development Team
Fernando Perez began IPython in 2001 based on code from Janko Hauser <jhauser@zscout.de> and Nathaniel Gray <n8gray@caltech.edu>. Fernando is still the project lead.

The IPython Development Team is the set of all contributors to the IPython project. This includes all of the IPython subprojects.

The core team that coordinates development on GitHub can be found here: ipython/.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html
new file mode 100644
index 000000000..4825ab866
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html
@@ -0,0 +1,496 @@
Welcome to your Jupyter Book — Applied Data Analysis and Machine Learning

This is a small sample book to give you a feel for how book content is structured. It shows off a few of the major file types, as well as some sample content. It does not go in-depth into any particular topic - check out the Jupyter Book documentation for more information.

Check out the content pages bundled with this sample book to see more.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html
new file mode 100644
index 000000000..8397c2c25
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html
@@ -0,0 +1,558 @@
Notebooks with MyST Markdown — Applied Data Analysis and Machine Learning

Jupyter Book also lets you write text-based notebooks using MyST Markdown. See the Notebooks with MyST Markdown documentation for more detailed instructions. This page shows off a notebook written in MyST Markdown.

An example cell
With MyST Markdown, you can define code cells with a directive like so:

    print(2 + 2)

When your book is built, the contents of any {code-cell} blocks will be executed with your default Jupyter kernel, and their outputs will be displayed in-line with the rest of your content.

See also: Jupyter Book uses Jupytext to convert text-based files to notebooks, and can support many other text-based notebook files.

Create a notebook with MyST Markdown
MyST Markdown notebooks are defined by two things:

  1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed). See the YAML at the top of this page for example.
  2. The presence of {code-cell} directives, which will be executed with your book.

That’s all that is needed to get started!

Quickly add YAML metadata for MyST Notebooks
If you have a markdown file and you’d like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command:

    jupyter-book myst init path/to/markdownfile.md
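For reference, the YAML front matter that marks a markdown file as a MyST notebook typically looks like the sketch below. The exact fields written by jupyter-book myst init depend on the installed Jupytext version, and the Python 3 kernelspec here is only an assumed example:

```yaml
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---
```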
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html
new file mode 100644
index 000000000..96b13e256
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html
@@ -0,0 +1,566 @@
Markdown Files — Applied Data Analysis and Machine Learning

Whether you write your book’s content in Jupyter Notebooks (.ipynb) or in regular markdown files (.md), you’ll write in the same flavor of markdown called MyST Markdown. This is a simple file to help you get started and show off some syntax.

What is MyST?
MyST stands for “Markedly Structured Text”. It is a slight variation on a flavor of markdown called “CommonMark” markdown, with small syntax extensions to allow you to write roles and directives in the Sphinx ecosystem.

For more about MyST, see the MyST Markdown Overview.

Sample Roles and Directives
Roles and directives are two of the most powerful tools in Jupyter Book. They are like functions, but written in a markup language. They both serve a similar purpose, but roles are written in one line, whereas directives span many lines. They both accept different kinds of inputs, and what they do with those inputs depends on the specific role or directive that is being called.

Here is a “note” directive:

    Note
    Here is a note

It will be rendered in a special box when you build your book.

Here is an inline directive to refer to a document: Notebooks with MyST Markdown.

Citations
You can also cite references that are stored in a bibtex file. For example, the following syntax: {cite}`holdgraf_evidence_2014` will render like this: .

Moreover, you can insert a bibliography into your page with this syntax: The {bibliography} directive must be used for all the {cite} roles to render properly. For example, if the references for your book are stored in references.bib, then the bibliography is inserted with the {bibliography} directive, as sketched below.
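The code sample that followed this sentence did not survive the HTML-to-text conversion; in MyST/Jupyter Book the bibliography is normally inserted with the directive sketched below (assuming references.bib is registered as the book's bibtex file):

```
```{bibliography}
```
```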

Learn more
This is just a simple starter to get you started. You can learn a lot more at jupyterbook.org.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html
new file mode 100644
index 000000000..09cc2a1db
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html
@@ -0,0 +1,590 @@
Content with notebooks — Applied Data Analysis and Machine Learning

You can also create content with Jupyter Notebooks. This means that you can include code blocks and their outputs in your book.

Markdown + notebooks
As it is markdown, you can embed images, HTML, etc into your posts!

You can also $add_{math}$ and

$$
math^{blocks}
$$

or

$$
\begin{aligned}
\mbox{mean} la_{tex} \\ \\
math blocks
\end{aligned}
$$

But make sure you $Escape $your $dollar signs $you want to keep!

MyST markdown
MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check out the MyST guide in Jupyter Book, or see the MyST markdown documentation.

Code blocks and outputs
Jupyter Book will also embed your code blocks and output in your book. For example, here’s some sample Matplotlib code:

    from matplotlib import rcParams, cycler
    import matplotlib.pyplot as plt
    import numpy as np
    plt.ion()

    # Fixing random state for reproducibility
    np.random.seed(19680801)

    N = 10
    data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]
    data = np.array(data).T
    cmap = plt.cm.coolwarm
    rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))

    from matplotlib.lines import Line2D
    custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),
                    Line2D([0], [0], color=cmap(.5), lw=4),
                    Line2D([0], [0], color=cmap(1.), lw=4)]

    fig, ax = plt.subplots(figsize=(10, 5))
    lines = ax.plot(data)
    ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);

There is a lot more that you can do with outputs (such as including interactive outputs) with your book. For more information about this, see the Jupyter Book documentation.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html
new file mode 100644
index 000000000..6d98fb80e
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html
@@ -0,0 +1,541 @@
<no title> — Applied Data Analysis and Machine Learning

Main authors:

  • David Eppstein
  • Peter Tröger
      • wrote the original latexcodec package, which contained a simple but very effective LaTeX encoder
  • Matthias Troffaes (matthias.troffaes@gmail.com)
      • wrote the lexer
      • integrated codec with the lexer for a simpler and more robust design
      • various bugfixes

Contributors:

  • Michael Radziej
  • Philipp Spitzer
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..f4b6e00f6
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html
@@ -0,0 +1,535 @@
<no title> — Applied Data Analysis and Machine Learning

latexcodec is a lexer and codec to work with LaTeX code in Python
Copyright (c) 2011-2020 by Matthias C. M. Troffaes

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
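The license page only names the package; for orientation, its basic documented usage is roughly the following (a minimal sketch, not part of the committed files, with the expected output shown only approximately):

```python
# Importing latexcodec registers the "latex" codec with Python's codec machinery.
import latexcodec  # noqa: F401  (needed only for its registration side effect)

print("élève".encode("latex"))        # LaTeX-escaped bytes, e.g. b"\\'el\\`eve"
print(b"\\'el\\`eve".decode("latex"))  # back to unicode text: élève
```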
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html
new file mode 100644
index 000000000..a65ea1098
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html
@@ -0,0 +1,609 @@
markdown-it-container — Applied Data Analysis and Machine Learning

[Badges: Build Status, NPM version, Coverage Status]

Plugin for creating block-level custom containers for markdown-it markdown parser.

v2.+ requires markdown-it v5.+, see changelog.

With this plugin you can create block containers like:

    ::: warning
    *here be dragons*
    :::

… and specify how they should be rendered. If no renderer defined, <div> with container name class will be created:

    <div class="warning">
    <em>here be dragons</em>
    </div>

Markup is the same as for fenced code blocks. Difference is, that marker use another character and content is rendered as markdown markup.

Installation
node.js, browser:

    $ npm install markdown-it-container --save
    $ bower install markdown-it-container --save

API

    var md = require('markdown-it')()
                .use(require('markdown-it-container'), name [, options]);

Params:

  • name - container name (mandatory)
  • options:
      • validate - optional, function to validate tail after opening marker, should return true on success.
      • render - optional, renderer function for opening/closing tokens.
      • marker - optional (:), character to use in delimiter.

Example

    var md = require('markdown-it')();

    md.use(require('markdown-it-container'), 'spoiler', {

      validate: function(params) {
        return params.trim().match(/^spoiler\s+(.*)$/);
      },

      render: function (tokens, idx) {
        var m = tokens[idx].info.trim().match(/^spoiler\s+(.*)$/);

        if (tokens[idx].nesting === 1) {
          // opening tag
          return '<details><summary>' + md.utils.escapeHtml(m[1]) + '</summary>\n';

        } else {
          // closing tag
          return '</details>\n';
        }
      }
    });

    console.log(md.render('::: spoiler click me\n*content*\n:::\n'));

    // Output:
    //
    // <details><summary>click me</summary>
    // <p><em>content</em></p>
    // </details>

License
MIT
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html
new file mode 100644
index 000000000..977ff6ea8
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html
@@ -0,0 +1,551 @@
markdown-it-deflist — Applied Data Analysis and Machine Learning

[Badges: Build Status, NPM version, Coverage Status]

Definition list (<dl>) tag plugin for markdown-it markdown parser.

v2.+ requires markdown-it v5.+, see changelog.

Syntax is based on pandoc definition lists.
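For orientation, pandoc-style definition-list input, and the kind of <dl> markup this plugin renders it to, look roughly as follows (an illustrative sketch with made-up terms, not taken from the plugin's documentation):

```
Term 1
:   Definition of term 1.

Term 2
:   Definition of term 2, which may
    continue on an indented line.
```

```html
<dl>
  <dt>Term 1</dt>
  <dd>Definition of term 1.</dd>
  <dt>Term 2</dt>
  <dd>Definition of term 2, which may continue on an indented line.</dd>
</dl>
```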

Install
node.js, browser:

    npm install markdown-it-deflist --save
    bower install markdown-it-deflist --save

Use

    var md = require('markdown-it')()
                .use(require('markdown-it-deflist'));

    md.render(/*...*/);

Differences in browser. If you load script directly into the page, without package system, module will add itself globally as window.markdownitDeflist.

License
MIT
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html
new file mode 100644
index 000000000..2b12331ff
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html
@@ -0,0 +1,747 @@
markdown-it-texmath — Applied Data Analysis and Machine Learning

[Badges: License, npm, npm]

Add TeX math equations to your Markdown documents rendered by markdown-it parser. KaTeX is used as a fast math renderer.

Features
Simplify the process of authoring markdown documents containing math formulas. This extension is a comfortable tool for scientists, engineers and students with markdown as their first choice document format.

  • Macro support
  • Simple formula numbering
  • Inline math with tables, lists and blockquote.
  • User setting delimiters:
      • 'dollars' (default): inline $...$, display $$...$$, display + equation number $$...$$ (1)
      • 'brackets': inline \(...\), display \[...\], display + equation number \[...\] (1)
      • 'gitlab': inline $`...`$, display ```math ... ```, display + equation number ```math ... ``` (1)
      • 'julia': inline $...$ or ``...``, display ```math ... ```, display + equation number ```math ... ``` (1)
      • 'kramdown': inline $$...$$, display $$...$$, display + equation number $$...$$ (1)

Show me
View a test table.
try it out …

Use with node.js
Install the extension. Verify having markdown-it and katex already installed.

    npm install markdown-it-texmath

Use it with JavaScript.

    let kt = require('katex'),
        tm = require('markdown-it-texmath').use(kt),
        md = require('markdown-it')().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}});

    md.render('Euler\'s identity \(e^{i\pi}+1=0\) is a beautiful formula in $\\RR 2$.')

Use in Browser

    <html>
    <head>
      <meta charset='utf-8'>
      <link rel="stylesheet" href="katex.min.css">
      <link rel="stylesheet" href="texmath.css">
      <script src="markdown-it.min.js"></script>
      <script src="katex.min.js"></script>
      <script src="texmath.js"></script>
    </head>
    <body>
      <div id="out"></div>
      <script>
        let md;
        document.addEventListener("DOMContentLoaded", () => {
            const tm = texmath.use(katex);
            md = markdownit().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}});
            out.innerHTML = md.render('Euler\'s identity $e^{i\pi}+1=0$ is a beautiful formula in //RR 2.');
        })
      </script>
    </body>
    </html>

CDN
Use following links for texmath.js and texmath.css

  • https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.js
  • https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.css

Dependencies

  • markdown-it: Markdown parser done right. Fast and easy to extend.
  • katex: This is where credits for fast rendering TeX math in HTML go to.

ToDo
nothing yet

FAQ

  • markdown-it-texmath with React Native does not work, why?
      • markdown-it-texmath is using regular expressions with y (sticky) property and cannot avoid this. The use of the y flag in regular expressions means the plugin is not compatible with React Native (which as of now doesn’t support it and throws an error Invalid flags supplied to RegExp constructor).

CHANGELOG
[0.6.0] on October 04, 2019
[0.5.5] on February 07, 2019
[0.5.4] on January 20, 2019
[0.5.3] on November 11, 2018
[0.5.2] on September 07, 2018
[0.5.0] on August 15, 2018
  • Fatal blockquote bug investigated. Implemented workaround to vscode bug, which has finally gone with vscode 1.26.0.
[0.4.6] on January 05, 2018
  • Escaped underscore bug removed.
[0.4.5] on November 06, 2017
  • Backslash bug removed.
[0.4.4] on September 27, 2017
  • Modifying the block mode regular expression with gitlab delimiters, so removing the newline bug.

License
markdown-it-texmath is licensed under the MIT License
© Stefan Gössner
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html new file mode 100644 index 000000000..4087f4257 --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html @@ -0,0 +1,777 @@ + + + + + + + + + + + A guide to masked arrays in NumPy — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + + + + + + +
+ +
+

A guide to masked arrays in NumPy#

+ +

See http://www.scipy.org/scipy/numpy/wiki/MaskedArray (dead link) +for updates of this document.

+
+

History#

As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became increasingly frustrated with the subclassing of masked arrays (even if I can only blame my inexperience). I needed to develop a class of arrays that could store some additional information along with numerical values, while keeping the possibility for missing data (picture storing a series of dates along with measurements, which would later become the TimeSeries Scikit (dead link)).

I started to implement such a class, but then quickly realized that any additional information disappeared when processing these subarrays (for example, adding a constant value to a subarray would erase its dates). I ended up writing the equivalent of numpy.core.ma for my particular class, ufuncs included. Everything went fine until I needed to subclass my new class, when more problems showed up: some attributes of the new subclass were lost during processing. I identified the culprit as MaskedArray, which returns masked ndarrays when I expected masked arrays of my class. I was preparing myself to rewrite numpy.core.ma when I forced myself to learn how to subclass ndarrays. As I became more familiar with the __new__ and __array_finalize__ methods, I started to wonder why masked arrays were objects, and not ndarrays, and whether it wouldn’t be more convenient for subclassing if they did behave like regular ndarrays.

The new maskedarray is what I eventually came up with. The main differences with the initial numpy.core.ma package are that MaskedArray is now a subclass of ndarray and that the _data section can now be any subclass of ndarray. Apart from a couple of issues listed below, the behavior of the new MaskedArray class reproduces the old one. Initially the maskedarray implementation was marginally slower than numpy.ma in some areas, but work is underway to speed it up; the expectation is that it can be made substantially faster than the present numpy.ma.

Note that if the subclass has some special methods and attributes, they are not propagated to the masked version: this would require a modification of the __getattribute__ method (first trying ndarray.__getattribute__, then trying self._data.__getattribute__ if an exception is raised in the first place), which really slows things down.

Main differences#

  • The _data part of the masked array can be any subclass of ndarray (but not recarray, cf below).

  • fill_value is now a property, not a function.

  • in the majority of cases, the mask is forced to nomask when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled.

  • I got rid of the share_mask flag, I never understood its purpose.

  • put, putmask and take now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, put and putmask both update the mask when needed.

  • if a is a masked array, bool(a) raises a ValueError, as it does with ndarrays.

  • in the same way, the comparison of two masked arrays is a masked array, not a boolean.

  • filled(a) returns an array of the same subclass as a._data, and no test is performed on whether it is contiguous or not.

  • the mask is always printed, even if it’s nomask, which makes things easy (for me at least) to remember that a masked array is used.

  • cumsum works as if the _data array was filled with 0. The mask is preserved, but not updated.

  • cumprod works as if the _data array was filled with 1. The mask is preserved, but not updated.
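
For illustration, here is a minimal sketch of a few of the behaviours listed above, written against the numpy.ma module that ships with current NumPy (the modern descendant of the package described here); the array values are made-up examples.

import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0], mask=[0, 1, 0])   # second entry masked
b = ma.array([1.0, 5.0, 3.0], mask=[0, 0, 0])

print(a.fill_value)       # fill_value is a property, not a function
print(a == b)             # comparing two masked arrays yields a masked array, not a plain boolean
print(ma.filled(a, 0.0))  # filled() returns an array built from the underlying _data

try:
    bool(a)               # as with ndarrays, the truth value of a multi-element array is ambiguous
except ValueError as err:
    print("ValueError:", err)

print(a.cumsum())         # cumsum acts as if masked entries were filled with 0; the mask is preserved

The exact printed representations differ between NumPy versions, but the behaviours themselves match the corresponding items in the list above.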

New features#


This list is non-exhaustive…

  • the mr_ function mimics r_ for masked arrays.

  • the anom method returns the anomalies (deviations from the average).
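
As a rough sketch of these two features (again using today’s numpy.ma, with arbitrary values):

import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])

combined = ma.mr_[x, ma.array([5.0, 6.0])]   # mr_ concatenates masked arrays, like r_ does for ndarrays
print(combined)

print(x.anom())   # anom() returns the deviations from the mean of the unmasked entries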

Using the new package with numpy.core.ma#

I tried to make sure that the new package can understand old masked arrays. Unfortunately, there’s no upward compatibility.

For example:

>>> import numpy.core.ma as old_ma
>>> import maskedarray as new_ma
>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> x
array(data =
 [     1      2 999999      4      5],
      mask =
 [False False True False False],
      fill_value=999999)
>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> y
array(data = [1 2 -- 4 5],
      mask = [False False True False False],
      fill_value=999999)
>>> x==y
array(data =
 [True True True True True],
      mask =
 [False False True False False],
      fill_value=?)
>>> old_ma.getmask(x) == new_ma.getmask(x)
array([True, True, True, True, True])
>>> old_ma.getmask(y) == new_ma.getmask(y)
array([True, True, False, True, True])
>>> old_ma.getmask(y)
False

Using maskedarray with matplotlib#

Starting with matplotlib 0.91.2, the masked array importing will work with the maskedarray branch as well as with earlier versions.

By default matplotlib still uses numpy.ma, but there is an rcParams setting that you can use to select maskedarray instead. In the matplotlibrc file you will find:

#maskedarray : False       # True to use external maskedarray module
                           # instead of numpy.ma; this is a temporary
                           # setting for testing maskedarray.

Uncomment and set to True to select maskedarray everywhere. Alternatively, you can test a script with maskedarray by using a command-line option, e.g.:

python simple_plot.py --maskedarray

Masked records#

Like numpy.ma.core, the ndarray-based implementation of MaskedArray is limited when working with records: you can mask any record of the array, but not a field in a record. If you need this feature, you may want to give the mrecords package a try (available in the maskedarray directory in the scipy sandbox). This module defines a new class, MaskedRecord. An instance of this class accepts a recarray as data, and uses two masks: the fieldmask has as many entries as records in the array, each entry with the same fields as a record, but of boolean types: they indicate whether the field is masked or not; a record entry is flagged as masked in the mask array if all the fields are masked. A few examples in the file should give you an idea of what can be done. Note that mrecords is still experimental…
+

Optimizing maskedarray#

+
+
+

Should masked arrays be filled before processing or not?#

In the current implementation, most operations on masked arrays involve the following steps:

  • the input arrays are filled

  • the operation is performed on the filled arrays

  • the mask is set for the results, from the combination of the input masks and the mask corresponding to the domain of the operation.

For example, consider the division of two masked arrays:

import numpy
import maskedarray as ma
x = ma.array([1,2,3,4], mask=[1,0,0,0], dtype=numpy.float64)
y = ma.array([-1,0,1,2], mask=[0,0,0,1], dtype=numpy.float64)

The division of x by y is then computed as:

d1 = x.filled(0) # d1 = array([0., 2., 3., 4.])
d2 = y.filled(1) # d2 = array([-1.,  0.,  1.,  1.])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
result = (d1/d2).view(MaskedArray) # masked_array([-0., inf, 3., 4.])
result._mask = logical_or(m, dm)

Note that a division by zero takes place. To avoid it, we can consider filling the input arrays, taking the domain mask into account, so that:

d1 = x._data.copy() # d1 = array([1., 2., 3., 4.])
d2 = y._data.copy() # d2 = array([-1.,  0.,  1.,  2.])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
numpy.putmask(d2, dm, 1) # d2 = array([-1.,  1.,  1.,  2.])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
result = (d1/d2).view(MaskedArray) # masked_array([-1., 2., 3., 2.])
result._mask = logical_or(m, dm)

Note that the .copy() is required to avoid updating the inputs with putmask. The .filled() method also involves a .copy().

A third possibility consists in not filling the arrays at all:

d1 = x._data # d1 = array([1., 2., 3., 4.])
d2 = y._data # d2 = array([-1.,  0.,  1.,  2.])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
result = (d1/d2).view(MaskedArray) # masked_array([-1., inf, 3., 2.])
result._mask = logical_or(m, dm)

Note that here again the division by zero takes place.


A quick benchmark gives the following results:

  • numpy.ma.divide : 2.69 ms per loop

  • classical division : 2.21 ms per loop

  • division w/ prefilling : 2.34 ms per loop

  • division w/o filling : 1.55 ms per loop

So, is it worth filling the arrays beforehand? Yes, if we are interested in avoiding floating-point exceptions that may fill the result with infs and nans. No, if we are only interested in speed…

Thanks#

I’d like to thank Paul Dubois, Travis Oliphant and Sasha for the original masked array package: without you, I would never have started that (it might be argued that I shouldn’t have anyway, but that’s another story…). I also wish to extend these thanks to Reggie Dugard and Eric Firing for their suggestions and numerous improvements.

Revision notes#

  • 08/25/2007 : Creation of this page

  • 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version!

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html
new file mode 100644
index 000000000..9e2a43677
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html
@@ -0,0 +1,582 @@
NCSA Open Source License — Applied Data Analysis and Machine Learning
This software is dual-licensed under The University of Illinois/NCSA Open Source License (NCSA) and The 3-Clause BSD License.

NCSA Open Source License#

Copyright (c) 2019 Kevin Sheppard. All rights reserved.

Developed by: Kevin Sheppard (kevin.sheppard@economics.ox.ac.uk, kevin.k.sheppard@gmail.com) http://www.kevinsheppard.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal with the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.

Neither the names of Kevin Sheppard, nor the names of any contributors may be used to endorse or promote products derived from this Software without specific prior written permission.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE.

3-Clause BSD License#

Copyright (c) 2019 Kevin Sheppard. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Components#

Many parts of this module have been derived from original sources, often the algorithm’s designer. Component licenses are located with the component code.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html
new file mode 100644
index 000000000..04fe3fdfa
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html
@@ -0,0 +1,540 @@
<no title> — Applied Data Analysis and Machine Learning
BSD 3-Clause License

Copyright (c) 2013-2024, Kim Davies and contributors. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html
new file mode 100644
index 000000000..399a27069
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html
@@ -0,0 +1,528 @@
Authors — Applied Data Analysis and Machine Learning
Authors#

Creator#

Jonathan Slenders <jonathan AT slenders.be>

Contributors#

  • Amjith Ramanujam <amjith.r AT gmail.com>

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html
new file mode 100644
index 000000000..1100b02f2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html
@@ -0,0 +1,535 @@
<no title> — Applied Data Analysis and Machine Learning
pybtex-docutils is a docutils backend for pybtex
Copyright (c) 2013-2021 by Matthias C. M. Troffaes

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..2dd213461
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html
@@ -0,0 +1,538 @@
<no title> — Applied Data Analysis and Machine Learning
BSD 3-Clause License

Copyright (c) 2009-2012, Brian Granger, Min Ragan-Kelley

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..e26a1b3d2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html
@@ -0,0 +1,530 @@
<no title> — Applied Data Analysis and Machine Learning
MIT License

Copyright (c) 2018 - 2025 Isaac Muse isaacmuse@gmail.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html
new file mode 100644
index 000000000..73d7a77f2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html
@@ -0,0 +1,577 @@
License for Sphinx — Applied Data Analysis and Machine Learning
License for Sphinx#

Unless otherwise indicated, all code in the Sphinx project is licenced under the two clause BSD licence below.

Copyright (c) 2007-2024 by the Sphinx team (see AUTHORS file). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Licenses for incorporated software#

The included implementation of NumpyDocstring._parse_numpydoc_see_also_section was derived from code under the following license:

Copyright (C) 2008 Stefan van der Walt <stefan@mentat.za.net>, Pauli Virtanen <pav@iki.fi>

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS’’ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html
new file mode 100644
index 000000000..f731983cf
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html
@@ -0,0 +1,578 @@
Translation workflow — Applied Data Analysis and Machine Learning

Translation workflow#

This folder contains code and translations for supporting multiple languages with Sphinx. See the Sphinx internationalization documentation for more details.

Structure of translation files#


Translation source files#

The source files for our translations are hand-edited, and contain the raw mapping of words onto various languages. They are checked in to git history with this repository.

src/sphinx_book_theme/assets/translations/jsons contains a collection of JSON files that define the translation for various phrases in this repository. Each file is a different phrase, and its contents define language codes and translated phrases for each language we support. They were originally created with the smodin.io language translator (see below for how to update them).
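
As an illustration of this layout, the sketch below writes and reads one such phrase file; the phrase, the language codes and the translations are invented for the example and are not taken from the actual jsons/ folder.

import json

# Hypothetical phrase file: the file name is the English phrase, the keys are language codes.
phrase_file = "Next page.json"
translations = {"de": "Nächste Seite", "fr": "Page suivante", "no": "Neste side"}

with open(phrase_file, "w", encoding="utf-8") as f:
    json.dump(translations, f, ensure_ascii=False, indent=2)

with open(phrase_file, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["fr"])   # Page suivante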

Compiled translation files#

The translation source files are compiled at build time (when we run stb compile) automatically. This is executed by the Python script at python src/sphinx_book_theme/_compile_translations.py (more information on that below).

These compiled files are not checked into .git history, but they are bundled with the theme when it is distributed in a package. Here’s a brief explanation of each:

  • src/sphinx_book_theme/theme/sphinx_book_theme/static/locales contains Sphinx locale files that were auto-converted from the files in jsons/ by the helper script below.

  • src/sphinx_book_theme/_compile_translations.py is a helper script to auto-generate Sphinx locale files from the JSONs in jsons/.

Workflow of translations#

Here’s a short workflow of how to add a new translation, assuming that you are translating using the smodin.io service.

  1. Go to the smodin.io service

  2. Select as many languages as you like.

  3. Type in the phrase you’d like to translate.

  4. Click TRANSLATE and then Download JSON.

  5. This will download a JSON file with a bunch of language-code: translated-phrase mappings.

  6. Put this JSON in the jsons/ folder, and rename it to be the phrase you’ve translated in English. So if the original phrase is My phrase, you should name the file My phrase.json.

  7. Run the prettier formatter on this JSON to split it into multiple lines (this makes it easier to read and edit if translations should be updated):

     prettier sphinx_book_theme/translations/jsons/<message name>.json

  8. Run python src/sphinx_book_theme/_compile_translations.py

  9. This will generate the locale files (.mo) that Sphinx uses in its translation machinery, and put them in locales/<language-code>/LC_MESSAGES/<msg>.mo.

Sphinx should now know how to translate this message!
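
Once the .mo files exist, they can be inspected with Python’s standard gettext machinery. A minimal sketch, where the domain name, directory and message are assumptions for illustration rather than the theme’s actual values:

import gettext

# Load a compiled catalogue from locales/<language-code>/LC_MESSAGES/<domain>.mo
catalog = gettext.translation("booktheme", localedir="locales", languages=["fr"], fallback=True)
print(catalog.gettext("My phrase"))   # the translated phrase if present, otherwise the original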


To update a translation#

To update a translation, you may go to the phrase you’d like to modify in jsons/, then find the entry for the language you’d like to update, and change its value. Finally, run python src/sphinx_book_theme/_compile_translations.py and this will update the .mo files.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..3fc23d826
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html
@@ -0,0 +1,539 @@
<no title> — Applied Data Analysis and Machine Learning
sphinxcontrib-bibtex is a Sphinx extension for BibTeX style citations
Copyright (c) 2011-2024 by Matthias C. M. Troffaes
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html
new file mode 100644
index 000000000..1e7e089e2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html
@@ -0,0 +1,514 @@
<no title> — Applied Data Analysis and Machine Learning

PyZMQ’s CFFI support is designed only for (Unix) systems conforming to have_sys_un_h = True.

+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_images/000000.png b/doc/LectureNotes/_build/html/_images/000000.png new file mode 100644 index 000000000..6495ae016 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/000000.png differ diff --git a/doc/LectureNotes/_build/html/_images/005b82.png b/doc/LectureNotes/_build/html/_images/005b82.png new file mode 100644 index 000000000..1842c5ee3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/005b82.png differ diff --git a/doc/LectureNotes/_build/html/_images/00622f.png b/doc/LectureNotes/_build/html/_images/00622f.png new file mode 100644 index 000000000..1226459ce Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00622f.png differ diff --git a/doc/LectureNotes/_build/html/_images/0072b2.png b/doc/LectureNotes/_build/html/_images/0072b2.png new file mode 100644 index 000000000..03e4db02f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/0072b2.png differ diff --git a/doc/LectureNotes/_build/html/_images/00749c.png b/doc/LectureNotes/_build/html/_images/00749c.png new file mode 100644 index 000000000..64f9a001a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00749c.png differ diff --git a/doc/LectureNotes/_build/html/_images/008561.png b/doc/LectureNotes/_build/html/_images/008561.png new file mode 100644 index 000000000..8c4c56b64 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/008561.png differ diff --git a/doc/LectureNotes/_build/html/_images/00e0e0.png b/doc/LectureNotes/_build/html/_images/00e0e0.png new file mode 100644 index 000000000..0ceb5bcd8 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00e0e0.png differ diff --git a/doc/LectureNotes/_build/html/_images/023b95.png b/doc/LectureNotes/_build/html/_images/023b95.png new file mode 100644 index 000000000..8b48ae837 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/023b95.png differ diff --git a/doc/LectureNotes/_build/html/_images/024c1a.png b/doc/LectureNotes/_build/html/_images/024c1a.png new file mode 100644 index 000000000..692ad9457 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/024c1a.png differ diff --git a/doc/LectureNotes/_build/html/_images/0550ae.png b/doc/LectureNotes/_build/html/_images/0550ae.png new file mode 100644 index 000000000..b6e505fa9 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/0550ae.png differ diff --git a/doc/LectureNotes/_build/html/_images/080808.png b/doc/LectureNotes/_build/html/_images/080808.png new file mode 100644 index 000000000..ad395a689 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/080808.png differ diff --git a/doc/LectureNotes/_build/html/_images/116329.png b/doc/LectureNotes/_build/html/_images/116329.png new file mode 100644 index 000000000..55ddb67a3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/116329.png differ diff --git a/doc/LectureNotes/_build/html/_images/116633.png b/doc/LectureNotes/_build/html/_images/116633.png new file mode 100644 index 000000000..340ef62ef Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/116633.png differ diff --git a/doc/LectureNotes/_build/html/_images/141414.png b/doc/LectureNotes/_build/html/_images/141414.png new file mode 100644 index 000000000..4aa24384e Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/141414.png differ diff --git a/doc/LectureNotes/_build/html/_images/18c1c4.png b/doc/LectureNotes/_build/html/_images/18c1c4.png new file 
mode 100644 index 000000000..1cfdaff04 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/18c1c4.png differ diff --git a/doc/LectureNotes/_build/html/_images/1e1e1e.png b/doc/LectureNotes/_build/html/_images/1e1e1e.png new file mode 100644 index 000000000..bd434c627 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/1e1e1e.png differ diff --git a/doc/LectureNotes/_build/html/_images/24292f.png b/doc/LectureNotes/_build/html/_images/24292f.png new file mode 100644 index 000000000..8a6e5b70d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/24292f.png differ diff --git a/doc/LectureNotes/_build/html/_images/3d73a9.png b/doc/LectureNotes/_build/html/_images/3d73a9.png new file mode 100644 index 000000000..bb65f9821 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/3d73a9.png differ diff --git a/doc/LectureNotes/_build/html/_images/437a6b.png b/doc/LectureNotes/_build/html/_images/437a6b.png new file mode 100644 index 000000000..3be95ecdd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/437a6b.png differ diff --git a/doc/LectureNotes/_build/html/_images/515151.png b/doc/LectureNotes/_build/html/_images/515151.png new file mode 100644 index 000000000..491fc486c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/515151.png differ diff --git a/doc/LectureNotes/_build/html/_images/5391cf.png b/doc/LectureNotes/_build/html/_images/5391cf.png new file mode 100644 index 000000000..9676ecf32 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/5391cf.png differ diff --git a/doc/LectureNotes/_build/html/_images/5ca7e4.png b/doc/LectureNotes/_build/html/_images/5ca7e4.png new file mode 100644 index 000000000..b580c19ec Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/5ca7e4.png differ diff --git a/doc/LectureNotes/_build/html/_images/622cbc.png b/doc/LectureNotes/_build/html/_images/622cbc.png new file mode 100644 index 000000000..3591ab100 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/622cbc.png differ diff --git a/doc/LectureNotes/_build/html/_images/66707b.png b/doc/LectureNotes/_build/html/_images/66707b.png new file mode 100644 index 000000000..f4189a06c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66707b.png differ diff --git a/doc/LectureNotes/_build/html/_images/66ccee.png b/doc/LectureNotes/_build/html/_images/66ccee.png new file mode 100644 index 000000000..a83dab6e2 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66ccee.png differ diff --git a/doc/LectureNotes/_build/html/_images/66e9ec.png b/doc/LectureNotes/_build/html/_images/66e9ec.png new file mode 100644 index 000000000..1c98cea18 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66e9ec.png differ diff --git a/doc/LectureNotes/_build/html/_images/6730c5.png b/doc/LectureNotes/_build/html/_images/6730c5.png new file mode 100644 index 000000000..38814dbc4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6730c5.png differ diff --git a/doc/LectureNotes/_build/html/_images/6e7781.png b/doc/LectureNotes/_build/html/_images/6e7781.png new file mode 100644 index 000000000..db5ddb9ea Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6e7781.png differ diff --git a/doc/LectureNotes/_build/html/_images/6f98b3.png b/doc/LectureNotes/_build/html/_images/6f98b3.png new file mode 100644 index 000000000..fbaa00f2f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6f98b3.png differ diff --git 
a/doc/LectureNotes/_build/html/_images/702c00.png b/doc/LectureNotes/_build/html/_images/702c00.png new file mode 100644 index 000000000..64de65cc3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/702c00.png differ diff --git a/doc/LectureNotes/_build/html/_images/72f088.png b/doc/LectureNotes/_build/html/_images/72f088.png new file mode 100644 index 000000000..e624bc7f6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/72f088.png differ diff --git a/doc/LectureNotes/_build/html/_images/737373.png b/doc/LectureNotes/_build/html/_images/737373.png new file mode 100644 index 000000000..436059c52 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/737373.png differ diff --git a/doc/LectureNotes/_build/html/_images/797979.png b/doc/LectureNotes/_build/html/_images/797979.png new file mode 100644 index 000000000..5642e0e9e Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/797979.png differ diff --git a/doc/LectureNotes/_build/html/_images/7998f2.png b/doc/LectureNotes/_build/html/_images/7998f2.png new file mode 100644 index 000000000..fc8b9ec22 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7998f2.png differ diff --git a/doc/LectureNotes/_build/html/_images/79c0ff.png b/doc/LectureNotes/_build/html/_images/79c0ff.png new file mode 100644 index 000000000..0c15a6509 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/79c0ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/7ee787.png b/doc/LectureNotes/_build/html/_images/7ee787.png new file mode 100644 index 000000000..639863c5c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7ee787.png differ diff --git a/doc/LectureNotes/_build/html/_images/7f4707.png b/doc/LectureNotes/_build/html/_images/7f4707.png new file mode 100644 index 000000000..248de1972 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7f4707.png differ diff --git a/doc/LectureNotes/_build/html/_images/8045e5.png b/doc/LectureNotes/_build/html/_images/8045e5.png new file mode 100644 index 000000000..08ab32e85 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8045e5.png differ diff --git a/doc/LectureNotes/_build/html/_images/81b19b.png b/doc/LectureNotes/_build/html/_images/81b19b.png new file mode 100644 index 000000000..e2b23db8f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/81b19b.png differ diff --git a/doc/LectureNotes/_build/html/_images/8250df.png b/doc/LectureNotes/_build/html/_images/8250df.png new file mode 100644 index 000000000..fd096abf0 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8250df.png differ diff --git a/doc/LectureNotes/_build/html/_images/8786ac.png b/doc/LectureNotes/_build/html/_images/8786ac.png new file mode 100644 index 000000000..995c0e551 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8786ac.png differ diff --git a/doc/LectureNotes/_build/html/_images/8a4600.png b/doc/LectureNotes/_build/html/_images/8a4600.png new file mode 100644 index 000000000..2fc6f5809 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8a4600.png differ diff --git a/doc/LectureNotes/_build/html/_images/8b949e.png b/doc/LectureNotes/_build/html/_images/8b949e.png new file mode 100644 index 000000000..ad7978584 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8b949e.png differ diff --git a/doc/LectureNotes/_build/html/_images/8c8c8c.png b/doc/LectureNotes/_build/html/_images/8c8c8c.png new file mode 100644 index 000000000..b8cd92d80 
Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8c8c8c.png differ diff --git a/doc/LectureNotes/_build/html/_images/912583.png b/doc/LectureNotes/_build/html/_images/912583.png new file mode 100644 index 000000000..a71611eae Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/912583.png differ diff --git a/doc/LectureNotes/_build/html/_images/91cbff.png b/doc/LectureNotes/_build/html/_images/91cbff.png new file mode 100644 index 000000000..58e7706f7 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/91cbff.png differ diff --git a/doc/LectureNotes/_build/html/_images/953800.png b/doc/LectureNotes/_build/html/_images/953800.png new file mode 100644 index 000000000..b102d5e58 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/953800.png differ diff --git a/doc/LectureNotes/_build/html/_images/974eb7.png b/doc/LectureNotes/_build/html/_images/974eb7.png new file mode 100644 index 000000000..0cd42cd71 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/974eb7.png differ diff --git a/doc/LectureNotes/_build/html/_images/98661b.png b/doc/LectureNotes/_build/html/_images/98661b.png new file mode 100644 index 000000000..030036eaf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/98661b.png differ diff --git a/doc/LectureNotes/_build/html/_images/996b00.png b/doc/LectureNotes/_build/html/_images/996b00.png new file mode 100644 index 000000000..1f1404f3f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/996b00.png differ diff --git a/doc/LectureNotes/_build/html/_images/9e86c8.png b/doc/LectureNotes/_build/html/_images/9e86c8.png new file mode 100644 index 000000000..d66df7500 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9e86c8.png differ diff --git a/doc/LectureNotes/_build/html/_images/9e8741.png b/doc/LectureNotes/_build/html/_images/9e8741.png new file mode 100644 index 000000000..53d7ec828 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9e8741.png differ diff --git a/doc/LectureNotes/_build/html/_images/9f4e55.png b/doc/LectureNotes/_build/html/_images/9f4e55.png new file mode 100644 index 000000000..422fefee1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9f4e55.png differ diff --git a/doc/LectureNotes/_build/html/_images/a0111f.png b/doc/LectureNotes/_build/html/_images/a0111f.png new file mode 100644 index 000000000..dc3d2c82d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a0111f.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-dark.png b/doc/LectureNotes/_build/html/_images/a11y-dark.png new file mode 100644 index 000000000..08447103a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png new file mode 100644 index 000000000..6e422ed62 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png new file mode 100644 index 000000000..6bb19b562 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-light.png b/doc/LectureNotes/_build/html/_images/a11y-light.png new file mode 100644 index 000000000..7585d6db7 Binary files /dev/null and 
b/doc/LectureNotes/_build/html/_images/a11y-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/a12236.png b/doc/LectureNotes/_build/html/_images/a12236.png new file mode 100644 index 000000000..a61aa4df5 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a12236.png differ diff --git a/doc/LectureNotes/_build/html/_images/a25e53.png b/doc/LectureNotes/_build/html/_images/a25e53.png new file mode 100644 index 000000000..67d5db79c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a25e53.png differ diff --git a/doc/LectureNotes/_build/html/_images/a2bffc.png b/doc/LectureNotes/_build/html/_images/a2bffc.png new file mode 100644 index 000000000..74fd8d7fd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a2bffc.png differ diff --git a/doc/LectureNotes/_build/html/_images/a5d6ff.png b/doc/LectureNotes/_build/html/_images/a5d6ff.png new file mode 100644 index 000000000..85dc8ab5c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a5d6ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/ab6369.png b/doc/LectureNotes/_build/html/_images/ab6369.png new file mode 100644 index 000000000..bbe790f35 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ab6369.png differ diff --git a/doc/LectureNotes/_build/html/_images/abe338.png b/doc/LectureNotes/_build/html/_images/abe338.png new file mode 100644 index 000000000..f71421dc4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/abe338.png differ diff --git a/doc/LectureNotes/_build/html/_images/b19db4.png b/doc/LectureNotes/_build/html/_images/b19db4.png new file mode 100644 index 000000000..2bcf14bf7 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b19db4.png differ diff --git a/doc/LectureNotes/_build/html/_images/b1bac4.png b/doc/LectureNotes/_build/html/_images/b1bac4.png new file mode 100644 index 000000000..fa99254a4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b1bac4.png differ diff --git a/doc/LectureNotes/_build/html/_images/b35900.png b/doc/LectureNotes/_build/html/_images/b35900.png new file mode 100644 index 000000000..af6314be1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b35900.png differ diff --git a/doc/LectureNotes/_build/html/_images/b89784.png b/doc/LectureNotes/_build/html/_images/b89784.png new file mode 100644 index 000000000..553c732ed Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b89784.png differ diff --git a/doc/LectureNotes/_build/html/_images/bbbbbb.png b/doc/LectureNotes/_build/html/_images/bbbbbb.png new file mode 100644 index 000000000..45fa9bad4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/bbbbbb.png differ diff --git a/doc/LectureNotes/_build/html/_images/bf5400.png b/doc/LectureNotes/_build/html/_images/bf5400.png new file mode 100644 index 000000000..ba4f6a062 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/bf5400.png differ diff --git a/doc/LectureNotes/_build/html/_images/blinds-dark.png b/doc/LectureNotes/_build/html/_images/blinds-dark.png new file mode 100644 index 000000000..fca57415b Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/blinds-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/blinds-light.png b/doc/LectureNotes/_build/html/_images/blinds-light.png new file mode 100644 index 000000000..c715ebf74 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/blinds-light.png differ diff --git 
a/doc/LectureNotes/_build/html/_images/c4a2f5.png b/doc/LectureNotes/_build/html/_images/c4a2f5.png new file mode 100644 index 000000000..815fa9a09 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c4a2f5.png differ diff --git a/doc/LectureNotes/_build/html/_images/c5e478.png b/doc/LectureNotes/_build/html/_images/c5e478.png new file mode 100644 index 000000000..450f66bdf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c5e478.png differ diff --git a/doc/LectureNotes/_build/html/_images/c9d1d9.png b/doc/LectureNotes/_build/html/_images/c9d1d9.png new file mode 100644 index 000000000..56df44316 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c9d1d9.png differ diff --git a/doc/LectureNotes/_build/html/_images/caab6d.png b/doc/LectureNotes/_build/html/_images/caab6d.png new file mode 100644 index 000000000..a20ea8967 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/caab6d.png differ diff --git a/doc/LectureNotes/_build/html/_images/cc398b.png b/doc/LectureNotes/_build/html/_images/cc398b.png new file mode 100644 index 000000000..b05f6b3f6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/cc398b.png differ diff --git a/doc/LectureNotes/_build/html/_images/ccbb44.png b/doc/LectureNotes/_build/html/_images/ccbb44.png new file mode 100644 index 000000000..20d1da3f3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ccbb44.png differ diff --git a/doc/LectureNotes/_build/html/_images/cf222e.png b/doc/LectureNotes/_build/html/_images/cf222e.png new file mode 100644 index 000000000..eba14bfb1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/cf222e.png differ diff --git a/doc/LectureNotes/_build/html/_images/d166a3.png b/doc/LectureNotes/_build/html/_images/d166a3.png new file mode 100644 index 000000000..34af2ff43 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d166a3.png differ diff --git a/doc/LectureNotes/_build/html/_images/d2a8ff.png b/doc/LectureNotes/_build/html/_images/d2a8ff.png new file mode 100644 index 000000000..d3ba734d9 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d2a8ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/d4d0ab.png b/doc/LectureNotes/_build/html/_images/d4d0ab.png new file mode 100644 index 000000000..4c7b827a4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d4d0ab.png differ diff --git a/doc/LectureNotes/_build/html/_images/d71835.png b/doc/LectureNotes/_build/html/_images/d71835.png new file mode 100644 index 000000000..aee961355 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d71835.png differ diff --git a/doc/LectureNotes/_build/html/_images/d9dee3.png b/doc/LectureNotes/_build/html/_images/d9dee3.png new file mode 100644 index 000000000..61bac902b Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d9dee3.png differ diff --git a/doc/LectureNotes/_build/html/_images/dbb7ff.png b/doc/LectureNotes/_build/html/_images/dbb7ff.png new file mode 100644 index 000000000..fe7039bd1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/dbb7ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/dcc6e0.png b/doc/LectureNotes/_build/html/_images/dcc6e0.png new file mode 100644 index 000000000..ad963c944 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/dcc6e0.png differ diff --git a/doc/LectureNotes/_build/html/_images/ec8e2c.png b/doc/LectureNotes/_build/html/_images/ec8e2c.png new file mode 100644 index 000000000..857cbbd5a 
Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ec8e2c.png differ diff --git a/doc/LectureNotes/_build/html/_images/ee6677.png b/doc/LectureNotes/_build/html/_images/ee6677.png new file mode 100644 index 000000000..a074ed315 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ee6677.png differ diff --git a/doc/LectureNotes/_build/html/_images/f26196.png b/doc/LectureNotes/_build/html/_images/f26196.png new file mode 100644 index 000000000..136cec902 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f26196.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5a394.png b/doc/LectureNotes/_build/html/_images/f5a394.png new file mode 100644 index 000000000..4650b86f2 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5a394.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5ab35.png b/doc/LectureNotes/_build/html/_images/f5ab35.png new file mode 100644 index 000000000..5df91ee45 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5ab35.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5f5f5.png b/doc/LectureNotes/_build/html/_images/f5f5f5.png new file mode 100644 index 000000000..6703ca30c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5f5f5.png differ diff --git a/doc/LectureNotes/_build/html/_images/f78c6c.png b/doc/LectureNotes/_build/html/_images/f78c6c.png new file mode 100644 index 000000000..5cf8e8bcd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f78c6c.png differ diff --git a/doc/LectureNotes/_build/html/_images/f8f8f2.png b/doc/LectureNotes/_build/html/_images/f8f8f2.png new file mode 100644 index 000000000..34d0957f3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f8f8f2.png differ diff --git a/doc/LectureNotes/_build/html/_images/fad000.png b/doc/LectureNotes/_build/html/_images/fad000.png new file mode 100644 index 000000000..58ac7e8bf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fad000.png differ diff --git a/doc/LectureNotes/_build/html/_images/fdac54.png b/doc/LectureNotes/_build/html/_images/fdac54.png new file mode 100644 index 000000000..9f681b21d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fdac54.png differ diff --git a/doc/LectureNotes/_build/html/_images/fefeff.png b/doc/LectureNotes/_build/html/_images/fefeff.png new file mode 100644 index 000000000..ab6d16c80 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fefeff.png differ diff --git a/doc/LectureNotes/_build/html/_images/ff7b72.png b/doc/LectureNotes/_build/html/_images/ff7b72.png new file mode 100644 index 000000000..d5a3abbcb Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ff7b72.png differ diff --git a/doc/LectureNotes/_build/html/_images/ff9492.png b/doc/LectureNotes/_build/html/_images/ff9492.png new file mode 100644 index 000000000..5368ed567 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ff9492.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffa07a.png b/doc/LectureNotes/_build/html/_images/ffa07a.png new file mode 100644 index 000000000..c771ccf9d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffa07a.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffa657.png b/doc/LectureNotes/_build/html/_images/ffa657.png new file mode 100644 index 000000000..ea0e84e3d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffa657.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffb757.png 
b/doc/LectureNotes/_build/html/_images/ffb757.png new file mode 100644 index 000000000..cb52b6c2a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffb757.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffd700.png b/doc/LectureNotes/_build/html/_images/ffd700.png new file mode 100644 index 000000000..86dca1571 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffd700.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffd900.png b/doc/LectureNotes/_build/html/_images/ffd900.png new file mode 100644 index 000000000..786918447 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffd900.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png b/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png new file mode 100644 index 000000000..96cf5944d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png b/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png new file mode 100644 index 000000000..f73c3480a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark.png b/doc/LectureNotes/_build/html/_images/github-dark.png new file mode 100644 index 000000000..50bac6465 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light-colorblind.png b/doc/LectureNotes/_build/html/_images/github-light-colorblind.png new file mode 100644 index 000000000..d20cedcaf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light-colorblind.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png b/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png new file mode 100644 index 000000000..9a2bd4dbe Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light.png b/doc/LectureNotes/_build/html/_images/github-light.png new file mode 100644 index 000000000..de457f2ba Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/gotthard-dark.png b/doc/LectureNotes/_build/html/_images/gotthard-dark.png new file mode 100644 index 000000000..4976ac1d6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/gotthard-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/gotthard-light.png b/doc/LectureNotes/_build/html/_images/gotthard-light.png new file mode 100644 index 000000000..b9c67150d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/gotthard-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/greative.png b/doc/LectureNotes/_build/html/_images/greative.png new file mode 100644 index 000000000..935a4b6cd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/greative.png differ diff --git a/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png b/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png new file mode 100644 index 000000000..ce1f7ba74 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png differ diff --git a/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css 
b/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css new file mode 100644 index 000000000..fc14abc85 --- /dev/null +++ b/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css @@ -0,0 +1 @@ +details.dropdown .summary-title{padding-right:3em !important;-moz-user-select:none;-ms-user-select:none;-webkit-user-select:none;user-select:none}details.dropdown:hover{cursor:pointer}details.dropdown .summary-content{cursor:default}details.dropdown summary{list-style:none;padding:1em}details.dropdown summary .octicon.no-title{vertical-align:middle}details.dropdown[open] summary .octicon.no-title{visibility:hidden}details.dropdown summary::-webkit-details-marker{display:none}details.dropdown summary:focus{outline:none}details.dropdown summary:hover .summary-up svg,details.dropdown summary:hover .summary-down svg{opacity:1}details.dropdown .summary-up svg,details.dropdown .summary-down svg{display:block;opacity:.6}details.dropdown .summary-up,details.dropdown .summary-down{pointer-events:none;position:absolute;right:1em;top:.75em}details.dropdown[open] .summary-down{visibility:hidden}details.dropdown:not([open]) .summary-up{visibility:hidden}details.dropdown.fade-in[open] summary~*{-moz-animation:panels-fade-in .5s ease-in-out;-webkit-animation:panels-fade-in .5s ease-in-out;animation:panels-fade-in .5s ease-in-out}details.dropdown.fade-in-slide-down[open] summary~*{-moz-animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out;-webkit-animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out;animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out}@keyframes panels-fade-in{0%{opacity:0}100%{opacity:1}}@keyframes panels-slide-down{0%{transform:translate(0, -10px)}100%{transform:translate(0, 0)}}.octicon{display:inline-block;fill:currentColor;vertical-align:text-top}.tabbed-content{box-shadow:0 -.0625rem var(--tabs-color-overline),0 .0625rem var(--tabs-color-underline);display:none;order:99;padding-bottom:.75rem;padding-top:.75rem;width:100%}.tabbed-content>:first-child{margin-top:0 !important}.tabbed-content>:last-child{margin-bottom:0 !important}.tabbed-content>.tabbed-set{margin:0}.tabbed-set{border-radius:.125rem;display:flex;flex-wrap:wrap;margin:1em 0;position:relative}.tabbed-set>input{opacity:0;position:absolute}.tabbed-set>input:checked+label{border-color:var(--tabs-color-label-active);color:var(--tabs-color-label-active)}.tabbed-set>input:checked+label+.tabbed-content{display:block}.tabbed-set>input:focus+label{outline-style:auto}.tabbed-set>input:not(.focus-visible)+label{outline:none;-webkit-tap-highlight-color:transparent}.tabbed-set>label{border-bottom:.125rem solid transparent;color:var(--tabs-color-label-inactive);cursor:pointer;font-size:var(--tabs-size-label);font-weight:700;padding:1em 1.25em .5em;transition:color 250ms;width:auto;z-index:1}html .tabbed-set>label:hover{color:var(--tabs-color-label-active)} diff --git a/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css b/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css new file mode 100644 index 000000000..adc616622 --- /dev/null +++ b/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css @@ -0,0 +1,7 @@ +:root { +--tabs-color-label-active: hsla(231, 99%, 66%, 1); +--tabs-color-label-inactive: rgba(178, 206, 245, 0.62); +--tabs-color-overline: rgb(207, 236, 238); 
+--tabs-color-underline: rgb(207, 236, 238); +--tabs-size-label: 1rem; +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md new file mode 100644 index 000000000..23af038a3 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md @@ -0,0 +1,26 @@ +# A11Y Dark + +This is the Pygments implementation of a11y-dark from [Eric Bailey's +accessible themes for syntax +highlighting](https://github.com/ericwbailey/a11y-syntax-highlighting) + +![Screenshot of the a11y-dark theme in a bash script](./images/a11y-dark.png) + +## Colors + +Background color: ![#2b2b2b](https://via.placeholder.com/20/2b2b2b/2b2b2b.png) `#2b2b2b` + +Highlight color: ![#ffd9002e](https://via.placeholder.com/20/ffd9002e/ffd9002e.png) `#ffd9002e` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#d4d0ab](../../a11y_pygments/assets/d4d0ab.png) | `#d4d0ab` | 9.0 : 1 | AAA | AAA | +| ![#ffa07a](../../a11y_pygments/assets/ffa07a.png) | `#ffa07a` | 7.1 : 1 | AAA | AAA | +| ![#f5ab35](../../a11y_pygments/assets/f5ab35.png) | `#f5ab35` | 7.3 : 1 | AAA | AAA | +| ![#ffd700](../../a11y_pygments/assets/ffd700.png) | `#ffd700` | 10.1 : 1 | AAA | AAA | +| ![#abe338](../../a11y_pygments/assets/abe338.png) | `#abe338` | 9.3 : 1 | AAA | AAA | +| ![#00e0e0](../../a11y_pygments/assets/00e0e0.png) | `#00e0e0` | 8.6 : 1 | AAA | AAA | +| ![#dcc6e0](../../a11y_pygments/assets/dcc6e0.png) | `#dcc6e0` | 8.9 : 1 | AAA | AAA | +| ![#f8f8f2](../../a11y_pygments/assets/f8f8f2.png) | `#f8f8f2` | 13.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md new file mode 100644 index 000000000..575b3759b --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md @@ -0,0 +1,22 @@ +# A11Y High Contrast Dark + +This style mimics the a11 light theme from eric bailey's accessible themes. 
+ +![Screenshot of the a11y-high-contrast-dark theme in a bash script](./images/a11y-high-contrast-dark.png) + +## Colors + +Background color: ![#2b2b2b](https://via.placeholder.com/20/2b2b2b/2b2b2b.png) `#2b2b2b` + +Highlight color: ![#ffd9002e](https://via.placeholder.com/20/ffd9002e/ffd9002e.png) `#ffd9002e` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#ffd900](../../a11y_pygments/assets/ffd900.png) | `#ffd900` | 10.2 : 1 | AAA | AAA | +| ![#ffa07a](../../a11y_pygments/assets/ffa07a.png) | `#ffa07a` | 7.1 : 1 | AAA | AAA | +| ![#abe338](../../a11y_pygments/assets/abe338.png) | `#abe338` | 9.3 : 1 | AAA | AAA | +| ![#00e0e0](../../a11y_pygments/assets/00e0e0.png) | `#00e0e0` | 8.6 : 1 | AAA | AAA | +| ![#dcc6e0](../../a11y_pygments/assets/dcc6e0.png) | `#dcc6e0` | 8.9 : 1 | AAA | AAA | +| ![#f8f8f2](../../a11y_pygments/assets/f8f8f2.png) | `#f8f8f2` | 13.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md new file mode 100644 index 000000000..a0b6be8fc --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md @@ -0,0 +1,24 @@ +# A11Y High Contrast Light + +This style mimics the a11y-light theme (but with more contrast) from eric bailey's accessible themes. + +![Screenshot of the a11y-high-contrast-light theme in a bash script](./images/a11y-high-contrast-light.png) + +## Colors + +Background color: ![#fefefe](https://via.placeholder.com/20/fefefe/fefefe.png) `#fefefe` + +Highlight color: ![#fae4c2](https://via.placeholder.com/20/fae4c2/fae4c2.png) `#fae4c2` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#515151](../../a11y_pygments/assets/515151.png) | `#515151` | 7.9 : 1 | AAA | AAA | +| ![#a12236](../../a11y_pygments/assets/a12236.png) | `#a12236` | 7.4 : 1 | AAA | AAA | +| ![#7f4707](../../a11y_pygments/assets/7f4707.png) | `#7f4707` | 7.4 : 1 | AAA | AAA | +| ![#912583](../../a11y_pygments/assets/912583.png) | `#912583` | 7.4 : 1 | AAA | AAA | +| ![#00622f](../../a11y_pygments/assets/00622f.png) | `#00622f` | 7.5 : 1 | AAA | AAA | +| ![#005b82](../../a11y_pygments/assets/005b82.png) | `#005b82` | 7.4 : 1 | AAA | AAA | +| ![#6730c5](../../a11y_pygments/assets/6730c5.png) | `#6730c5` | 7.4 : 1 | AAA | AAA | +| ![#080808](../../a11y_pygments/assets/080808.png) | `#080808` | 19.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md new file mode 100644 index 000000000..911cef825 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md @@ -0,0 +1,23 @@ +# A11Y Light + +This style inspired by the a11y-light theme from eric bailey's accessible themes. 
+ +![Screenshot of the a11y-light theme in a bash script](./images/a11y-light.png) + +## Colors + +Background color: ![#f2f2f2](https://via.placeholder.com/20/f2f2f2/f2f2f2.png) `#f2f2f2` + +Highlight color: ![#fdf2e2](https://via.placeholder.com/20/fdf2e2/fdf2e2.png) `#fdf2e2` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#515151](../../a11y_pygments/assets/515151.png) | `#515151` | 7.1 : 1 | AAA | AAA | +| ![#d71835](../../a11y_pygments/assets/d71835.png) | `#d71835` | 4.6 : 1 | AA | AAA | +| ![#7f4707](../../a11y_pygments/assets/7f4707.png) | `#7f4707` | 6.7 : 1 | AA | AAA | +| ![#116633](../../a11y_pygments/assets/116633.png) | `#116633` | 6.3 : 1 | AA | AAA | +| ![#00749c](../../a11y_pygments/assets/00749c.png) | `#00749c` | 4.7 : 1 | AA | AAA | +| ![#8045e5](../../a11y_pygments/assets/8045e5.png) | `#8045e5` | 4.8 : 1 | AA | AAA | +| ![#1e1e1e](../../a11y_pygments/assets/1e1e1e.png) | `#1e1e1e` | 14.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md new file mode 100644 index 000000000..62529463f --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md @@ -0,0 +1,23 @@ +# Blinds Dark + +This style mimics the blinds dark theme from vscode themes. + +![Screenshot of the blinds-dark theme in a bash script](./images/blinds-dark.png) + +## Colors + +Background color: ![#242424](https://via.placeholder.com/20/242424/242424.png) `#242424` + +Highlight color: ![#66666691](https://via.placeholder.com/20/66666691/66666691.png) `#66666691` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | ------- | ----------- | ---------- | +| ![#8c8c8c](../../a11y_pygments/assets/8c8c8c.png) | `#8c8c8c` | 4.6 : 1 | AA | AAA | +| ![#ee6677](../../a11y_pygments/assets/ee6677.png) | `#ee6677` | 5.0 : 1 | AA | AAA | +| ![#ccbb44](../../a11y_pygments/assets/ccbb44.png) | `#ccbb44` | 8.0 : 1 | AAA | AAA | +| ![#66ccee](../../a11y_pygments/assets/66ccee.png) | `#66ccee` | 8.5 : 1 | AAA | AAA | +| ![#5391cf](../../a11y_pygments/assets/5391cf.png) | `#5391cf` | 4.7 : 1 | AA | AAA | +| ![#d166a3](../../a11y_pygments/assets/d166a3.png) | `#d166a3` | 4.5 : 1 | AA | AAA | +| ![#bbbbbb](../../a11y_pygments/assets/bbbbbb.png) | `#bbbbbb` | 8.1 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md new file mode 100644 index 000000000..28e724e59 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md @@ -0,0 +1,23 @@ +# Blinds Light + +This style mimics the blinds light theme from vscode themes. 
+ +![Screenshot of the blinds-light theme in a bash script](./images/blinds-light.png) + +## Colors + +Background color: ![#fcfcfc](https://via.placeholder.com/20/fcfcfc/fcfcfc.png) `#fcfcfc` + +Highlight color: ![#add6ff](https://via.placeholder.com/20/add6ff/add6ff.png) `#add6ff` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#737373](../../a11y_pygments/assets/737373.png) | `#737373` | 4.6 : 1 | AA | AAA | +| ![#bf5400](../../a11y_pygments/assets/bf5400.png) | `#bf5400` | 4.6 : 1 | AA | AAA | +| ![#996b00](../../a11y_pygments/assets/996b00.png) | `#996b00` | 4.6 : 1 | AA | AAA | +| ![#008561](../../a11y_pygments/assets/008561.png) | `#008561` | 4.5 : 1 | AA | AAA | +| ![#0072b2](../../a11y_pygments/assets/0072b2.png) | `#0072b2` | 5.1 : 1 | AA | AAA | +| ![#cc398b](../../a11y_pygments/assets/cc398b.png) | `#cc398b` | 4.5 : 1 | AA | AAA | +| ![#000000](../../a11y_pygments/assets/000000.png) | `#000000` | 20.5 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md new file mode 100644 index 000000000..0e24df43e --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md @@ -0,0 +1,23 @@ +# Github Dark + +This style mimics the github dark default theme from vs code themes. + +![Screenshot of the github-dark theme in a bash script](./images/github-dark.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#6e7681](https://via.placeholder.com/20/6e7681/6e7681.png) `#6e7681` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#8b949e](../../a11y_pygments/assets/8b949e.png) | `#8b949e` | 6.2 : 1 | AA | AAA | +| ![#ff7b72](../../a11y_pygments/assets/ff7b72.png) | `#ff7b72` | 7.5 : 1 | AAA | AAA | +| ![#ffa657](../../a11y_pygments/assets/ffa657.png) | `#ffa657` | 9.8 : 1 | AAA | AAA | +| ![#7ee787](../../a11y_pygments/assets/7ee787.png) | `#7ee787` | 12.3 : 1 | AAA | AAA | +| ![#79c0ff](../../a11y_pygments/assets/79c0ff.png) | `#79c0ff` | 9.7 : 1 | AAA | AAA | +| ![#d2a8ff](../../a11y_pygments/assets/d2a8ff.png) | `#d2a8ff` | 9.7 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md new file mode 100644 index 000000000..9ad72f9f9 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md @@ -0,0 +1,23 @@ +# Github Dark Colorblind + +This style mimics the github dark colorblind theme from vscode. 
+ +![Screenshot of the github-dark-colorblind theme in a bash script](./images/github-dark-colorblind.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#58a6ff70](https://via.placeholder.com/20/58a6ff70/58a6ff70.png) `#58a6ff70` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#b1bac4](../../a11y_pygments/assets/b1bac4.png) | `#b1bac4` | 9.6 : 1 | AAA | AAA | +| ![#ec8e2c](../../a11y_pygments/assets/ec8e2c.png) | `#ec8e2c` | 7.6 : 1 | AAA | AAA | +| ![#fdac54](../../a11y_pygments/assets/fdac54.png) | `#fdac54` | 10.1 : 1 | AAA | AAA | +| ![#a5d6ff](../../a11y_pygments/assets/a5d6ff.png) | `#a5d6ff` | 12.3 : 1 | AAA | AAA | +| ![#79c0ff](../../a11y_pygments/assets/79c0ff.png) | `#79c0ff` | 9.7 : 1 | AAA | AAA | +| ![#d2a8ff](../../a11y_pygments/assets/d2a8ff.png) | `#d2a8ff` | 9.7 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md new file mode 100644 index 000000000..c395966c5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md @@ -0,0 +1,23 @@ +# Github Dark High Contrast + +This style mimics the github dark high contrast theme from vs code themes. + +![Screenshot of the github-dark-high-contrast theme in a bash script](./images/github-dark-high-contrast.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#58a6ff70](https://via.placeholder.com/20/58a6ff70/58a6ff70.png) `#58a6ff70` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#d9dee3](../../a11y_pygments/assets/d9dee3.png) | `#d9dee3` | 14.0 : 1 | AAA | AAA | +| ![#ff9492](../../a11y_pygments/assets/ff9492.png) | `#ff9492` | 8.9 : 1 | AAA | AAA | +| ![#ffb757](../../a11y_pygments/assets/ffb757.png) | `#ffb757` | 11.0 : 1 | AAA | AAA | +| ![#72f088](../../a11y_pygments/assets/72f088.png) | `#72f088` | 13.1 : 1 | AAA | AAA | +| ![#91cbff](../../a11y_pygments/assets/91cbff.png) | `#91cbff` | 11.0 : 1 | AAA | AAA | +| ![#dbb7ff](../../a11y_pygments/assets/dbb7ff.png) | `#dbb7ff` | 11.0 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md new file mode 100644 index 000000000..8059f6d20 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md @@ -0,0 +1,23 @@ +# Github Light + +This style mimics the github light theme from vscode themes. 
+ +![Screenshot of the github-light theme in a bash script](./images/github-light.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#6e7781](../../a11y_pygments/assets/6e7781.png) | `#6e7781` | 4.5 : 1 | AA | AAA | +| ![#cf222e](../../a11y_pygments/assets/cf222e.png) | `#cf222e` | 5.4 : 1 | AA | AAA | +| ![#953800](../../a11y_pygments/assets/953800.png) | `#953800` | 7.4 : 1 | AAA | AAA | +| ![#116329](../../a11y_pygments/assets/116329.png) | `#116329` | 7.4 : 1 | AAA | AAA | +| ![#0550ae](../../a11y_pygments/assets/0550ae.png) | `#0550ae` | 7.6 : 1 | AAA | AAA | +| ![#8250df](../../a11y_pygments/assets/8250df.png) | `#8250df` | 5.0 : 1 | AA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md new file mode 100644 index 000000000..120cbd39a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md @@ -0,0 +1,22 @@ +# Github Light Colorblind + +This style mimics the github light colorblind theme from vscode themes. + +![Screenshot of the github-light-colorblind theme in a bash script](./images/github-light-colorblind.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#6e7781](../../a11y_pygments/assets/6e7781.png) | `#6e7781` | 4.5 : 1 | AA | AAA | +| ![#b35900](../../a11y_pygments/assets/b35900.png) | `#b35900` | 4.8 : 1 | AA | AAA | +| ![#8a4600](../../a11y_pygments/assets/8a4600.png) | `#8a4600` | 7.1 : 1 | AAA | AAA | +| ![#0550ae](../../a11y_pygments/assets/0550ae.png) | `#0550ae` | 7.6 : 1 | AAA | AAA | +| ![#8250df](../../a11y_pygments/assets/8250df.png) | `#8250df` | 5.0 : 1 | AA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md new file mode 100644 index 000000000..e938e986f --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md @@ -0,0 +1,23 @@ +# Github Light High Contrast + +This style mimics the github light high contrast theme from vscode themes. 
+ +![Screenshot of the github-light-high-contrast theme in a bash script](./images/github-light-high-contrast.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#66707b](../../a11y_pygments/assets/66707b.png) | `#66707b` | 5.0 : 1 | AA | AAA | +| ![#a0111f](../../a11y_pygments/assets/a0111f.png) | `#a0111f` | 8.1 : 1 | AAA | AAA | +| ![#702c00](../../a11y_pygments/assets/702c00.png) | `#702c00` | 10.2 : 1 | AAA | AAA | +| ![#024c1a](../../a11y_pygments/assets/024c1a.png) | `#024c1a` | 10.2 : 1 | AAA | AAA | +| ![#023b95](../../a11y_pygments/assets/023b95.png) | `#023b95` | 10.2 : 1 | AAA | AAA | +| ![#622cbc](../../a11y_pygments/assets/622cbc.png) | `#622cbc` | 8.1 : 1 | AAA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md new file mode 100644 index 000000000..8fa52f430 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md @@ -0,0 +1,23 @@ +# Gotthard Dark + +This style mimics the gotthard dark theme from vscode. + +![Screenshot of the gotthard-dark theme in a bash script](./images/gotthard-dark.png) + +## Colors + +Background color: ![#000000](https://via.placeholder.com/20/000000/000000.png) `#000000` + +Highlight color: ![#4c4b4be8](https://via.placeholder.com/20/4c4b4be8/4c4b4be8.png) `#4c4b4be8` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#f5f5f5](../../a11y_pygments/assets/f5f5f5.png) | `#f5f5f5` | 19.3 : 1 | AAA | AAA | +| ![#ab6369](../../a11y_pygments/assets/ab6369.png) | `#ab6369` | 4.7 : 1 | AA | AAA | +| ![#b89784](../../a11y_pygments/assets/b89784.png) | `#b89784` | 7.8 : 1 | AAA | AAA | +| ![#caab6d](../../a11y_pygments/assets/caab6d.png) | `#caab6d` | 9.6 : 1 | AAA | AAA | +| ![#81b19b](../../a11y_pygments/assets/81b19b.png) | `#81b19b` | 8.7 : 1 | AAA | AAA | +| ![#6f98b3](../../a11y_pygments/assets/6f98b3.png) | `#6f98b3` | 6.8 : 1 | AA | AAA | +| ![#b19db4](../../a11y_pygments/assets/b19db4.png) | `#b19db4` | 8.4 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md new file mode 100644 index 000000000..4ff0d9874 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md @@ -0,0 +1,23 @@ +# Gotthard Light + +This style mimics the gotthard light theme from vscode. 
+ +![Screenshot of the gotthard-light theme in a bash script](./images/gotthard-light.png) + +## Colors + +Background color: ![#F5F5F5](https://via.placeholder.com/20/F5F5F5/F5F5F5.png) `#F5F5F5` + +Highlight color: ![#E1E1E1](https://via.placeholder.com/20/E1E1E1/E1E1E1.png) `#E1E1E1` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#141414](../../a11y_pygments/assets/141414.png) | `#141414` | 16.9 : 1 | AAA | AAA | +| ![#9f4e55](../../a11y_pygments/assets/9f4e55.png) | `#9f4e55` | 5.2 : 1 | AA | AAA | +| ![#a25e53](../../a11y_pygments/assets/a25e53.png) | `#a25e53` | 4.5 : 1 | AA | AAA | +| ![#98661b](../../a11y_pygments/assets/98661b.png) | `#98661b` | 4.5 : 1 | AA | AAA | +| ![#437a6b](../../a11y_pygments/assets/437a6b.png) | `#437a6b` | 4.5 : 1 | AA | AAA | +| ![#3d73a9](../../a11y_pygments/assets/3d73a9.png) | `#3d73a9` | 4.6 : 1 | AA | AAA | +| ![#974eb7](../../a11y_pygments/assets/974eb7.png) | `#974eb7` | 4.7 : 1 | AA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md new file mode 100644 index 000000000..5e70f12fe --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md @@ -0,0 +1,23 @@ +# Greative + +This style mimics greative theme from vscode themes. + +![Screenshot of the greative theme in a bash script](./images/greative.png) + +## Colors + +Background color: ![#010726](https://via.placeholder.com/20/010726/010726.png) `#010726` + +Highlight color: ![#473d18](https://via.placeholder.com/20/473d18/473d18.png) `#473d18` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#797979](../../a11y_pygments/assets/797979.png) | `#797979` | 4.6 : 1 | AA | AAA | +| ![#f78c6c](../../a11y_pygments/assets/f78c6c.png) | `#f78c6c` | 8.4 : 1 | AAA | AAA | +| ![#9e8741](../../a11y_pygments/assets/9e8741.png) | `#9e8741` | 5.7 : 1 | AA | AAA | +| ![#c5e478](../../a11y_pygments/assets/c5e478.png) | `#c5e478` | 13.9 : 1 | AAA | AAA | +| ![#a2bffc](../../a11y_pygments/assets/a2bffc.png) | `#a2bffc` | 10.8 : 1 | AAA | AAA | +| ![#5ca7e4](../../a11y_pygments/assets/5ca7e4.png) | `#5ca7e4` | 7.6 : 1 | AAA | AAA | +| ![#9e86c8](../../a11y_pygments/assets/9e86c8.png) | `#9e86c8` | 6.3 : 1 | AA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md new file mode 100644 index 000000000..a83a734b2 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md @@ -0,0 +1,25 @@ +# Pitaya Smoothie + +This style mimics the a11 light theme from eric bailey's accessible themes. 
+ +![Screenshot of the pitaya-smoothie theme in a bash script](./images/pitaya-smoothie.png) + +## Colors + +Background color: ![#181036](https://via.placeholder.com/20/181036/181036.png) `#181036` + +Highlight color: ![#2A1968](https://via.placeholder.com/20/2A1968/2A1968.png) `#2A1968` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#8786ac](../../a11y_pygments/assets/8786ac.png) | `#8786ac` | 5.2 : 1 | AA | AAA | +| ![#f26196](../../a11y_pygments/assets/f26196.png) | `#f26196` | 5.9 : 1 | AA | AAA | +| ![#f5a394](../../a11y_pygments/assets/f5a394.png) | `#f5a394` | 9.0 : 1 | AAA | AAA | +| ![#fad000](../../a11y_pygments/assets/fad000.png) | `#fad000` | 12.1 : 1 | AAA | AAA | +| ![#18c1c4](../../a11y_pygments/assets/18c1c4.png) | `#18c1c4` | 8.1 : 1 | AAA | AAA | +| ![#66e9ec](../../a11y_pygments/assets/66e9ec.png) | `#66e9ec` | 12.4 : 1 | AAA | AAA | +| ![#7998f2](../../a11y_pygments/assets/7998f2.png) | `#7998f2` | 6.5 : 1 | AA | AAA | +| ![#c4a2f5](../../a11y_pygments/assets/c4a2f5.png) | `#c4a2f5` | 8.4 : 1 | AAA | AAA | +| ![#fefeff](../../a11y_pygments/assets/fefeff.png) | `#fefeff` | 17.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst new file mode 100644 index 000000000..19361a719 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst @@ -0,0 +1,34 @@ +Copyright (c) 2020 Jeff Forcier. + +Based on original work copyright (c) 2011 Kenneth Reitz and copyright (c) 2010 +Armin Ronacher. + +Some rights reserved. + +Redistribution and use in source and binary forms of the theme, with or +without modification, are permitted provided that the following conditions +are met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + +* The names of the contributors may not be used to endorse or + promote products derived from this software without specific + prior written permission. + +THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md new file mode 100644 index 000000000..030e303ee --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md @@ -0,0 +1,30 @@ +Extensions allow extending the debugger without modifying the debugger code. This is implemented with explicit namespace +packages. + +To implement your own extension: + +1. Ensure that the root folder of your extension is in sys.path (add it to PYTHONPATH) +2. Ensure that your module follows the directory structure below +3. The ``__init__.py`` files inside the pydevd_plugin and extension folder must contain the preamble below, +and nothing else. +Preamble: +```python +try: + __import__('pkg_resources').declare_namespace(__name__) +except ImportError: + import pkgutil + __path__ = pkgutil.extend_path(__path__, __name__) +``` +4. Your plugin name inside the extensions folder must start with `"pydevd_plugin"` +5. Implement one or more of the abstract base classes defined in `_pydevd_bundle.pydevd_extension_api`. This can be done +by either inheriting from them or registering with the abstract base class. + +* Directory structure: +``` +|-- root_directory-> must be on python path +| |-- pydevd_plugins +| | |-- __init__.py -> must contain preamble +| | |-- extensions +| | | |-- __init__.py -> must contain preamble +| | | |-- pydevd_plugin_plugin_name.py +``` \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md new file mode 100644 index 000000000..19b6b4524 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md @@ -0,0 +1,31 @@ +BSD 3-Clause License + +Copyright (c) 2013-2024, Kim Davies and contributors. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst new file mode 100644 index 000000000..58a2394f0 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst @@ -0,0 +1,19 @@ +The MIT License (MIT) +---------------------------- + +Copyright © 2016 Yoshiki Shibukawa + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software +and associated documentation files (the “Software”), to deal in the Software without restriction, +including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, +and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, +subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial +portions of the Software. + +THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT +NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH +THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst new file mode 100644 index 000000000..e5c79ef38 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst @@ -0,0 +1,41 @@ +============================= + The IPython licensing terms +============================= + +IPython is licensed under the terms of the Modified BSD License (also known as +New or Revised or 3-Clause BSD). See the LICENSE file. + + +About the IPython Development Team +---------------------------------- + +Fernando Perez began IPython in 2001 based on code from Janko Hauser + and Nathaniel Gray . Fernando is still +the project lead. + +The IPython Development Team is the set of all contributors to the IPython +project. This includes all of the IPython subprojects. + +The core team that coordinates development on GitHub can be found here: +https://github.com/ipython/. + +Our Copyright Policy +-------------------- + +IPython uses a shared copyright model. Each contributor maintains copyright +over their contributions to IPython. But, it is important to note that these +contributions are typically only changes to the repositories. 
Thus, the IPython +source code, in its entirety is not the copyright of any single person or +institution. Instead, it is the collective copyright of the entire IPython +Development Team. If individual contributors want to maintain a record of what +changes/contributions they have specific copyright on, they should indicate +their copyright in the commit message of the change, when they commit the +change to one of the IPython repositories. + +With this in mind, the following banner should be used in any source code file +to indicate the copyright and license terms: + +:: + + # Copyright (c) IPython Development Team. + # Distributed under the terms of the Modified BSD License. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md new file mode 100644 index 000000000..f8cdc73cb --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md @@ -0,0 +1,11 @@ +# Welcome to your Jupyter Book + +This is a small sample book to give you a feel for how book content is +structured. +It shows off a few of the major file types, as well as some sample content. +It does not go in-depth into any particular topic - check out [the Jupyter Book documentation](https://jupyterbook.org) for more information. + +Check out the content pages bundled with this sample book to see more. + +```{tableofcontents} +``` diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md new file mode 100644 index 000000000..a057a320d --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md @@ -0,0 +1,53 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Notebooks with MyST Markdown + +Jupyter Book also lets you write text-based notebooks using MyST Markdown. +See [the Notebooks with MyST Markdown documentation](https://jupyterbook.org/file-types/myst-notebooks.html) for more detailed instructions. +This page shows off a notebook written in MyST Markdown. + +## An example cell + +With MyST Markdown, you can define code cells with a directive like so: + +```{code-cell} +print(2 + 2) +``` + +When your book is built, the contents of any `{code-cell}` blocks will be +executed with your default Jupyter kernel, and their outputs will be displayed +in-line with the rest of your content. + +```{seealso} +Jupyter Book uses [Jupytext](https://jupytext.readthedocs.io/en/latest/) to convert text-based files to notebooks, and can support [many other text-based notebook files](https://jupyterbook.org/file-types/jupytext.html). +``` + +## Create a notebook with MyST Markdown + +MyST Markdown notebooks are defined by two things: + +1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed). + See the YAML at the top of this page for example. +2. The presence of `{code-cell}` directives, which will be executed with your book. 
+ +That's all that is needed to get started! + +## Quickly add YAML metadata for MyST Notebooks + +If you have a markdown file and you'd like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command: + +``` +jupyter-book myst init path/to/markdownfile.md +``` diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md new file mode 100644 index 000000000..faeea6061 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md @@ -0,0 +1,55 @@ +# Markdown Files + +Whether you write your book's content in Jupyter Notebooks (`.ipynb`) or +in regular markdown files (`.md`), you'll write in the same flavor of markdown +called **MyST Markdown**. +This is a simple file to help you get started and show off some syntax. + +## What is MyST? + +MyST stands for "Markedly Structured Text". It +is a slight variation on a flavor of markdown called "CommonMark" markdown, +with small syntax extensions to allow you to write **roles** and **directives** +in the Sphinx ecosystem. + +For more about MyST, see [the MyST Markdown Overview](https://jupyterbook.org/content/myst.html). + +## Sample Roles and Directives + +Roles and directives are two of the most powerful tools in Jupyter Book. They +are like functions, but written in a markup language. They both +serve a similar purpose, but **roles are written in one line**, whereas +**directives span many lines**. They both accept different kinds of inputs, +and what they do with those inputs depends on the specific role or directive +that is being called. + +Here is a "note" directive: + +```{note} +Here is a note +``` + +It will be rendered in a special box when you build your book. + +Here is an inline directive to refer to a document: {doc}`markdown-notebooks`. + + +## Citations + +You can also cite references that are stored in a `bibtex` file. For example, +the following syntax: `` {cite}`holdgraf_evidence_2014` `` will render like +this: {cite}`holdgraf_evidence_2014`. + +Moreover, you can insert a bibliography into your page with this syntax: +The `{bibliography}` directive must be used for all the `{cite}` roles to +render properly. +For example, if the references for your book are stored in `references.bib`, +then the bibliography is inserted with: + +```{bibliography} +``` + +## Learn more + +This is just a simple starter to get you started. +You can learn a lot more at [jupyterbook.org](https://jupyterbook.org). diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb new file mode 100644 index 000000000..fdb7176c4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb @@ -0,0 +1,122 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Content with notebooks\n", + "\n", + "You can also create content with Jupyter Notebooks. 
This means that you can include\n", + "code blocks and their outputs in your book.\n", + "\n", + "## Markdown + notebooks\n", + "\n", + "As it is markdown, you can embed images, HTML, etc into your posts!\n", + "\n", + "![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)\n", + "\n", + "You can also $add_{math}$ and\n", + "\n", + "$$\n", + "math^{blocks}\n", + "$$\n", + "\n", + "or\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "\\mbox{mean} la_{tex} \\\\ \\\\\n", + "math blocks\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "But make sure you \\$Escape \\$your \\$dollar signs \\$you want to keep!\n", + "\n", + "## MyST markdown\n", + "\n", + "MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check\n", + "out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),\n", + "or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).\n", + "\n", + "## Code blocks and outputs\n", + "\n", + "Jupyter Book will also embed your code blocks and output in your book.\n", + "For example, here's some sample Matplotlib code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from matplotlib import rcParams, cycler\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "plt.ion()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Fixing random state for reproducibility\n", + "np.random.seed(19680801)\n", + "\n", + "N = 10\n", + "data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]\n", + "data = np.array(data).T\n", + "cmap = plt.cm.coolwarm\n", + "rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))\n", + "\n", + "\n", + "from matplotlib.lines import Line2D\n", + "custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),\n", + " Line2D([0], [0], color=cmap(.5), lw=4),\n", + " Line2D([0], [0], color=cmap(1.), lw=4)]\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "lines = ax.plot(data)\n", + "ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a lot more that you can do with outputs (such as including interactive outputs)\n", + "with your book. 
For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst new file mode 100644 index 000000000..a9d662da7 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst @@ -0,0 +1,26 @@ +Main authors: + +* David Eppstein + + - wrote the original LaTeX codec as a recipe on ActiveState + http://code.activestate.com/recipes/252124-latex-codec/ + +* Peter Tröger + + - wrote the original latexcodec package, which contained a simple + but very effective LaTeX encoder + +* Matthias Troffaes (matthias.troffaes@gmail.com) + + - wrote the lexer + + - integrated codec with the lexer for a simpler and more robust + design + + - various bugfixes + +Contributors: + +* Michael Radziej + +* Philipp Spitzer diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst new file mode 100644 index 000000000..a7dbb5e82 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst @@ -0,0 +1,23 @@ +| latexcodec is a lexer and codec to work with LaTeX code in Python +| Copyright (c) 2011-2020 by Matthias C. M. Troffaes + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following +conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md new file mode 100644 index 000000000..03868d78b --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md @@ -0,0 +1,95 @@ +# markdown-it-container + +[![Build Status](https://img.shields.io/travis/markdown-it/markdown-it-container/master.svg?style=flat)](https://travis-ci.org/markdown-it/markdown-it-container) +[![NPM version](https://img.shields.io/npm/v/markdown-it-container.svg?style=flat)](https://www.npmjs.org/package/markdown-it-container) +[![Coverage Status](https://img.shields.io/coveralls/markdown-it/markdown-it-container/master.svg?style=flat)](https://coveralls.io/r/markdown-it/markdown-it-container?branch=master) + +> Plugin for creating block-level custom containers for [markdown-it](https://github.com/markdown-it/markdown-it) markdown parser. + +__v2.+ requires `markdown-it` v5.+, see changelog.__ + +With this plugin you can create block containers like: + +``` +::: warning +*here be dragons* +::: +``` + +.... and specify how they should be rendered. If no renderer defined, `
<div>` with +container name class will be created: + +```html +<div class="warning"> +<em>here be dragons</em> +</div>
+``` + +Markup is the same as for [fenced code blocks](http://spec.commonmark.org/0.18/#fenced-code-blocks). +Difference is, that marker use another character and content is rendered as markdown markup. + + +## Installation + +node.js, browser: + +```bash +$ npm install markdown-it-container --save +$ bower install markdown-it-container --save +``` + + +## API + +```js +var md = require('markdown-it')() + .use(require('markdown-it-container'), name [, options]); +``` + +Params: + +- __name__ - container name (mandatory) +- __options:__ + - __validate__ - optional, function to validate tail after opening marker, should + return `true` on success. + - __render__ - optional, renderer function for opening/closing tokens. + - __marker__ - optional (`:`), character to use in delimiter. + + +## Example + +```js +var md = require('markdown-it')(); + +md.use(require('markdown-it-container'), 'spoiler', { + + validate: function(params) { + return params.trim().match(/^spoiler\s+(.*)$/); + }, + + render: function (tokens, idx) { + var m = tokens[idx].info.trim().match(/^spoiler\s+(.*)$/); + + if (tokens[idx].nesting === 1) { + // opening tag + return '
<details><summary>' + md.utils.escapeHtml(m[1]) + '</summary>\n'; + + } else { + // closing tag + return '</details>\n'; + } + } +}); + +console.log(md.render('::: spoiler click me\n*content*\n:::\n')); + +// Output: + +// <details><summary>click me</summary> +// <p><em>content</em></p> +// </details>
+``` + +## License + +[MIT](https://github.com/markdown-it/markdown-it-container/blob/master/LICENSE) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md new file mode 100644 index 000000000..414157bcc --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md @@ -0,0 +1,38 @@ +# markdown-it-deflist + +[![Build Status](https://img.shields.io/travis/markdown-it/markdown-it-deflist/master.svg?style=flat)](https://travis-ci.org/markdown-it/markdown-it-deflist) +[![NPM version](https://img.shields.io/npm/v/markdown-it-deflist.svg?style=flat)](https://www.npmjs.org/package/markdown-it-deflist) +[![Coverage Status](https://img.shields.io/coveralls/markdown-it/markdown-it-deflist/master.svg?style=flat)](https://coveralls.io/r/markdown-it/markdown-it-deflist?branch=master) + +> Definition list (`
`) tag plugin for [markdown-it](https://github.com/markdown-it/markdown-it) markdown parser. + +__v2.+ requires `markdown-it` v5.+, see changelog.__ + +Syntax is based on [pandoc definition lists](http://johnmacfarlane.net/pandoc/README.html#definition-lists). + + +## Install + +node.js, browser: + +```bash +npm install markdown-it-deflist --save +bower install markdown-it-deflist --save +``` + +## Use + +```js +var md = require('markdown-it')() + .use(require('markdown-it-deflist')); + +md.render(/*...*/); +``` + +_Differences in browser._ If you load script directly into the page, without +package system, module will add itself globally as `window.markdownitDeflist`. + + +## License + +[MIT](https://github.com/markdown-it/markdown-it-deflist/blob/master/LICENSE) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md new file mode 100644 index 000000000..f79f33563 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md @@ -0,0 +1,137 @@ +[![License](https://img.shields.io/github/license/goessner/markdown-it-texmath.svg)](https://github.com/goessner/markdown-it-texmath/blob/master/licence.txt) +[![npm](https://img.shields.io/npm/v/markdown-it-texmath.svg)](https://www.npmjs.com/package/markdown-it-texmath) +[![npm](https://img.shields.io/npm/dt/markdown-it-texmath.svg)](https://www.npmjs.com/package/markdown-it-texmath) + +# markdown-it-texmath + +Add TeX math equations to your Markdown documents rendered by [markdown-it](https://github.com/markdown-it/markdown-it) parser. [KaTeX](https://github.com/Khan/KaTeX) is used as a fast math renderer. + +## Features +Simplify the process of authoring markdown documents containing math formulas. +This extension is a comfortable tool for scientists, engineers and students with markdown as their first choice document format. + +* Macro support +* Simple formula numbering +* Inline math with tables, lists and blockquote. +* User setting delimiters: + * `'dollars'` (default) + * inline: `$...$` + * display: `$$...$$` + * display + equation number: `$$...$$ (1)` + * `'brackets'` + * inline: `\(...\)` + * display: `\[...\]` + * display + equation number: `\[...\] (1)` + * `'gitlab'` + * inline: ``$`...`$`` + * display: `` ```math ... ``` `` + * display + equation number: `` ```math ... ``` (1)`` + * `'julia'` + * inline: `$...$` or ``` ``...`` ``` + * display: `` ```math ... ``` `` + * display + equation number: `` ```math ... ``` (1)`` + * `'kramdown'` + * inline: ``$$...$$`` + * display: `$$...$$` + * display + equation number: `$$...$$ (1)` + +## Show me + +View a [test table](https://goessner.github.io/markdown-it-texmath/index.html). + +[try it out ...](https://goessner.github.io/markdown-it-texmath/markdown-it-texmath-demo.html) + +## Use with `node.js` + +Install the extension. Verify having `markdown-it` and `katex` already installed . +``` +npm install markdown-it-texmath +``` +Use it with JavaScript. +```js +let kt = require('katex'), + tm = require('markdown-it-texmath').use(kt), + md = require('markdown-it')().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}}); + +md.render('Euler\'s identity \(e^{i\pi}+1=0\) is a beautiful formula in $\\RR 2$.') +``` + +## Use in Browser +```html + + + + + + + + + + +
+ + + +``` +## CDN + +Use following links for `texmath.js` and `texmath.css` +* `https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.js` +* `https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.css` + +## Dependencies + +* [`markdown-it`](https://github.com/markdown-it/markdown-it): Markdown parser done right. Fast and easy to extend. +* [`katex`](https://github.com/Khan/KaTeX): This is where credits for fast rendering TeX math in HTML go to. + +## ToDo + + nothing yet + +## FAQ + +* __`markdown-it-texmath` with React Native does not work, why ?__ + * `markdown-it-texmath` is using regular expressions with `y` [(sticky) property](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky) and cannot avoid this. The use of the `y` flag in regular expressions means the plugin is not compatible with React Native (which as of now doesn't support it and throws an error `Invalid flags supplied to RegExp constructor`). + +## CHANGELOG + +### [0.6.0] on October 04, 2019 +* Add support for [Julia Markdown](https://docs.julialang.org/en/v1/stdlib/Markdown/) on [request](https://github.com/goessner/markdown-it-texmath/issues/15). + +### [0.5.5] on February 07, 2019 +* Remove [rendering bug with brackets delimiters](https://github.com/goessner/markdown-it-texmath/issues/9). + +### [0.5.4] on January 20, 2019 +* Remove pathological [bug within blockquotes](https://github.com/goessner/mdmath/issues/50). + +### [0.5.3] on November 11, 2018 +* Add support for Tex macros (https://katex.org/docs/supported.html#macros) . +* Bug with [brackets delimiters](https://github.com/goessner/markdown-it-texmath/issues/9) . + +### [0.5.2] on September 07, 2018 +* Add support for [Kramdown](https://kramdown.gettalong.org/) . + +### [0.5.0] on August 15, 2018 +* Fatal blockquote bug investigated. Implemented workaround to vscode bug, which has finally gone with vscode 1.26.0 . + +### [0.4.6] on January 05, 2018 +* Escaped underscore bug removed. + +### [0.4.5] on November 06, 2017 +* Backslash bug removed. + +### [0.4.4] on September 27, 2017 +* Modifying the `block` mode regular expression with `gitlab` delimiters, so removing the `newline` bug. + +## License + +`markdown-it-texmath` is licensed under the [MIT License](./license.txt) + + © [Stefan Gössner](https://github.com/goessner) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst new file mode 100644 index 000000000..cd1010329 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst @@ -0,0 +1,236 @@ +================================== +A guide to masked arrays in NumPy +================================== + +.. Contents:: + +See http://www.scipy.org/scipy/numpy/wiki/MaskedArray (dead link) +for updates of this document. + + +History +------- + +As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became +increasingly frustrated with the subclassing of masked arrays (even if +I can only blame my inexperience). I needed to develop a class of arrays +that could store some additional information along with numerical values, +while keeping the possibility for missing data (picture storing a series +of dates along with measurements, what would later become the `TimeSeries +Scikit `__ +(dead link). 
+ +I started to implement such a class, but then quickly realized that +any additional information disappeared when processing these subarrays +(for example, adding a constant value to a subarray would erase its +dates). I ended up writing the equivalent of *numpy.core.ma* for my +particular class, ufuncs included. Everything went fine until I needed to +subclass my new class, when more problems showed up: some attributes of +the new subclass were lost during processing. I identified the culprit as +MaskedArray, which returns masked ndarrays when I expected masked +arrays of my class. I was preparing myself to rewrite *numpy.core.ma* +when I forced myself to learn how to subclass ndarrays. As I became more +familiar with the *__new__* and *__array_finalize__* methods, +I started to wonder why masked arrays were objects, and not ndarrays, +and whether it wouldn't be more convenient for subclassing if they did +behave like regular ndarrays. + +The new *maskedarray* is what I eventually come up with. The +main differences with the initial *numpy.core.ma* package are +that MaskedArray is now a subclass of *ndarray* and that the +*_data* section can now be any subclass of *ndarray*. Apart from a +couple of issues listed below, the behavior of the new MaskedArray +class reproduces the old one. Initially the *maskedarray* +implementation was marginally slower than *numpy.ma* in some areas, +but work is underway to speed it up; the expectation is that it can be +made substantially faster than the present *numpy.ma*. + + +Note that if the subclass has some special methods and +attributes, they are not propagated to the masked version: +this would require a modification of the *__getattribute__* +method (first trying *ndarray.__getattribute__*, then trying +*self._data.__getattribute__* if an exception is raised in the first +place), which really slows things down. + +Main differences +---------------- + + * The *_data* part of the masked array can be any subclass of ndarray (but not recarray, cf below). + * *fill_value* is now a property, not a function. + * in the majority of cases, the mask is forced to *nomask* when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled. + * I got rid of the *share_mask* flag, I never understood its purpose. + * *put*, *putmask* and *take* now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, *put* and *putmask* both update the mask when needed. * if *a* is a masked array, *bool(a)* raises a *ValueError*, as it does with ndarrays. + * in the same way, the comparison of two masked arrays is a masked array, not a boolean + * *filled(a)* returns an array of the same subclass as *a._data*, and no test is performed on whether it is contiguous or not. + * the mask is always printed, even if it's *nomask*, which makes things easy (for me at least) to remember that a masked array is used. + * *cumsum* works as if the *_data* array was filled with 0. The mask is preserved, but not updated. + * *cumprod* works as if the *_data* array was filled with 1. The mask is preserved, but not updated. + +New features +------------ + +This list is non-exhaustive... + + * the *mr_* function mimics *r_* for masked arrays. + * the *anom* method returns the anomalies (deviations from the average) + +Using the new package with numpy.core.ma +---------------------------------------- + +I tried to make sure that the new package can understand old masked +arrays. 
Unfortunately, there's no upward compatibility. + +For example: + +>>> import numpy.core.ma as old_ma +>>> import maskedarray as new_ma +>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) +>>> x +array(data = + [ 1 2 999999 4 5], + mask = + [False False True False False], + fill_value=999999) +>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) +>>> y +array(data = [1 2 -- 4 5], + mask = [False False True False False], + fill_value=999999) +>>> x==y +array(data = + [True True True True True], + mask = + [False False True False False], + fill_value=?) +>>> old_ma.getmask(x) == new_ma.getmask(x) +array([True, True, True, True, True]) +>>> old_ma.getmask(y) == new_ma.getmask(y) +array([True, True, False, True, True]) +>>> old_ma.getmask(y) +False + + +Using maskedarray with matplotlib +--------------------------------- + +Starting with matplotlib 0.91.2, the masked array importing will work with +the maskedarray branch) as well as with earlier versions. + +By default matplotlib still uses numpy.ma, but there is an rcParams setting +that you can use to select maskedarray instead. In the matplotlibrc file +you will find:: + + #maskedarray : False # True to use external maskedarray module + # instead of numpy.ma; this is a temporary # + setting for testing maskedarray. + + +Uncomment and set to True to select maskedarray everywhere. +Alternatively, you can test a script with maskedarray by using a +command-line option, e.g.:: + + python simple_plot.py --maskedarray + + +Masked records +-------------- + +Like *numpy.ma.core*, the *ndarray*-based implementation +of MaskedArray is limited when working with records: you can +mask any record of the array, but not a field in a record. If you +need this feature, you may want to give the *mrecords* package +a try (available in the *maskedarray* directory in the scipy +sandbox). This module defines a new class, *MaskedRecord*. An +instance of this class accepts a *recarray* as data, and uses two +masks: the *fieldmask* has as many entries as records in the array, +each entry with the same fields as a record, but of boolean types: +they indicate whether the field is masked or not; a record entry +is flagged as masked in the *mask* array if all the fields are +masked. A few examples in the file should give you an idea of what +can be done. Note that *mrecords* is still experimental... + +Optimizing maskedarray +---------------------- + +Should masked arrays be filled before processing or not? +-------------------------------------------------------- + +In the current implementation, most operations on masked arrays involve +the following steps: + + * the input arrays are filled + * the operation is performed on the filled arrays + * the mask is set for the results, from the combination of the input masks and the mask corresponding to the domain of the operation. + +For example, consider the division of two masked arrays:: + + import numpy + import maskedarray as ma + x = ma.array([1,2,3,4],mask=[1,0,0,0], dtype=numpy.float64) + y = ma.array([-1,0,1,2], mask=[0,0,0,1], dtype=numpy.float64) + +The division of x by y is then computed as:: + + d1 = x.filled(0) # d1 = array([0., 2., 3., 4.]) + d2 = y.filled(1) # array([-1., 0., 1., 1.]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + result = (d1/d2).view(MaskedArray) # masked_array([-0. inf, 3., 4.]) + result._mask = logical_or(m, dm) + +Note that a division by zero takes place. 
To avoid it, we can consider +to fill the input arrays, taking the domain mask into account, so that:: + + d1 = x._data.copy() # d1 = array([1., 2., 3., 4.]) + d2 = y._data.copy() # array([-1., 0., 1., 2.]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + numpy.putmask(d2, dm, 1) # d2 = array([-1., 1., 1., 2.]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + result = (d1/d2).view(MaskedArray) # masked_array([-1. 0., 3., 2.]) + result._mask = logical_or(m, dm) + +Note that the *.copy()* is required to avoid updating the inputs with +*putmask*. The *.filled()* method also involves a *.copy()*. + +A third possibility consists in avoid filling the arrays:: + + d1 = x._data # d1 = array([1., 2., 3., 4.]) + d2 = y._data # array([-1., 0., 1., 2.]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + result = (d1/d2).view(MaskedArray) # masked_array([-1. inf, 3., 2.]) + result._mask = logical_or(m, dm) + +Note that here again the division by zero takes place. + +A quick benchmark gives the following results: + + * *numpy.ma.divide* : 2.69 ms per loop + * classical division : 2.21 ms per loop + * division w/ prefilling : 2.34 ms per loop + * division w/o filling : 1.55 ms per loop + +So, is it worth filling the arrays beforehand ? Yes, if we are interested +in avoiding floating-point exceptions that may fill the result with infs +and nans. No, if we are only interested into speed... + + +Thanks +------ + +I'd like to thank Paul Dubois, Travis Oliphant and Sasha for the +original masked array package: without you, I would never have started +that (it might be argued that I shouldn't have anyway, but that's +another story...). I also wish to extend these thanks to Reggie Dugard +and Eric Firing for their suggestions and numerous improvements. + + +Revision notes +-------------- + + * 08/25/2007 : Creation of this page + * 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version! diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md new file mode 100644 index 000000000..a6cf1b17e --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md @@ -0,0 +1,71 @@ +**This software is dual-licensed under the The University of Illinois/NCSA +Open Source License (NCSA) and The 3-Clause BSD License** + +# NCSA Open Source License +**Copyright (c) 2019 Kevin Sheppard. All rights reserved.** + +Developed by: Kevin Sheppard (, +) +[http://www.kevinsheppard.com](http://www.kevinsheppard.com) + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal with +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimers. 
+ +Redistributions in binary form must reproduce the above copyright notice, this +list of conditions and the following disclaimers in the documentation and/or +other materials provided with the distribution. + +Neither the names of Kevin Sheppard, nor the names of any contributors may be +used to endorse or promote products derived from this Software without specific +prior written permission. + +**THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH +THE SOFTWARE.** + + +# 3-Clause BSD License +**Copyright (c) 2019 Kevin Sheppard. All rights reserved.** + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors + may be used to endorse or promote products derived from this software + without specific prior written permission. + +**THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +THE POSSIBILITY OF SUCH DAMAGE.** + +# Components + +Many parts of this module have been derived from original sources, +often the algorithm's designer. Component licenses are located with +the component code. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md new file mode 100644 index 000000000..19b6b4524 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md @@ -0,0 +1,31 @@ +BSD 3-Clause License + +Copyright (c) 2013-2024, Kim Davies and contributors. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. 
Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst new file mode 100644 index 000000000..f7c8f60f4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst @@ -0,0 +1,11 @@ +Authors +======= + +Creator +------- +Jonathan Slenders + +Contributors +------------ + +- Amjith Ramanujam diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst new file mode 100644 index 000000000..aff5c5aa2 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst @@ -0,0 +1,23 @@ +| pybtex-docutils is a docutils backend for pybtex +| Copyright (c) 2013-2021 by Matthias C. M. Troffaes + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following +conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md new file mode 100644 index 000000000..f7072d1c9 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md @@ -0,0 +1,30 @@ +BSD 3-Clause License + +Copyright (c) 2009-2012, Brian Granger, Min Ragan-Kelley + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md new file mode 100644 index 000000000..20528a992 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2018 - 2025 Isaac Muse + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst new file mode 100644 index 000000000..db36b190c --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst @@ -0,0 +1,67 @@ +License for Sphinx +================== + +Unless otherwise indicated, all code in the Sphinx project is licenced under the +two clause BSD licence below. + +Copyright (c) 2007-2024 by the Sphinx team (see AUTHORS file). +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +Licenses for incorporated software +================================== + +The included implementation of NumpyDocstring._parse_numpydoc_see_also_section +was derived from code under the following license: + +------------------------------------------------------------------------------- + +Copyright (C) 2008 Stefan van der Walt , Pauli Virtanen + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + +THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR +IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, +INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING +IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +------------------------------------------------------------------------------- diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md new file mode 100644 index 000000000..9e7e1deb5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md @@ -0,0 +1,53 @@ +# Translation workflow + +This folder contains code and translations for supporting multiple languages with Sphinx. +See [the Sphinx internationalization documentation](https://www.sphinx-doc.org/en/master/usage/configuration.html) for more details. + +## Structure of translation files + +### Translation source files + +The source files for our translations are hand-edited, and contain the raw mapping of words onto various languages. +They are checked in to `git` history with this repository. + +`src/sphinx_book_theme/assets/translations/jsons` contains a collection of JSON files that define the translation for various phrases in this repository. +Each file is a different phrase, and its contents define language codes and translated phrases for each language we support. +They were originally created with [the smodin.io language translator](https://smodin.me/translate-one-text-into-multiple-languages) (see below for how to update them). + +### Compiled translation files + +The translation source files are compiled at build time (when we run `stb compile`) automatically. +This is executed by the Python script at `python src/sphinx_book_theme/_compile_translations.py` (more information on that below). + +These compiled files are **not checked into `.git` history**, but they **are** bundled with the theme when it is distributed in a package. +Here's a brief explanation of each: + +- `src/sphinx_book_theme/theme/sphinx_book_theme/static/locales` contains Sphinx locale files that were auto-converted from the files in `jsons/` by the helper script below. +- `src/sphinx_book_theme/_compile_translations.py` is a helper script to auto-generate Sphinx locale files from the JSONs in `jsons/`. + +## Workflow of translations + +Here's a short workflow of how to add a new translation, assuming that you are translating using the [smodin.io service](https://smodin.io/translate-one-text-into-multiple-languages). + +1. Go to [the smodin.io service](https://smodin.io/translate-one-text-into-multiple-languages) +2. Select as many languages as you like. +3. Type in the phrase you'd like to translate. +4. Click `TRANSLATE` and then `Download JSON`. +5. This will download a JSON file with a bunch of `language-code: translated-phrase` mappings. +6. Put this JSON in the `jsons/` folder, and rename it to be the phrase you've translated in English. + So if the original phrase is `My phrase`, you should name the file `My phrase.json`. +7. 
Run [the `prettier` formatter](https://prettier.io/) on this JSON to split it into multiple lines (this makes it easier to read and edit if translations should be updated) + + ```bash + prettier sphinx_book_theme/translations/jsons/.json + ``` + +8. Run `python src/sphinx_book_theme/_compile_translations.py` +9. This will generate the locale files (`.mo`) that Sphinx uses in its translation machinery, and put them in `locales//LC_MESSAGES/.mo`. + +Sphinx should now know how to translate this message! + +## To update a translation + +To update a translation, you may go to the phase you'd like to modify in `jsons/`, then find the entry for the language you'd like to update, and change its value. +Finally, run `python src/sphinx_book_theme/_compile_translations.py` and this will update the `.mo` files. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst new file mode 100644 index 000000000..fb3f16937 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst @@ -0,0 +1,26 @@ +| sphinxcontrib-bibtex is a Sphinx extension for BibTeX style citations +| Copyright (c) 2011-2024 by Matthias C. M. Troffaes +| All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md new file mode 100644 index 000000000..00bb32989 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md @@ -0,0 +1 @@ +PyZMQ's CFFI support is designed only for (Unix) systems conforming to `have_sys_un_h = True`. 
diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb index 886db99ef..403eab1f3 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb index 25296c4e0..bb6ba7a35 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." 
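A minimal illustrative sketch of this fit, using an explicit design matrix and the OLS normal equations, could look like the following. The variable names and the noiseless choice are assumptions made for illustration only, not the official solution:

```python
import numpy as np

# Sketch: fit f(x) = 2 - x + 5x^2 on [-2, 2] with a second-order polynomial ansatz
n = 100
x = np.linspace(-2, 2, n)
y = 2 - x + 5 * x**2          # optionally add noise, e.g. + np.random.normal(0, 0.1, n)

# Design matrix for a second-order polynomial: columns 1, x, x^2
X = np.column_stack((np.ones(n), x, x**2))

# Ordinary least squares via the normal equations
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                  # without noise this recovers approximately [2, -1, 5]
```

Exercise 1 below repeats this setup with standardized features and a centered target, which is the form used in the rest of the exercises.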
+ "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." + "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. 
The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." + "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." 
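A hedged sketch of one way to complete the gradient-descent template above for both OLS and Ridge is given below. It assumes the cost functions are the MSE and the MSE plus $\lambda \sum_j \theta_j^2$, that `X_norm`, `y_centered`, `n_features` and `lam` are defined as in the earlier cells, and uses an illustrative learning rate `eta` and tolerance `tol`; it is not the official solution:

```python
import numpy as np

eta = 0.01        # learning rate (illustrative choice; experiment with it)
num_iters = 1000
tol = 1e-8        # stopping tolerance on the size of the parameter update

n_samples = X_norm.shape[0]
theta_OLS = np.zeros(n_features)
theta_Ridge = np.zeros(n_features)

for t in range(num_iters):
    # Gradients of the MSE cost (OLS) and of MSE + lam * ||theta||^2 (Ridge)
    grad_OLS = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_OLS - y_centered)
    grad_Ridge = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_Ridge - y_centered) \
                 + 2.0 * lam * theta_Ridge

    new_OLS = theta_OLS - eta * grad_OLS
    new_Ridge = theta_Ridge - eta * grad_Ridge

    # Simple stopping criterion: stop when the parameters barely change
    if max(np.linalg.norm(new_OLS - theta_OLS), np.linalg.norm(new_Ridge - theta_Ridge)) < tol:
        theta_OLS, theta_Ridge = new_OLS, new_Ridge
        break

    theta_OLS, theta_Ridge = new_OLS, new_Ridge

print("theta_OLS  :", theta_OLS)
print("theta_Ridge:", theta_Ridge)
```

For a sufficiently small learning rate and enough iterations, both parameter vectors should approach the corresponding closed-form solutions from Exercise 3, which gives a direct check of the implementation.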
+ "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. 
Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb new file mode 100644 index 000000000..c100028a5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter ${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} 
\\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown 
function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." + ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the 
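+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small optional sketch related to exercise 4a) above (not part of the exercise): one possible way to compute the MSE, bias and variance from arrays with the same layout as `predictions` and `targets`, using the definitions from exercise 3. The names `toy_predictions` and `toy_targets` are made up for illustration only.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Toy arrays with the same layout as in exercise 4a):\n",
+    "# predictions of shape (bootstraps, n), targets of shape (1, n).\n",
+    "rng = np.random.default_rng(2024)\n",
+    "toy_predictions = rng.normal(size=(500, 50))\n",
+    "toy_targets = rng.normal(size=(1, 50))\n",
+    "\n",
+    "# Definitions from exercise 3, averaged over the n data points\n",
+    "toy_mse = np.mean((toy_predictions - toy_targets) ** 2)\n",
+    "toy_bias = np.mean((toy_targets - np.mean(toy_predictions, axis=0)) ** 2)\n",
+    "toy_variance = np.mean(np.var(toy_predictions, axis=0))\n",
+    "\n",
+    "# For fixed targets this decomposition holds exactly\n",
+    "print(np.allclose(toy_mse, toy_bias + toy_variance))"
+   ]
+  },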
degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb new file mode 100644 index 000000000..22a86cb56 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. 
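+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Related to exercise 2 d) above: the cell below is a rough, optional sketch of how a labeled heatmap with a colorbar can be made in matplotlib. The arrays `degrees`, `lambdas` and `mse_grid` are placeholders; swap in your own Ridge results.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Placeholder data; replace with your own polynomial degrees, lambda values and test MSEs\n",
+    "degrees = np.arange(1, 6)\n",
+    "lambdas = np.logspace(-4, 1, 6)\n",
+    "mse_grid = np.random.rand(len(degrees), len(lambdas))  # shape (n_degrees, n_lambdas)\n",
+    "\n",
+    "fig, ax = plt.subplots()\n",
+    "im = ax.imshow(mse_grid, aspect='auto', origin='lower')\n",
+    "ax.set_xticks(range(len(lambdas)))\n",
+    "ax.set_xticklabels([f'{lam:.0e}' for lam in lambdas])\n",
+    "ax.set_yticks(range(len(degrees)))\n",
+    "ax.set_yticklabels([str(d) for d in degrees])\n",
+    "ax.set_xlabel('lambda')\n",
+    "ax.set_ylabel('polynomial degree')\n",
+    "fig.colorbar(im, ax=ax, label='test MSE')\n",
+    "plt.show()"
+   ]
+  },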
Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. 
Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb new file mode 100644 index 000000000..190c0b96a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. 
If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. 
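+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional aside, here is a standalone shape illustration with made-up sizes, separate from the exercise variables above (it assumes the `ReLU` function and `np` import from the setup cell): a layer is a matrix-vector product plus a bias, followed by the activation.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Standalone shape illustration: a layer mapping 3 input features to 5 outputs\n",
+    "example_x = np.arange(3.0)                 # input vector with 3 features\n",
+    "example_W = np.arange(15.0).reshape(5, 3)  # weight matrix: (n_outputs, n_inputs)\n",
+    "example_b = np.ones(5)                     # one bias per output node\n",
+    "example_z = example_W @ example_x + example_b\n",
+    "example_a = ReLU(example_z)                # ReLU defined in the setup cell\n",
+    "print(example_z.shape, example_a.shape)    # (5,) (5,)"
+   ]
+  },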
Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. 
Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. (Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. 
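+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Related to exercise 3 d) above, here is a small optional numerical illustration (with made-up matrices) of why layers without activation functions collapse into a single linear layer.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Two affine layers WITHOUT activations equal one affine layer:\n",
+    "# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)\n",
+    "x_demo = np.random.randn(6)\n",
+    "W1_demo, b1_demo = np.random.randn(4, 6), np.random.randn(4)\n",
+    "W2_demo, b2_demo = np.random.randn(2, 4), np.random.randn(2)\n",
+    "\n",
+    "two_layer_out = W2_demo @ (W1_demo @ x_demo + b1_demo) + b2_demo\n",
+    "one_layer_out = (W2_demo @ W1_demo) @ x_demo + (W2_demo @ b1_demo + b2_demo)\n",
+    "print(np.allclose(two_layer_out, one_layer_out))  # True"
+   ]
+  },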
This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. (Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. 
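+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A quick optional check related to the hint in b) above, with made-up arrays: with inputs stored as rows of a batch, multiplying by the transposed weight matrix gives the same result, row by row, as the single-vector version.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Each row of X @ W.T + b equals W @ x + b for the corresponding input x\n",
+    "X_batch = np.random.randn(8, 3)   # 8 inputs with 3 features each\n",
+    "W_single = np.random.randn(5, 3)  # single-vector convention: (n_outputs, n_features)\n",
+    "b_vec = np.random.randn(5)\n",
+    "\n",
+    "batched = X_batch @ W_single.T + b_vec  # shape (8, 5), b_vec broadcasts over rows\n",
+    "row_by_row = np.array([W_single @ x + b_vec for x in X_batch])\n",
+    "print(np.allclose(batched, row_by_row))  # True"
+   ]
+  },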
However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. 
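+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small optional sanity check of the claim in b) above, using the batched `softmax` from the setup cell on made-up scores: every output row is positive and sums to one.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scores = np.random.randn(6, 3)  # 6 made-up inputs, 3 classes\n",
+    "probs = softmax(scores)         # batched softmax from the setup cell\n",
+    "print(np.all(probs > 0), np.allclose(probs.sum(axis=1), 1.0))"
+   ]
+  },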
But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. 
the weights and biases of the network. Check the shapes of what's inside. What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb new file mode 100644 index 000000000..9925836a4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. 
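+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before the derivations, an optional toy reminder (with made-up names) of the loop you are building towards: compute a gradient with autograd, then take a small step against it.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad\n",
+    "\n",
+    "\n",
+    "# Toy example: minimize f(w) = sum((w - 3)^2) with plain gradient descent\n",
+    "def f_toy(w):\n",
+    "    return np.sum((w - 3.0) ** 2)\n",
+    "\n",
+    "\n",
+    "f_toy_grad = grad(f_toy)  # returns df/dw with the same shape as w\n",
+    "w_toy = np.zeros(4)\n",
+    "learning_rate = 0.1\n",
+    "for _ in range(100):\n",
+    "    w_toy = w_toy - learning_rate * f_toy_grad(w_toy)\n",
+    "print(w_toy)  # all entries close to 3"
+   ]
+  },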
Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
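+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An optional aside relevant to a) above, using a toy cost function and made-up names: the gradient autograd returns always has the same shape as the parameter you differentiate with respect to.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autograd import grad\n",
+    "import autograd.numpy as np\n",
+    "\n",
+    "\n",
+    "def toy_cost(W, b):\n",
+    "    return np.sum(W**2) + np.sum(b**2)\n",
+    "\n",
+    "\n",
+    "toy_grad = grad(toy_cost, [0, 1])  # differentiate wrt. both arguments\n",
+    "W_toy, b_toy = np.random.rand(7, 3), np.random.rand(7)\n",
+    "gW, gb = toy_grad(W_toy, b_toy)\n",
+    "print(gW.shape, gb.shape)  # (7, 3) (7,)"
+   ]
+  },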
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
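+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An optional aside on the `elementwise_grad` used in the checks above, shown on `np.tanh` (a function not used in this exercise): it returns the pointwise derivative evaluated at every entry of the input.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autograd import elementwise_grad\n",
+    "import autograd.numpy as np\n",
+    "\n",
+    "\n",
+    "def tanh_demo(z):\n",
+    "    return np.tanh(z)\n",
+    "\n",
+    "\n",
+    "tanh_der_autograd = elementwise_grad(tanh_demo, 0)\n",
+    "z_demo = np.linspace(-2.0, 2.0, 5)\n",
+    "print(tanh_der_autograd(z_demo))\n",
+    "print(1 - np.tanh(z_demo) ** 2)  # analytical derivative of tanh, same values"
+   ]
+  },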
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
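+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When you rewrite things for batched inputs (and in general), it can help to compare your manual gradients against the autograd ones layer by layer. Below is an optional sketch of such a helper; the name `grads_allclose` is made up, and the `np` import from the setup cell is assumed.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def grads_allclose(layer_grads_a, layer_grads_b, tol=1e-8):\n",
+    "    # Compare two lists of (W_grad, b_grad) tuples entry by entry\n",
+    "    for (W_a, b_a), (W_b, b_b) in zip(layer_grads_a, layer_grads_b):\n",
+    "        if not (np.allclose(W_a, W_b, atol=tol) and np.allclose(b_a, b_b, atol=tol)):\n",
+    "            return False\n",
+    "    return True\n",
+    "\n",
+    "\n",
+    "# Example usage with the objects from exercise 5:\n",
+    "# print(grads_allclose(layer_grads, cost_grad(layers, x, activation_funcs, target)))"
+   ]
+  },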
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb new file mode 100644 index 000000000..f80e8787a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. 
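+    "\n",
+    "As a small illustration (with made-up 0/1 label arrays), the four counts can be read off directly with numpy:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # made-up true labels\n",
+    "y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # made-up predictions\n",
+    "\n",
+    "TP = np.sum((y_pred == 1) & (y_true == 1))\n",
+    "FP = np.sum((y_pred == 1) & (y_true == 0))\n",
+    "TN = np.sum((y_pred == 0) & (y_true == 0))\n",
+    "FN = np.sum((y_pred == 0) & (y_true == 1))\n",
+    "print(TP, FP, TN, FN)  # here: 3 1 3 1\n",
+    "```\n",
+    "\n",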
Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. 
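+    "\n",
+    "A minimal sketch of this threshold sweep on made-up labels and scores (in practice `sklearn.metrics.roc_curve` does the same bookkeeping for you):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(42)\n",
+    "y_true = rng.integers(0, 2, size=200)           # made-up 0/1 labels\n",
+    "scores = 0.3 * y_true + 0.7 * rng.random(200)   # made-up scores in [0, 1)\n",
+    "\n",
+    "tpr_list, fpr_list = [], []\n",
+    "for t in np.linspace(0.0, 1.0, 101):\n",
+    "    y_pred = (scores >= t).astype(int)\n",
+    "    tpr_list.append(np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1))\n",
+    "    fpr_list.append(np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0))\n",
+    "\n",
+    "fpr, tpr = np.array(fpr_list), np.array(tpr_list)\n",
+    "order = np.argsort(fpr)\n",
+    "auc = np.sum(np.diff(fpr[order]) * (tpr[order][1:] + tpr[order][:-1]) / 2)  # trapezoidal rule\n",
+    "print(f'approximate AUC: {auc:.3f}')\n",
+    "```\n",
+    "\n",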
A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." 
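+    "\n",
+    "A minimal sketch of how the gain and lift can be computed from sorted scores (the labels and scores below are made up; the plotting itself can be left to a library such as scikit-plot, as in the code example further down):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(1)\n",
+    "y_true = rng.integers(0, 2, size=500)          # made-up 0/1 labels\n",
+    "scores = 0.4 * y_true + 0.6 * rng.random(500)  # made-up scores\n",
+    "\n",
+    "order = np.argsort(scores)[::-1]               # sort by decreasing score\n",
+    "sorted_labels = y_true[order]\n",
+    "\n",
+    "alpha = np.arange(1, len(y_true) + 1) / len(y_true)  # fraction of samples targeted\n",
+    "gain = np.cumsum(sorted_labels) / np.sum(y_true)     # fraction of positives captured\n",
+    "lift = gain / alpha\n",
+    "\n",
+    "top10 = int(0.1 * len(y_true)) - 1\n",
+    "print(f'gain at top 10%: {gain[top10]:.2f}, lift at top 10%: {lift[top10]:.2f}')\n",
+    "```\n",
+    "\n",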
+ ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." 
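+    "\n",
+    "A minimal sketch of how these averages can be obtained with **scikit-learn** (the three-class labels below are made up):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import classification_report, f1_score\n",
+    "\n",
+    "y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2])  # made-up true labels\n",
+    "y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1, 0, 2])  # made-up predictions\n",
+    "\n",
+    "for avg in ['micro', 'macro', 'weighted']:\n",
+    "    print(avg, f1_score(y_true, y_pred, average=avg))\n",
+    "\n",
+    "print(classification_report(y_true, y_pred))  # per-class precision, recall and F1\n",
+    "```\n",
+    "\n",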
+ ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. 
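+    "\n",
+    "For reference, here is a minimal end-to-end sketch along the lines of the template code above, using the two-class digits data and plain **scikit-learn**/matplotlib instead of scikit-plot (the variable names and the hand-rolled gain curve are just one possible way of doing this):\n",
+    "\n",
+    "```python\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "from sklearn.datasets import load_digits\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.metrics import auc, confusion_matrix, roc_curve\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "digits = load_digits(n_class=2)\n",
+    "X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=0)\n",
+    "\n",
+    "model = LogisticRegression(solver='lbfgs', max_iter=1000)\n",
+    "model.fit(X_train, y_train)\n",
+    "print('test accuracy:', model.score(X_test, y_test))\n",
+    "print(confusion_matrix(y_test, model.predict(X_test)))\n",
+    "\n",
+    "# ROC curve from the predicted probability of the positive class\n",
+    "probs = model.predict_proba(X_test)[:, 1]\n",
+    "fpr, tpr, _ = roc_curve(y_test, probs)\n",
+    "plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.3f}')\n",
+    "plt.plot([0, 1], [0, 1], linestyle='--', label='random guess')\n",
+    "plt.xlabel('False positive rate')\n",
+    "plt.ylabel('True positive rate')\n",
+    "plt.legend()\n",
+    "plt.show()\n",
+    "\n",
+    "# cumulative gain computed by hand: fraction of positives among the top-ranked samples\n",
+    "order = np.argsort(probs)[::-1]\n",
+    "gain = np.cumsum(np.asarray(y_test)[order]) / np.sum(y_test)\n",
+    "frac = np.arange(1, len(y_test) + 1) / len(y_test)\n",
+    "plt.plot(frac, gain, label='model')\n",
+    "plt.plot([0, 1], [0, 1], linestyle='--', label='random')\n",
+    "plt.xlabel('Fraction of samples targeted')\n",
+    "plt.ylabel('Fraction of positives captured')\n",
+    "plt.legend()\n",
+    "plt.show()\n",
+    "```\n",
+    "\n",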
You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb new file mode 100644 index 000000000..32aa0e723 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n", + "\n", + "2. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. 
Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/project1.ipynb b/doc/LectureNotes/_build/html/_sources/project1.ipynb index aba42cd41..5170af951 100644 --- a/doc/LectureNotes/_build/html/_sources/project1.ipynb +++ b/doc/LectureNotes/_build/html/_sources/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. 
Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. 
You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. 
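+    "\n",
+    "A minimal sketch of these steps for a single polynomial degree (the degree, the number of points, the noise level $0.1$ and the choice to centre with the training means are illustrative choices, not requirements):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n, degree = 100, 10\n",
+    "x = rng.uniform(-1, 1, n)\n",
+    "y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.standard_normal(n)  # Runge function plus noise\n",
+    "\n",
+    "X = np.column_stack([x**p for p in range(1, degree + 1)])     # features x, x^2, ..., x^degree\n",
+    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)\n",
+    "\n",
+    "# centre features and target using the training statistics only\n",
+    "X_mean, y_mean = X_train.mean(axis=0), y_train.mean()\n",
+    "theta = np.linalg.pinv(X_train - X_mean) @ (y_train - y_mean)  # OLS via the pseudoinverse\n",
+    "y_pred = (X_test - X_mean) @ theta + y_mean\n",
+    "\n",
+    "mse = np.mean((y_test - y_pred) ** 2)\n",
+    "r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - np.mean(y_test)) ** 2)\n",
+    "print(f'MSE = {mse:.4f}, R2 = {r2:.4f}')\n",
+    "```\n",
+    "\n",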
**You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. 
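+    "\n",
+    "One common gradient-based option (a sketch only, not the only possible choice) is proximal gradient descent (ISTA), where an ordinary gradient step on the squared-error part is followed by a soft-thresholding of the parameters; the data, penalty and step size below are made up for illustration.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def soft_threshold(u, tau):\n",
+    "    # proximal operator of the l1 penalty, applied elementwise\n",
+    "    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)\n",
+    "\n",
+    "def lasso_ista(X, y, lam, eta, n_iter=10000):\n",
+    "    # minimises (1/n)*sum (y - X theta)^2 + lam * sum |theta|\n",
+    "    n = X.shape[0]\n",
+    "    theta = np.zeros(X.shape[1])\n",
+    "    for _ in range(n_iter):\n",
+    "        grad = -(2.0 / n) * X.T @ (y - X @ theta)   # gradient of the MSE part only\n",
+    "        theta = soft_threshold(theta - eta * grad, eta * lam)\n",
+    "    return theta\n",
+    "\n",
+    "# tiny illustration on made-up data with a sparse true parameter vector\n",
+    "rng = np.random.default_rng(0)\n",
+    "X = rng.standard_normal((50, 8))\n",
+    "y = X @ np.array([1.5, 0, 0, -2.0, 0, 0, 0, 0.5]) + 0.1 * rng.standard_normal(50)\n",
+    "print(lasso_ista(X, y, lam=0.1, eta=0.01))\n",
+    "```\n",
+    "\n",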
You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." + "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. 
\n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. 
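+    "\n",
+    "A minimal sketch of the $k$-fold loop using the `KFold` splitter from **scikit-learn** (the polynomial degree, the number of folds and the OLS-by-pseudoinverse fit are illustrative choices):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import KFold\n",
+    "\n",
+    "rng = np.random.default_rng(7)\n",
+    "x = rng.uniform(-1, 1, 200)\n",
+    "y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.standard_normal(200)\n",
+    "X = np.column_stack([x**p for p in range(11)])   # polynomial features, including the x^0 column\n",
+    "\n",
+    "kfold = KFold(n_splits=5, shuffle=True, random_state=0)\n",
+    "mse_folds = []\n",
+    "for train_idx, test_idx in kfold.split(X):\n",
+    "    theta = np.linalg.pinv(X[train_idx]) @ y[train_idx]   # OLS fit on the training folds\n",
+    "    y_pred = X[test_idx] @ theta\n",
+    "    mse_folds.append(np.mean((y[test_idx] - y_pred) ** 2))\n",
+    "\n",
+    "print('MSE per fold:', np.round(mse_folds, 4))\n",
+    "print('mean CV MSE :', np.mean(mse_folds))\n",
+    "```\n",
+    "\n",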
\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. 
If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. 
Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. \n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. 
[Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/_build/html/_sources/project2.ipynb b/doc/LectureNotes/_build/html/_sources/project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. 
Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. 
A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." 
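+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c9d71aa",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a minimal illustrative sketch of such a test (not a required part of the project), one can compare the analytical gradient of the plain MSE cost, $\\frac{2}{n}X^T(X\\theta-\\mathbf{y})$, with the gradient produced by **Autograd** on a small random data set. The data set and variable names below are only placeholders, and the snippet assumes the Autograd library is installed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b2e64d1",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch: compare an analytical MSE gradient with Autograd\n",
+ "import autograd.numpy as np\n",
+ "from autograd import grad\n",
+ "\n",
+ "n = 20\n",
+ "x = np.random.rand(n,1)\n",
+ "y = 1.0/(1.0+25.0*x**2) + 0.1*np.random.randn(n,1)\n",
+ "X = np.hstack([np.ones((n,1)), x])   # simple design matrix with intercept and x\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "def mse(theta):\n",
+ "    return np.sum((np.dot(X, theta) - y)**2)/n\n",
+ "\n",
+ "grad_autograd = grad(mse)(theta)\n",
+ "grad_analytic = (2.0/n)*np.dot(X.T, np.dot(X, theta) - y)\n",
+ "print(np.allclose(grad_autograd, grad_analytic))"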
+ ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week37.ipynb b/doc/LectureNotes/_build/html/_sources/week37.ipynb new file mode 100644 index 000000000..fe89adb05 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
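+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c51f08b2",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a quick numerical sanity check, the analytical gradient $\\frac{2}{n}X^T(X\\theta - \\mathbf{y})$ can be compared with a central finite-difference approximation of $\\nabla_\\theta C(\\theta)$; the small illustrative snippet below does this for the same type of data as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e94a3b7c",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: verify the analytical gradient with central finite differences\n",
+ "import numpy as np\n",
+ "\n",
+ "n = 100\n",
+ "x = 2*np.random.rand(n,1)\n",
+ "y = 4+3*x+np.random.randn(n,1)\n",
+ "X = np.c_[np.ones((n,1)), x]\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "def cost(theta):\n",
+ "    return np.sum((X @ theta - y)**2)/n\n",
+ "\n",
+ "grad_analytic = (2.0/n)*X.T @ (X @ theta - y)\n",
+ "\n",
+ "# Central finite differences, one component of theta at a time\n",
+ "eps = 1e-6\n",
+ "grad_fd = np.zeros_like(theta)\n",
+ "for j in range(theta.shape[0]):\n",
+ "    e = np.zeros_like(theta)\n",
+ "    e[j] = eps\n",
+ "    grad_fd[j] = (cost(theta + e) - cost(theta - e))/(2*eps)\n",
+ "\n",
+ "print(np.allclose(grad_analytic, grad_fd, atol=1e-4))"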
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
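+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d2c9e40",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A minimal sketch of such a stopping criterion is shown below, with an added cap on the number of iterations as a safeguard (the cap and the variable names are only illustrative choices); the complete worked example in the next section omits the criterion, as noted above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "af6710d3",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: plain gradient descent with a gradient-norm stopping criterion\n",
+ "import numpy as np\n",
+ "\n",
+ "n = 100\n",
+ "x = 2*np.random.rand(n,1)\n",
+ "y = 4+3*x+np.random.randn(n,1)\n",
+ "X = np.c_[np.ones((n,1)), x]\n",
+ "\n",
+ "eta = 0.001\n",
+ "epsilon = 1.0e-8\n",
+ "max_iterations = 1000000   # safety cap on the number of iterations\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "for k in range(max_iterations):\n",
+ "    gradient = (2.0/n)*X.T @ (X @ theta - y)\n",
+ "    if np.linalg.norm(gradient) <= epsilon:\n",
+ "        print(f'Converged after {k} iterations')\n",
+ "        break\n",
+ "    theta -= eta*gradient\n",
+ "\n",
+ "print(theta)"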
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
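+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e41d8f5",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In compact form, and consistent with the code example that follows, one common variant of momentum gradient descent keeps a running *velocity* $v_t$ and updates\n",
+ "\n",
+ "$$\n",
+ "v_t = \\eta \\nabla_\\theta C(\\theta_{t-1}) + \\gamma v_{t-1}, \\qquad \\theta_t = \\theta_{t-1} - v_t,\n",
+ "$$\n",
+ "\n",
+ "with learning rate $\\eta$, momentum parameter $\\gamma \\in [0,1)$ and $v_0 = 0$. The step is then a geometrically decaying average of past gradients, and $\\gamma = 0$ recovers plain gradient descent."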
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
+ "However, the condition that the gradient\n",
+ "is zero is valid also for local minima, so this would only tell us\n",
+ "that we are close to a local/global minimum. Alternatively, we could\n",
+ "evaluate the cost function at this point, store the result and\n",
+ "continue the search. If the test is repeated at a later stage we can\n",
+ "compare the values of the cost function and keep the $\theta$ that\n",
+ "gave the lowest value."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8ddbdbb5",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Slightly different approach\n",
+ "\n",
+ "Another approach is to let the step length $\eta_j$ depend on the\n",
+ "number of epochs in such a way that it becomes very small after a\n",
+ "reasonable time, so that we essentially stop moving. Such approaches\n",
+ "are often referred to as learning rate schedules, or scaling of the\n",
+ "learning rate. There are many ways to [scale the learning\n",
+ "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1);\n",
+ "see also [this article](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf)\n",
+ "for a discussion of different scaling functions for the learning rate."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35ea8e21",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Time decay rate\n",
+ "\n",
+ "As an example, let $e = 0,1,2,3,\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \cdot m + i$ where $m$ is the number of minibatches and $i=0,\cdots,m-1$. Then the function $$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. That is, we start with a step length $\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n",
+ "\n",
+ "In this way we can fix the number of epochs, compute $\theta$ and\n",
+ "evaluate the cost function at the end. Repeating the computation will\n",
+ "give a different result since the scheme is random by design. Then we\n",
+ "pick the final $\theta$ that gives the lowest value of the cost\n",
+ "function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "77a60fcd",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np \n",
+ "\n",
+ "def step_length(t,t0,t1):\n",
+ "    return t0/(t+t1)\n",
+ "\n",
+ "n = 100 #100 datapoints \n",
+ "M = 5 #size of each minibatch\n",
+ "m = int(n/M) #number of minibatches\n",
+ "n_epochs = 500 #number of epochs\n",
+ "t0 = 1.0\n",
+ "t1 = 10\n",
+ "\n",
+ "eta_j = t0/t1\n",
+ "j = 0\n",
+ "for epoch in range(1,n_epochs+1):\n",
+ "    for i in range(m):\n",
+ "        k = np.random.randint(m) #Pick the k-th minibatch at random\n",
+ "        #Compute the gradient using the data in minibatch Bk\n",
+ "        #Compute new suggestion for theta\n",
+ "        t = epoch*m+i\n",
+ "        eta_j = step_length(t,t0,t1)\n",
+ "        j += 1\n",
+ "\n",
+ "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b030b80c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Code with a Number of Minibatches which varies\n",
+ "\n",
+ "In the code below the number of mini-batches is controlled by the batch size $M$ (with $m=n/M$). The example first computes the analytical OLS solution and plain gradient descent, and then runs SGD over the mini-batches with the time-decaying learning rate defined above."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n",
+ "using the full gradient $\nabla C(\theta)$, while Stochastic Gradient\n",
+ "Descent (SGD) uses a single sample (or mini-batch) gradient $\nabla\n",
+ "l_i(\theta)$ selected at random. In equation form, one GD step is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21997f1a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} = \theta_t - \eta \nabla C(\theta_t) = \theta_t - \eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cdefe165",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "whereas one SGD step is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ac200d56",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} = \theta_t - \eta \nabla l_{i_t}(\theta_t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eb3edfb3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n",
+ "converge to the global minimum, but their rates differ. GD can take\n",
+ "larger, more stable steps since it uses the exact gradient, achieving\n",
+ "an error that decreases on the order of $O(1/t)$ per iteration for\n",
+ "convex objectives (and even exponentially fast for strongly convex\n",
+ "cases). In contrast, plain SGD has more variance in each step, leading\n",
+ "to sublinear convergence in expectation – typically $O(1/\sqrt{t})$\n",
+ "for general convex objectives (with appropriate diminishing step\n",
+ "sizes). Intuitively, GD’s trajectory is smoother and more\n",
+ "predictable, while SGD’s path oscillates due to noise but costs far\n",
+ "less per iteration, enabling many more updates in the same time."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7fe05c0d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "### Strongly Convex Case\n",
+ "\n",
+ "If $C(\theta)$ is $\mu$-strongly convex and $L$-smooth (so GD enjoys linear\n",
+ "convergence), the gap $C(\theta_t)-C(\theta^*)$ for GD shrinks as"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2ae403f1",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "C(\theta_t) - C(\theta^*) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)],\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "44272171",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "a geometric (linear) convergence per iteration. Achieving an\n",
+ "$\epsilon$-accurate solution thus takes on the order of\n",
+ "$\log(1/\epsilon)$ iterations for GD. However, each GD iteration costs\n",
+ "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n",
+ "obtain a linear rate – instead, with a properly decaying step size\n",
+ "(e.g. $\eta_t = \frac{1}{\mu t}$) or iterate averaging, SGD attains an\n",
+ "$O(1/t)$ convergence rate in expectation. For example, one result\n",
+ "of Moulines and Bach (2011) shows that with $\eta_t = \Theta(1/t)$,"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9cde29ef",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b77f20e",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "for strongly convex, smooth $C$. This $1/t$ rate is slower per\n",
+ "iteration than GD’s exponential decay, but each SGD iteration is $N$\n",
+ "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n",
+ "take slightly fewer iterations in theory, but each iteration is so\n",
+ "costly that it is “slower… especially for large datasets”. Meanwhile,\n",
+ "memory scaling strongly favors SGD: GD’s memory cost grows with\n",
+ "dataset size, making it impractical beyond a point, whereas SGD’s\n",
+ "memory use is modest and mostly constant w.r.t. $N$. These\n",
+ "differences have made SGD (and mini-batch variants) the de facto\n",
+ "choice for training large machine learning models, from logistic\n",
+ "regression on millions of examples to deep neural networks with\n",
+ "billions of parameters. The consensus in both research and practice is\n",
+ "that for large-scale or high-dimensional tasks, SGD-type methods\n",
+ "converge quicker per unit of computation and handle memory constraints\n",
+ "better than standard full-batch gradient descent."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "48c2661e",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Second moment of the gradient\n",
+ "\n",
+ "In stochastic gradient descent, with and without momentum, we still\n",
+ "have to specify a schedule for tuning the learning rates $\eta_t$\n",
+ "as a function of time. As discussed in the context of Newton's\n",
+ "method, this presents a number of dilemmas. The learning rate is\n",
+ "limited by the steepest direction which can change depending on the\n",
+ "current position in the landscape. To circumvent this problem, ideally\n",
+ "our algorithm would keep track of curvature and take large steps in\n",
+ "shallow, flat directions and small steps in steep, narrow directions.\n",
+ "Second-order methods accomplish this by calculating or approximating\n",
+ "the Hessian and normalizing the learning rate by the\n",
+ "curvature. However, this is very computationally expensive for\n",
+ "extremely large models. Ideally, we would like to be able to\n",
+ "adaptively change the step size to match the landscape without paying\n",
+ "the steep computational price of calculating or approximating\n",
+ "Hessians.\n",
+ "\n",
+ "During the last decade a number of methods have been introduced that accomplish\n",
+ "this by tracking not only the gradient, but also the second moment of\n",
+ "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n",
+ "[ADAM](https://arxiv.org/abs/1412.6980)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2106298",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Challenge: Choosing a Fixed Learning Rate\n",
+ "A fixed $\eta$ is hard to get right:\n",
+ "1. If $\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n",
+ "\n",
+ "2. If $\eta$ is too small, convergence is very slow (many iterations to make progress)\n",
+ "\n",
+ "In practice, one often uses trial-and-error or schedules (decaying $\eta$ over time) to find a workable balance; the short sketch at the end of this section illustrates both failure modes on a simple quadratic.\n",
+ "For a function with steep directions and flat directions, a single global $\eta$ may be inappropriate:\n",
+ "1. Steep coordinates require a smaller step size to avoid oscillation.\n",
+ "\n",
+ "2. Flat/shallow coordinates could use a larger step to speed up progress.\n",
+ "\n",
+ "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizes per feature.\n",
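+ "\n",
+ "As a minimal toy illustration of points 1 and 2 above, consider plain gradient descent on the one-dimensional quadratic $C(\theta)=\theta^2$, whose gradient is $2\theta$:\n",
+ "\n",
+ "```python\n",
+ "def gd(eta, n_iter=20, theta0=1.0):\n",
+ "    # plain gradient descent on C(theta) = theta**2, with gradient 2*theta\n",
+ "    theta = theta0\n",
+ "    for _ in range(n_iter):\n",
+ "        theta -= eta*2*theta\n",
+ "    return theta\n",
+ "\n",
+ "# eta = 1.1 overshoots and diverges (|theta| grows every step),\n",
+ "# eta = 0.01 converges only very slowly,\n",
+ "# eta = 0.5 happens to hit the minimum of this particular quadratic exactly\n",
+ "for eta in (1.1, 0.01, 0.5):\n",
+ "    print(f'eta = {eta}: theta after 20 steps = {gd(eta):.3e}')\n",
+ "```"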
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The AdaGrad algorithm, from Goodfellow et al.
\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7743f26d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Derivation of the AdaGrad Algorithm\n",
+ "\n",
+ "**Accumulating Gradient History.**\n",
+ "\n",
+ "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n",
+ "\n",
+ "2. Let $g_t = \nabla_\theta C(\theta_t)$ be the (possibly mini-batch) gradient at step $t$ (or a subgradient for nondifferentiable cases).\n",
+ "\n",
+ "3. Initialize $r_0 = 0$ (an all-zero vector in $\mathbb{R}^d$).\n",
+ "\n",
+ "4. At each iteration $t$, update the accumulation:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef4b5d6a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "r_t = r_{t-1} + g_t \circ g_t,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "927e2738",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "1. Here $g_t \circ g_t$ denotes the element-wise square of the gradient vector, i.e. $r_{t,j} = r_{t-1,j} + (g_{t,j})^2$ for each parameter $j$.\n",
+ "\n",
+ "2. We can view $H_t = \mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1753de13",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## AdaGrad Update Rule Derivation\n",
+ "\n",
+ "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0db67ba3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7831e978",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t,1})^{-1/2}, \dots, (r_{t,d})^{-1/2}$.\n",
+ "In coordinates, this means each parameter $j$ has an individual step size:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "92a7758a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}.\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df62a4ff",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In practice we add a small constant $\epsilon$ in the denominator for numerical stability to avoid division by zero:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8a2b948",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}.\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f269e80",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4ec584c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## AdaGrad Properties\n",
+ "\n",
+ "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n",
+ "\n",
+ "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\eta/\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The RMSProp algorithm, from Goodfellow et al.
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The ADAM algorithm, from Goodfellow et al.
\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4aa2b35",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Algorithms and codes for Adagrad, RMSprop and Adam\n",
+ "\n",
+ "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n",
+ "\n",
+ "The codes which implement these algorithms are discussed below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01de27d3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Practical tips\n",
+ "\n",
+ "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n",
+ "\n",
+ "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n",
+ "\n",
+ "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n",
+ "\n",
+ "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "78a1a601",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Sneaking in automatic differentiation using Autograd\n",
+ "\n",
+ "In the examples here we take the liberty of sneaking in automatic\n",
+ "differentiation (without having discussed the mathematics). In\n",
+ "project 1 you will write the gradients as discussed above, that is,\n",
+ "hard-coding the gradients. By introducing automatic differentiation\n",
+ "via the library **autograd**, which has largely been superseded by **JAX**, we have\n",
+ "more flexibility in setting up alternative cost functions.\n",
+ "\n",
+ "The\n",
+ "first example shows results with ordinary least squares.\n",
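+ "\n",
+ "As a quick aside before the OLS example, here is a minimal, self-contained sketch (assuming the **autograd** package is installed) of what `grad` returns: a new Python function that evaluates the derivative of the function it is given.\n",
+ "\n",
+ "```python\n",
+ "from autograd import grad\n",
+ "\n",
+ "def f(x):\n",
+ "    # f(x) = x^3 + 2x, with derivative 3x^2 + 2\n",
+ "    return x**3 + 2*x\n",
+ "\n",
+ "df = grad(f)      # df is a function computing f'(x)\n",
+ "print(df(2.0))    # 14.0, in agreement with 3*2**2 + 2\n",
+ "```\n",
+ "\n",
+ "In the cells below the same mechanism is applied to the cost function, so that `grad(CostOLS)` returns a function evaluating $\nabla_\theta C(\theta)$."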
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
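+ ,
+ "\n",
+ "Note, however, that in the code above the accumulated squared gradient `Giter` is reset to zero at the start of every epoch, whereas in the formulation of AdaGrad in Goodfellow et al. the accumulation runs over all updates. A minimal sketch of that variant, reusing `X`, `y`, `M`, `m`, `n_epochs` and `training_gradient` exactly as defined in the cell above, reads:\n",
+ "\n",
+ "```python\n",
+ "Giter = np.zeros((3,1))   # accumulated squared gradients, kept across epochs\n",
+ "theta = np.random.randn(3,1)\n",
+ "eta, delta = 0.01, 1e-8\n",
+ "\n",
+ "for epoch in range(n_epochs):\n",
+ "    for i in range(m):\n",
+ "        random_index = M*np.random.randint(m)\n",
+ "        xi = X[random_index:random_index+M]\n",
+ "        yi = y[random_index:random_index+M]\n",
+ "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+ "        Giter += gradients*gradients              # element-wise accumulation\n",
+ "        theta -= eta*gradients/(delta + np.sqrt(Giter))\n",
+ "print('theta from AdaGrad with persistent accumulation')\n",
+ "print(theta)\n",
+ "```\n",
+ "\n",
+ "The difference in bookkeeping matters most for long training runs, since the persistent sum makes the effective learning rate decay monotonically (the limitation that RMSProp and Adam address)."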
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n",
+    "m = int(n/M) #number of minibatches\n",
+    "# Guess for unknown parameters theta\n",
+    "theta = np.random.randn(3,1)\n",
+    "\n",
+    "# Value for learning rate\n",
+    "eta = 0.01\n",
+    "# Values for the ADAM decay parameters theta1 and theta2 (beta1 and beta2 in https://arxiv.org/abs/1412.6980)\n",
+    "theta1 = 0.9\n",
+    "theta2 = 0.999\n",
+    "# Small parameter delta to avoid possible division by zero\n",
+    "delta = 1e-7\n",
+    "iter = 0\n",
+    "for epoch in range(n_epochs):\n",
+    "    first_moment = 0.0\n",
+    "    second_moment = 0.0\n",
+    "    iter += 1\n",
+    "    for i in range(m):\n",
+    "        random_index = M*np.random.randint(m)\n",
+    "        xi = X[random_index:random_index+M]\n",
+    "        yi = y[random_index:random_index+M]\n",
+    "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+    "        # Computing the first and second moments\n",
+    "        first_moment = theta1*first_moment + (1-theta1)*gradients\n",
+    "        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n",
+    "        # Bias corrections of the moments\n",
+    "        first_term = first_moment/(1.0-theta1**iter)\n",
+    "        second_term = second_moment/(1.0-theta2**iter)\n",
+    "        # Update of the parameters using the bias-corrected moments\n",
+    "        update = eta*first_term/(np.sqrt(second_term)+delta)\n",
+    "        theta -= update\n",
+    "print(\"theta from own ADAM\")\n",
+    "print(theta)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8475863",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Material for the lab sessions\n",
+    "\n",
+    "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n",
+    "\n",
+    "2. Work on project 1\n",
+    "\n",
+    "\n",
+    "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d4d0717",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Reminder on different scaling methods\n",
+    "\n",
+    "Before fitting a regression model, it is good practice to normalize or\n",
+    "standardize the features. This ensures all features are on a\n",
+    "comparable scale, which is especially important when using\n",
+    "regularization. In the exercises this week we will perform standardization, scaling each\n",
+    "feature to have mean 0 and standard deviation 1.\n",
+    "\n",
+    "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\boldsymbol{X}$.\n",
+    "Then we subtract the mean and divide by the standard deviation for each feature.\n",
+    "\n",
+    "In the example here\n",
+    "we will also center the target $\boldsymbol{y}$ to mean $0$. Centering $\boldsymbol{y}$\n",
+    "(and each feature) means the model does not require a separate intercept\n",
+    "term; the data is shifted such that the intercept is effectively 0.\n",
+    "(In practice, one could include an intercept in the model and not\n",
+    "penalize it, but here we simplify by centering.)\n",
+    "Choose $n=100$ data points and set up $\boldsymbol{x}$, $\boldsymbol{y}$ and the design matrix $\boldsymbol{X}$."
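,
+    "\n",
+    "As a minimal sketch (the second-order polynomial form and the noise level are assumptions made here for illustration, not part of the exercise), the data and design matrix could be set up along the following lines before the standardization step in the next cell:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "n = 100\n",
+    "x = np.random.rand(n)                              # n data points in [0, 1]\n",
+    "y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n)    # assumed test function plus noise\n",
+    "# polynomial features without an explicit intercept column, since we center the data\n",
+    "X = np.c_[x, x**2]\n",
+    "```"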
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "46375144",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# Standardize features (zero mean, unit variance for each feature)\n",
+    "X_mean = X.mean(axis=0)\n",
+    "X_std = X.std(axis=0)\n",
+    "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n",
+    "X_norm = (X - X_mean) / X_std\n",
+    "\n",
+    "# Center the target to zero mean (optional, to simplify intercept handling)\n",
+    "y_mean = ?\n",
+    "y_centered = ?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "39426ccf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Do we need to center the values of $y$?\n",
+    "\n",
+    "After this preprocessing, each column of $\boldsymbol{X}_{\mathrm{norm}}$ has mean zero and standard deviation $1$\n",
+    "and $\boldsymbol{y}_{\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n",
+    "nicer and ensures the regularization penalty $\lambda \sum_j\n",
+    "\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n",
+    "same scale)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df7fe27f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Functionality in Scikit-Learn\n",
+    "\n",
+    "**Scikit-Learn** has several functions which allow us to rescale the\n",
+    "data, normally leading to much better results in terms of various\n",
+    "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n",
+    "ensures that for each feature/predictor we study the mean value is\n",
+    "zero and the variance is one (every column in the design/feature\n",
+    "matrix). This scaling has the drawback that it does not ensure that\n",
+    "we have a particular maximum or minimum in our data set. Another\n",
+    "function included in **Scikit-Learn** is the **MinMaxScaler**, which\n",
+    "ensures that all features are exactly between $0$ and $1$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8fd48e39",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## More preprocessing\n",
+    "\n",
+    "The **Normalizer** scales each data\n",
+    "point such that the feature vector has a Euclidean length of one. In other words, it\n",
+    "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n",
+    "radius of 1. This means every data point is scaled by a different number (by the\n",
+    "inverse of its length).\n",
+    "This normalization is often used when only the direction (or angle) of the data matters,\n",
+    "not the length of the feature vector.\n",
+    "\n",
+    "The **RobustScaler** works similarly to the StandardScaler in that it\n",
+    "ensures statistical properties for each feature that guarantee that\n",
+    "they are on the same scale. However, the RobustScaler uses the median\n",
+    "and quartiles, instead of the mean and variance. This makes the\n",
+    "RobustScaler ignore data points that are very different from the rest\n",
+    "(like measurement errors). These odd data points are also called\n",
+    "outliers, and they often lead to trouble for other scaling\n",
+    "techniques."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6c60a0a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Frequently used scaling functions\n",
+    "\n",
+    "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy, however, to write your own. 
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week38.ipynb b/doc/LectureNotes/_build/html/_sources/week38.ipynb new file mode 100644 index 000000000..1d25f9941 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. 
In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", + "\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", + "\n", + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" + ] + }, + { + "cell_type": "markdown", + "id": "bfc2983a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b5f5980", + "metadata": { + "editable": true + }, + "source": [ + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", + "\n", + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
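,
+    "\n",
+    "As a quick numerical sanity check (a sketch with an assumed toy design matrix and parameter values, not part of the derivation), one can simulate many noise realizations and compare the empirical mean and variance of a single $y_i$ with $\mathbf{X}_{i,\ast}\boldsymbol{\theta}$ and $\sigma^2$:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n = 50\n",
+    "X = np.c_[np.ones(n), rng.random(n), rng.random(n)**2]   # assumed design matrix\n",
+    "theta = np.array([2.0, 3.0, 4.0])                        # assumed true parameters\n",
+    "sigma = 0.5\n",
+    "\n",
+    "# draw many realizations of y = X theta + noise\n",
+    "samples = np.array([X @ theta + sigma*rng.standard_normal(n) for _ in range(10000)])\n",
+    "i = 7   # pick one arbitrary data point\n",
+    "print(samples[:, i].mean(), (X @ theta)[i])   # empirical mean vs X_{i,*} theta\n",
+    "print(samples[:, i].var(), sigma**2)          # empirical variance vs sigma^2\n",
+    "```"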
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+ ] + }, + { + "cell_type": "markdown", + "id": "96c16676", + "metadata": { + "editable": true + }, + "source": [ + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", + "\n", + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" + ] + }, + { + "cell_type": "markdown", + "id": "a2a1a004", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5aad445b", + "metadata": { + "editable": true + }, + "source": [ + "## Independent and Identically Distributed (iid)\n", + "\n", + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" + ] + }, + { + "cell_type": "markdown", + "id": "d197c8bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2e7462f", + "metadata": { + "editable": true + }, + "source": [ + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", + "\n", + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" + ] + }, + { + "cell_type": "markdown", + "id": "eb635d3d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "445ed13e", + "metadata": { + "editable": true + }, + "source": [ + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. 
That is,\n",
+    "in the case of a simple one-dimensional input and output, we would have"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "319bfc6c",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})].\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "90abf35a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\boldsymbol{X}$. \n",
+    "We can now rewrite the above probability as"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "04b66fbd",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(\boldsymbol{D}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a27b5a7",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\boldsymbol{D}$ given a set of parameters $\boldsymbol{\theta}$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d12543f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Maximum Likelihood Estimation (MLE)\n",
+    "\n",
+    "In statistics, maximum likelihood estimation (MLE) is a method of\n",
+    "estimating the parameters of an assumed probability distribution,\n",
+    "given some observed data. This is achieved by maximizing a likelihood\n",
+    "function so that, under the assumed statistical model, the observed\n",
+    "data is the most probable. \n",
+    "\n",
+    "We will assume here that our events are given by the above Gaussian\n",
+    "distribution and we will determine the optimal parameters $\theta$ by\n",
+    "maximizing the above PDF. However, computing the derivatives of a\n",
+    "product function is cumbersome and can easily lead to overflow and/or\n",
+    "underflow problems, with potential for loss of numerical precision.\n",
+    "\n",
+    "In practice, it is more convenient to maximize the logarithm of the\n",
+    "PDF because it is a monotonically increasing function of the argument.\n",
+    "Alternatively, and this will be our option, we will minimize the\n",
+    "negative of the logarithm since this is a monotonically decreasing\n",
+    "function.\n",
+    "\n",
+    "Note also that maximization/minimization of the logarithm of the PDF\n",
+    "is equivalent to the maximization/minimization of the PDF itself."
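,
+    "\n",
+    "The following sketch (using assumed toy data and SciPy's generic optimizer, an extra dependency not otherwise used in these notes) illustrates the point: minimizing the negative logarithm of the Gaussian likelihood numerically reproduces the analytical OLS solution derived below.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "rng = np.random.default_rng(1)\n",
+    "n = 100\n",
+    "x = rng.random(n)\n",
+    "X = np.c_[np.ones(n), x, x**2]\n",
+    "y = X @ np.array([2.0, 3.0, 4.0]) + 0.1*rng.standard_normal(n)\n",
+    "sigma = 0.1\n",
+    "\n",
+    "def neg_log_likelihood(theta):\n",
+    "    residual = y - X @ theta\n",
+    "    return 0.5*n*np.log(2*np.pi*sigma**2) + 0.5*(residual @ residual)/sigma**2\n",
+    "\n",
+    "res = minimize(neg_log_likelihood, np.zeros(3))\n",
+    "print(res.x)                                 # from minimizing the negative log-likelihood\n",
+    "print(np.linalg.solve(X.T @ X, X.T @ y))     # analytical OLS solution\n",
+    "```"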
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e5cd118",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## A new Cost Function\n",
+    "\n",
+    "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c71a5edf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\boldsymbol{\theta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e663bf2e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which becomes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4bc4873",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\boldsymbol{\theta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta})\vert\vert_2^2}{2\sigma^2}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5bc59b8",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Taking the derivative of the *new* cost function with respect to the parameters $\theta$ and setting it to zero, we recognize our familiar OLS equation, namely"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f6ddf4a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right) =0,\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afda0a6b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which leads to the well-known OLS equation for the optimal parameters $\theta$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5335dc0",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\hat{\boldsymbol{\theta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f86a52d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Next week we will make a similar analysis for Ridge and Lasso regression."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cdb1767",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why resampling methods\n",
+    "\n",
+    "Before we proceed, we need to rethink what we have been doing. In our\n",
+    "eagerness to fit the data, we have omitted several important elements in\n",
+    "our regression analysis. In what follows we will\n",
+    "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n",
+    "\n",
+    "2. introduce resampling techniques like cross-validation, bootstrapping, the jackknife and more\n",
+    "\n",
+    "and discuss how to select a given model (one of the difficult parts in machine learning)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69435d77",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Resampling methods\n",
+    "Resampling methods are an indispensable tool in modern\n",
+    "statistics. They involve repeatedly drawing samples from a training\n",
+    "set and refitting a model of interest on each sample in order to\n",
+    "obtain additional information about the fitted model. For example, in\n",
+    "order to estimate the variability of a linear regression fit, we can\n",
+    "repeatedly draw different samples from the training data, fit a linear\n",
+    "regression to each new sample, and then examine the extent to which\n",
+    "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
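+ {
+ "cell_type": "markdown",
+ "id": "ad0d0001",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Returning briefly to the confidence intervals above, the following sketch (synthetic data with a known noise level $\\sigma$; in a real analysis $\\sigma$ would have to be estimated from the residuals) computes the variances $\\sigma^2[(\\boldsymbol{X}^T\\boldsymbol{X})^{-1}]_{jj}$ of the OLS parameters and the corresponding approximate $95\\%$ confidence intervals with $z=1.96$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ad0d0002",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "np.random.seed(3155)\n",
+ "\n",
+ "# Synthetic data from a known quadratic model with noise level sigma\n",
+ "n = 200\n",
+ "sigma = 0.5\n",
+ "x = np.random.rand(n)\n",
+ "X = np.column_stack((np.ones(n), x, x**2))   # design matrix with intercept column\n",
+ "theta_true = np.array([1.0, -2.0, 3.0])\n",
+ "y = X @ theta_true + sigma*np.random.randn(n)\n",
+ "\n",
+ "# OLS estimate and the diagonal of its covariance matrix sigma^2 (X^T X)^{-1}\n",
+ "XtXinv = np.linalg.inv(X.T @ X)\n",
+ "theta_hat = XtXinv @ X.T @ y\n",
+ "var_theta = sigma**2*np.diag(XtXinv)\n",
+ "\n",
+ "# Approximate 95 percent confidence intervals, z = 1.96\n",
+ "z = 1.96\n",
+ "for j in range(len(theta_hat)):\n",
+ "    half_width = z*np.sqrt(var_theta[j])\n",
+ "    print('theta_%d = %7.4f +/- %.4f   (true value %4.1f)' % (j, theta_hat[j], half_width, theta_true[j]))"
+ ]
+ },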
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: illustration of the bias-variance tradeoff.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
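"# Both loops above use the same KFold splits (the same kfold instance with k = 5),\n",
+ "# so the two curves plotted below should essentially coincide.\n",
+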
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week39.ipynb b/doc/LectureNotes/_build/html/_sources/week39.ipynb new file mode 100644 index 000000000..1f411fe62 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. 
Discussions on how to structure your report for the first project\n", + "\n", + "2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: How to read the bias-variance tradeoff (figure not reproduced here).

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
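Before plotting the various activation functions below, it may help to see the logistic model in action. The short sketch here is only an added illustration: it uses synthetic age/outcome data (not the chddata.csv file read above) together with standard scikit-learn estimators, first checking numerically that $1-p(t)=p(-t)$ and then contrasting a plain linear fit with the S-shaped probability curve from logistic regression.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression

# Quick numerical check of the symmetry 1 - p(t) = p(-t)
t = 1.7
print(1 - 1/(1 + np.exp(-t)), 1/(1 + np.exp(t)))

# Synthetic binary data loosely mimicking the CHD-versus-age example:
# the probability of the outcome increases smoothly with age.
rng = np.random.default_rng(2024)
n = 200
age = rng.uniform(20, 70, n)
true_prob = 1.0/(1.0 + np.exp(-0.12*(age - 45)))   # logistic in age
y = rng.binomial(1, true_prob)

X = age.reshape(-1, 1)
linreg = LinearRegression().fit(X, y)      # predictions not restricted to [0, 1]
logreg = LogisticRegression().fit(X, y)    # predicted probabilities follow the sigmoid

age_grid = np.linspace(20, 70, 200).reshape(-1, 1)
plt.scatter(age, y, s=10, alpha=0.4, label='data')
plt.plot(age_grid, linreg.predict(age_grid), 'r--', label='linear fit')
plt.plot(age_grid, logreg.predict_proba(age_grid)[:, 1], 'k-', label='logistic fit')
plt.xlabel('Age')
plt.ylabel('Outcome / probability')
plt.legend()
plt.show()
```

The linear fit leaves the interval $[0,1]$ for small and large ages, while the logistic curve stays bounded and can be read directly as a probability.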
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
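A minimal sketch of this Newton-Raphson update for a two-parameter model (with design matrix $[1, x]$) could look as follows. It is an illustration of the equations above rather than the class implementation that follows, and the number of iterations and convergence tolerance are arbitrary choices.

```python
import numpy as np

# Synthetic data from a known two-parameter logistic model
rng = np.random.default_rng(3155)
n = 500
x = rng.normal(size=n)
theta_true = np.array([-0.5, 2.0])
X = np.column_stack((np.ones(n), x))          # design matrix [1, x]
p_true = 1/(1 + np.exp(-X @ theta_true))
y = rng.binomial(1, p_true)

theta = np.zeros(2)
for iteration in range(10):
    p = 1/(1 + np.exp(-X @ theta))
    gradient = -X.T @ (y - p)                 # dC/dtheta
    W = np.diag(p*(1 - p))                    # diagonal weight matrix
    hessian = X.T @ W @ X                     # d^2C/dtheta dtheta^T
    step = np.linalg.solve(hessian, gradient)
    theta = theta - step                      # Newton-Raphson update
    if np.max(np.abs(step)) < 1e-10:          # stop when converged
        break

print('estimated theta:', theta)
print('true theta     :', theta_true)
```

In practice one would avoid forming the full diagonal matrix $\boldsymbol{W}$ explicitly (multiplying the rows of $\boldsymbol{X}$ by $p_i(1-p_i)$ achieves the same), but the explicit form mirrors the expressions above most directly.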
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week40.ipynb b/doc/LectureNotes/_build/html/_sources/week40.ipynb new file mode 100644 index 000000000..aa3733b88 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
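+ ,
+ "\n",
+ "As a quick numerical sanity check (a sketch with arbitrarily chosen, not fitted, parameters), we can verify that the $K$-class expressions above reduce to exactly these two-class probabilities when $K=2$: class $1$ carries the single linear function $\\theta_0+\\theta_1x$ and class $2$ is the reference class.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Arbitrary illustrative parameters and inputs (not fitted values)\n",
+ "theta0, theta1 = 0.5, -1.5\n",
+ "x = np.linspace(-2, 2, 5)\n",
+ "z = theta0 + theta1 * x\n",
+ "\n",
+ "# K = 2: class 1 has the linear function z, class 2 is the reference class\n",
+ "p_class1 = np.exp(z) / (1 + np.exp(z))\n",
+ "p_class2 = 1 / (1 + np.exp(z))\n",
+ "\n",
+ "# The same probabilities written in the two-class (Sigmoid) form\n",
+ "sigmoid = 1 / (1 + np.exp(-z))\n",
+ "print(np.allclose(p_class1, sigmoid))\n",
+ "print(np.allclose(p_class1 + p_class2, 1.0))\n",
+ "```"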
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
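+ ,
+ "\n",
+ "The update above is straightforward to sketch in code. The following is a minimal illustration for a one-feature model (intercept plus slope), using synthetic data generated from an arbitrarily chosen true parameter vector; it is only a sketch, with no regularization and no safeguards against an ill-conditioned $\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}$.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(2025)\n",
+ "\n",
+ "# Synthetic one-feature data from an assumed true parameter vector (illustration only)\n",
+ "n = 200\n",
+ "x = rng.normal(size=n)\n",
+ "X = np.column_stack((np.ones(n), x))   # design matrix with intercept column\n",
+ "theta_true = np.array([-0.5, 2.0])\n",
+ "y = rng.binomial(1, 1 / (1 + np.exp(-X @ theta_true)))\n",
+ "\n",
+ "# Newton-Raphson iterations\n",
+ "theta = np.zeros(2)\n",
+ "for _ in range(10):\n",
+ "    p = 1 / (1 + np.exp(-X @ theta))   # fitted probabilities\n",
+ "    gradient = -X.T @ (y - p)          # first derivative of the cost function\n",
+ "    W = np.diag(p * (1 - p))           # diagonal weight matrix\n",
+ "    hessian = X.T @ W @ X              # second derivative (Hessian)\n",
+ "    theta = theta - np.linalg.solve(hessian, gradient)\n",
+ "\n",
+ "print(theta)   # should be close to theta_true for this simple data set\n",
+ "```"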
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
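+ ,
+ "\n",
+ "As a small preview of that discussion (a sketch only), the diagonalization can be done directly with NumPy on the correlation matrix of the thirty features; large eigenvalues correspond to directions that carry most of the variation in the data.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn.datasets import load_breast_cancer\n",
+ "\n",
+ "# Recompute the correlation matrix as above and diagonalize it\n",
+ "cancer = load_breast_cancer()\n",
+ "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n",
+ "corr = cancerpd.corr().to_numpy()\n",
+ "\n",
+ "eigenvalues, eigenvectors = np.linalg.eigh(corr)   # symmetric matrix\n",
+ "order = np.argsort(eigenvalues)[::-1]              # largest eigenvalues first\n",
+ "explained = eigenvalues[order] / eigenvalues.sum()\n",
+ "\n",
+ "# Cumulative fraction of the total variation captured by the leading directions\n",
+ "print(np.cumsum(explained)[:5])\n",
+ "```"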
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
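+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f003",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a bridge to the next section, the sketch below (added for illustration, not part of the original notes) trains a small scikit-learn neural network directly on the four rows of the XOR truth table, in contrast to the next code cell which uses a synthetic data set from make_classification. The network size, activation, solver and random seed are illustrative choices; with a different initialization another seed may be needed to reach full accuracy."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f004",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.neural_network import MLPClassifier\n",
+    "\n",
+    "# The four possible inputs and the XOR targets\n",
+    "X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
+    "y_xor = np.array([0, 1, 1, 0])\n",
+    "\n",
+    "# Small fully-connected network with one hidden layer\n",
+    "mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',\n",
+    "                    solver='lbfgs', max_iter=10000, random_state=0)\n",
+    "mlp.fit(X_xor, y_xor)\n",
+    "print(f\"Predictions for the XOR gate: {mlp.predict(X_xor)}\")\n",
+    "print(f\"Training accuracy for the XOR gate: {mlp.score(X_xor, y_xor)}\")"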
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week41.ipynb b/doc/LectureNotes/_build/html/_sources/week41.ipynb new file mode 100644 index 000000000..c9b1adcdd --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. 
Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. 
A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
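+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f005",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "To illustrate the last point, the sketch below (added for illustration, not part of the original notes) fits scikit-learn's logistic regression to a small synthetic data set and then reproduces its predicted probabilities by hand as a single sigmoid neuron, $\sigma(\boldsymbol{x}^T\boldsymbol{w}+b)$. The data set and all parameter choices are illustrative assumptions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f006",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.datasets import make_classification\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "# Small synthetic binary classification problem (illustrative only)\n",
+    "X, y = make_classification(n_samples=200, n_features=4, random_state=0)\n",
+    "logreg = LogisticRegression(max_iter=1000).fit(X, y)\n",
+    "\n",
+    "# Logistic regression viewed as a single neuron with a sigmoid activation\n",
+    "z = X @ logreg.coef_.T + logreg.intercept_      # weighted sum plus bias\n",
+    "p_neuron = (1.0/(1.0 + np.exp(-z))).ravel()     # sigmoid output\n",
+    "p_sklearn = logreg.predict_proba(X)[:, 1]\n",
+    "print(f\"Maximum difference: {np.max(np.abs(p_neuron - p_sklearn)):.2e}\")"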
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
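+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f007",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The short sketch below (added for illustration, not part of the original notes) assembles the ingredients listed above for a tiny network: random weights and biases, a sigmoid activation in all layers, a feed-forward pass through two hidden layers, and a squared-error cost against a target. The layer sizes, the choice of activation and all numerical values are illustrative assumptions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f008",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)   # illustrative seed\n",
+    "\n",
+    "def sigmoid(z):\n",
+    "    return 1.0/(1.0 + np.exp(-z))\n",
+    "\n",
+    "# Tiny architecture: 3 inputs, hidden layers with 4 and 3 nodes, 1 output\n",
+    "sizes = [3, 4, 3, 1]\n",
+    "weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]\n",
+    "biases = [rng.normal(size=n) for n in sizes[1:]]\n",
+    "\n",
+    "x = rng.normal(size=sizes[0])        # made-up input\n",
+    "y = np.array([0.5])                  # made-up target\n",
+    "\n",
+    "# Feed-forward pass: a^l = f(W^T a^(l-1) + b^l) for each layer l\n",
+    "a = x\n",
+    "for W, b in zip(weights, biases):\n",
+    "    z = W.T @ a + b\n",
+    "    a = sigmoid(z)\n",
+    "\n",
+    "cost = 0.5*np.sum((a - y)**2)        # squared-error cost\n",
+    "print(f\"Network output: {a}, cost: {cost:.4f}\")"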
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 2: Layout of a neural network with three hidden layers.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
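As a small numerical aside, the sensitivity of difference formulae to the step size is easy to demonstrate. The snippet below is only an illustration (it borrows the simple example $f(x)=\exp{x^2}$ discussed next, and the test point and step sizes are arbitrary): a central-difference estimate of the derivative first improves as $\Delta$ shrinks and then deteriorates again once round-off errors dominate.

```python
import numpy as np

def f(x):
    return np.exp(x**2)

def fprime(x):
    return 2*x*np.exp(x**2)

x0 = 1.5
for delta in [1e-1, 1e-3, 1e-5, 1e-8, 1e-11]:
    # central-difference approximation of f'(x0)
    estimate = (f(x0 + delta) - f(x0 - delta))/(2*delta)
    print(f"delta = {delta:8.1e}  error = {abs(estimate - fprime(x0)):.3e}")
```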
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
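As a minimal sketch of this counting argument in plain Python (the function name is just illustrative), we store the intermediate quantity once and reuse it for both the function value and its derivative:

```python
import numpy as np

def f_and_derivative(x):
    # forward pass with one intermediate variable
    a = x*x           # a = x^2            (operation 1)
    b = np.exp(a)     # b = exp(a) = f(x)  (operation 2)
    # the derivative reuses the stored value b
    df = 2*x*b        # f'(x) = 2x exp(x^2)  (operations 3 and 4)
    return b, df

print(f_and_derivative(2.0))   # (exp(4), 4*exp(4))
```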
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
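To see the two modes side by side without any hand derivation, automatic-differentiation libraries expose both. The sketch below assumes JAX is installed (autograd offers similar functionality) and reuses the simple example above: `jax.grad` performs a reverse-mode sweep, while `jax.jvp` propagates a tangent forward together with the function value; for a scalar function of one variable both return the same number.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.exp(x**2)

x0 = 1.5
# reverse mode (back propagation): evaluate f(x0), then sweep backwards
df_reverse = jax.grad(f)(x0)
# forward mode: carry the tangent dx/dx = 1.0 along with the evaluation
value, df_forward = jax.jvp(f, (x0,), (1.0,))
print(df_reverse, df_forward, 2*x0*jnp.exp(x0**2))
```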
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
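The derivation above maps directly onto a few lines of code. This is only a sketch that mirrors the intermediate variables $a,b,c,d$ and the backward accumulation of the partial derivatives; the analytical derivative is printed alongside as a check.

```python
import numpy as np

def f_and_derivative(x):
    # forward pass, storing the intermediate variables
    a = x*x              # a = x^2
    b = np.exp(a)        # b = exp(a)
    c = a + b            # c = a + b
    d = np.sqrt(c)       # d = f(x)
    # reverse sweep, accumulating df/d(variable) from the output backwards
    df_dc = 1.0/(2.0*np.sqrt(c))        # df/dd * dd/dc, with df/dd = 1
    df_db = df_dc                       # dc/db = 1
    df_da = df_dc + df_db*np.exp(a)     # dc/da = 1, plus the path through b
    df_dx = df_da*2*x                   # da/dx = 2x
    return d, df_dx

x0 = 1.3
value, derivative = f_and_derivative(x0)
analytical = x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2))
print(value, derivative, analytical)
```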
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with no hidden layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with one hidden layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
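A quick way to convince ourselves, using the functions and variables already defined in the code above, is to do one extra forward pass after the training loop and compare the network output with the target value:

```python
# one extra forward pass after training, reusing the trained w_1, b_1, w_2, b_2
a_1, a_2 = forwardpropagation(x)
print("prediction:", a_2, " target:", y)
```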
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with two input nodes, one hidden layer and one output node.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
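Before moving to the compact matrix form, it can be instructive to evaluate these expressions numerically. The sketch below is only an illustration: it assumes sigmoid activations in both layers and the squared-error cost $C=\frac{1}{2}(a^{(2)}-y)^2$ (the derivation above keeps $\sigma$ and $C$ generic), and it pairs the hidden-layer errors $\delta_i^{(1)}$ with the input activations $a_j^{(0)}$, since $\partial z_i^{(1)}/\partial w_{ij}^{(1)}=a_j^{(0)}$, exactly as in the gradient-descent update rules listed next.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

# one input pair and a scalar target (arbitrary numbers, for illustration only)
a0 = np.array([0.5, -1.2])        # a_0^{(0)}, a_1^{(0)}
y = 0.3

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))      # w_{ij}^{(1)}
b1 = np.zeros(2)                  # b_i^{(1)}
w2 = rng.normal(size=2)           # w_i^{(2)}
b2 = 0.0                          # b^{(2)}

# forward pass
z1 = W1 @ a0 + b1
a1 = sigmoid(z1)                  # a_i^{(1)}
z2 = w2 @ a1 + b2
a2 = sigmoid(z2)                  # a^{(2)}

# back propagation of the errors
delta2 = (a2 - y)*a2*(1 - a2)     # dC/da^{(2)} * da^{(2)}/dz^{(2)}
delta1 = w2*a1*(1 - a1)*delta2    # delta_i^{(1)}

# the nine derivatives
grad_w2 = delta2*a1               # dC/dw_i^{(2)}    = delta^{(2)} a_i^{(1)}
grad_b2 = delta2                  # dC/db^{(2)}      = delta^{(2)}
grad_W1 = np.outer(delta1, a0)    # dC/dw_{ij}^{(1)} = delta_i^{(1)} a_j^{(0)}
grad_b1 = delta1                  # dC/db_i^{(1)}    = delta_i^{(1)}
print(grad_W1, grad_b1, grad_w2, grad_b2)
```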
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
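Putting the three steps together, here is a compact sketch of the full algorithm for a network with an arbitrary number of layers. It is not a polished implementation: it assumes sigmoid activations in every layer and the squared-error cost, stores one weight matrix and one bias vector per layer, trains on a single data point, and the helper names (`feed_forward`, `back_propagation`) are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s*(1 - s)

def feed_forward(x, weights, biases):
    """Return the lists of z^l and a^l, with a[0] being the input layer."""
    a, zs = [x], []
    for W, b in zip(weights, biases):
        z = W @ a[-1] + b
        zs.append(z)
        a.append(sigmoid(z))
    return zs, a

def back_propagation(x, y, weights, biases):
    zs, a = feed_forward(x, weights, biases)
    L = len(weights)
    # output error: delta^L = sigma'(z^L) * dC/da^L, with C = 0.5*||a^L - y||^2
    delta = sigmoid_prime(zs[-1])*(a[-1] - y)
    grad_W, grad_b = [None]*L, [None]*L
    grad_W[-1] = np.outer(delta, a[-2])
    grad_b[-1] = delta
    # back propagate: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
    for l in range(L - 2, -1, -1):
        delta = (weights[l + 1].T @ delta)*sigmoid_prime(zs[l])
        grad_W[l] = np.outer(delta, a[l])   # dC/dw_{jk}^l = delta_j^l a_k^{l-1}
        grad_b[l] = delta
    return grad_W, grad_b

# usage example: 2 inputs, 3 hidden nodes, 1 output
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
biases = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
x, y = np.array([0.5, -1.0]), np.array([0.2])

eta = 0.1
for iteration in range(100):
    # one iteration = one feed-forward pass plus one back-propagation step
    grad_W, grad_b = back_propagation(x, y, weights, biases)
    weights = [W - eta*gW for W, gW in zip(weights, grad_W)]
    biases = [b - eta*gb for b, gb in zip(biases, grad_b)]
```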
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/pub/week42/ipynb/week42.ipynb b/doc/LectureNotes/_build/html/_sources/week42.ipynb similarity index 87% rename from doc/pub/week42/ipynb/week42.ipynb rename to doc/LectureNotes/_build/html/_sources/week42.ipynb index 0000be88d..45a126e79 100644 --- a/doc/pub/week42/ipynb/week42.ipynb +++ b/doc/LectureNotes/_build/html/_sources/week42.ipynb @@ -2,8 +2,10 @@ "cells": [ { "cell_type": "markdown", - "id": "0b7206c9", - "metadata": {}, + "id": "d231eeee", + "metadata": { + "editable": true + }, "source": [ "\n", @@ -12,32 +14,45 @@ }, { "cell_type": "markdown", - "id": "66a4424e", - "metadata": {}, + "id": "5e782cb1", + "metadata": { + "editable": true + }, "source": [ "# Week 42 Constructing a Neural Network code with examples\n", - "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", "\n", - "Date: **October 14-18, 2024**" + "Date: **October 13-17, 2025**" ] }, { "cell_type": "markdown", - "id": "2d48e612", - "metadata": {}, + "id": "53309290", + "metadata": { + "editable": true + }, "source": [ - "## Lecture October 14, 2024\n", + "## Lecture October 13, 2025\n", "1. Building our own Feed-forward Neural Network and discussion of project 2\n", "\n", - "**Readings and videos.**\n", - "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", "1. These lecture notes\n", "\n", - "2. [Video of lecture](https://youtu.be/7B2F35gNj2Y)\n", + "2. Video of lecture at \n", "\n", - "3. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesOct14.pdf)\n", + "3. Whiteboard notes at \n", "\n", - "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. \n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", "\n", "5. 
Neural Networks demystified at \n", "\n", @@ -52,25 +67,25 @@ }, { "cell_type": "markdown", - "id": "493dbcac", - "metadata": {}, + "id": "c7be87be", + "metadata": { + "editable": true + }, "source": [ - "## Material for the active learning sessions on Tuesday and Wednesday\n", - " * Exercise on starting to write a code for neural networks, feed forward part. We will also continue ur discussions of gradient descent methods from last week. If you have time, start considering the back-propagation part as well (exercises for next week)\n", + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", "\n", - " * Discussion of project 2\n", - "\n", - " \n", - "\n", - "**Note**: some of the codes will also be discussed next week in connection with the solution of differential equations." + "2. Discussion of project 2" ] }, { "cell_type": "markdown", - "id": "0a83a2c3", - "metadata": {}, + "id": "8e0567a2", + "metadata": { + "editable": true + }, "source": [ - "## Writing a code which implements a feed-forward neural network\n", + "## Lecture material: Writing a code which implements a feed-forward neural network\n", "\n", "Last week we discussed the basics of neural networks and deep learning\n", "and the basics of automatic differentiation. We looked also at\n", @@ -79,13 +94,15 @@ "\n", "We ended our discussions with the derivation of the equations for a\n", "neural network with one hidden layers and two input variables and two\n", - "hidden nodes but only one output node." + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." ] }, { "cell_type": "markdown", - "id": "93735ab4", - "metadata": {}, + "id": "549dcc05", + "metadata": { + "editable": true + }, "source": [ "## Mathematics of deep learning\n", "\n", @@ -98,8 +115,10 @@ }, { "cell_type": "markdown", - "id": "9f8617c7", - "metadata": {}, + "id": "21203bae", + "metadata": { + "editable": true + }, "source": [ "## Reminder on books with hands-on material and codes\n", "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" @@ -107,8 +126,10 @@ }, { "cell_type": "markdown", - "id": "e35c3c4a", - "metadata": {}, + "id": "1c102a30", + "metadata": { + "editable": true + }, "source": [ "## Reading recommendations\n", "\n", @@ -119,10 +140,12 @@ }, { "cell_type": "markdown", - "id": "35d77455", - "metadata": {}, + "id": "53f11afe", + "metadata": { + "editable": true + }, "source": [ - "## First network example, simple percepetron with one input\n", + "## Reminder from last week: First network example, simple percepetron with one input\n", "\n", "As yet another example we define now a simple perceptron model with\n", "all quantities given by scalars. We consider only one input variable\n", @@ -132,8 +155,10 @@ }, { "cell_type": "markdown", - "id": "aed7f415", - "metadata": {}, + "id": "afa8c42a", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_1 = w_1x+b_1,\n", @@ -142,8 +167,10 @@ }, { "cell_type": "markdown", - "id": "012d3932", - "metadata": {}, + "id": "cb5c959f", + "metadata": { + "editable": true + }, "source": [ "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", "parameters we want to optimize. 
The output is $a_1=\\sigma(z_1)$ (see\n", @@ -154,8 +181,10 @@ }, { "cell_type": "markdown", - "id": "6c916a40", - "metadata": {}, + "id": "0083ae15", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", @@ -164,8 +193,10 @@ }, { "cell_type": "markdown", - "id": "de97e0a8", - "metadata": {}, + "id": "f4931203", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with no hidden layer\n", "\n", @@ -178,8 +209,10 @@ }, { "cell_type": "markdown", - "id": "b2a74b7e", - "metadata": {}, + "id": "d3a3754d", + "metadata": { + "editable": true + }, "source": [ "## Optimizing the parameters\n", "\n", @@ -192,8 +225,10 @@ }, { "cell_type": "markdown", - "id": "a09160e9", - "metadata": {}, + "id": "bcd5dbab", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", @@ -202,16 +237,20 @@ }, { "cell_type": "markdown", - "id": "6e00f28f", - "metadata": {}, + "id": "2cbc30f1", + "metadata": { + "editable": true + }, "source": [ "Using the chain rule we find" ] }, { "cell_type": "markdown", - "id": "91ca6f32", - "metadata": {}, + "id": "1a1d803d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", @@ -220,16 +259,20 @@ }, { "cell_type": "markdown", - "id": "234f9dd4", - "metadata": {}, + "id": "776735c7", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "0a5bcd5f", - "metadata": {}, + "id": "c1a2e5af", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", @@ -238,16 +281,20 @@ }, { "cell_type": "markdown", - "id": "b781fb94", - "metadata": {}, + "id": "9e603df9", + "metadata": { + "editable": true + }, "source": [ "which we later will just define as" ] }, { "cell_type": "markdown", - "id": "b3d748ee", - "metadata": {}, + "id": "533212cd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", @@ -256,8 +303,10 @@ }, { "cell_type": "markdown", - "id": "59e42ceb", - "metadata": {}, + "id": "09d91067", + "metadata": { + "editable": true + }, "source": [ "## Adding a hidden layer\n", "\n", @@ -270,8 +319,10 @@ }, { "cell_type": "markdown", - "id": "c2f312ae", - "metadata": {}, + "id": "f767afe7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", @@ -280,8 +331,10 @@ }, { "cell_type": "markdown", - "id": "1476ad2f", - "metadata": {}, + "id": "f38ded54", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", @@ -290,16 +343,20 @@ }, { "cell_type": "markdown", - "id": "907e90de", - "metadata": {}, + "id": "f3f03bc3", + "metadata": { + "editable": true + }, "source": [ "and the cost function" ] }, { "cell_type": "markdown", - "id": "1d1157b0", - "metadata": {}, + "id": "9062730e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", @@ -308,16 +365,20 @@ }, { "cell_type": "markdown", - "id": "348ddd64", - "metadata": {}, + 
"id": "75bbc32c", + "metadata": { + "editable": true + }, "source": [ "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." ] }, { "cell_type": "markdown", - "id": "3672dcce", - "metadata": {}, + "id": "fcf02dbf", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with one hidden layer\n", "\n", @@ -330,8 +391,10 @@ }, { "cell_type": "markdown", - "id": "785c3632", - "metadata": {}, + "id": "aa97678f", + "metadata": { + "editable": true + }, "source": [ "## The derivatives\n", "\n", @@ -340,8 +403,10 @@ }, { "cell_type": "markdown", - "id": "af633a03", - "metadata": {}, + "id": "98f68e27", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", @@ -350,8 +415,10 @@ }, { "cell_type": "markdown", - "id": "c0fa4b25", - "metadata": {}, + "id": "c4528178", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", @@ -360,8 +427,10 @@ }, { "cell_type": "markdown", - "id": "c6ed00a0", - "metadata": {}, + "id": "d6304298", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", @@ -370,8 +439,10 @@ }, { "cell_type": "markdown", - "id": "a4ff465c", - "metadata": {}, + "id": "dfc47ba6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", @@ -380,16 +451,20 @@ }, { "cell_type": "markdown", - "id": "fd71eaf0", - "metadata": {}, + "id": "8834c3dc", + "metadata": { + "editable": true + }, "source": [ "Can you generalize this to more than one hidden layer?" 
] }, { "cell_type": "markdown", - "id": "771d3788", - "metadata": {}, + "id": "40956770", + "metadata": { + "editable": true + }, "source": [ "## Important observations\n", "\n", @@ -402,8 +477,10 @@ }, { "cell_type": "markdown", - "id": "8f4f2e0d", - "metadata": {}, + "id": "69e7fdcf", + "metadata": { + "editable": true + }, "source": [ "## The training\n", "\n", @@ -412,8 +489,10 @@ }, { "cell_type": "markdown", - "id": "6f0e3d04", - "metadata": {}, + "id": "726d4c90", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", @@ -422,16 +501,20 @@ }, { "cell_type": "markdown", - "id": "dffb7c57", - "metadata": {}, + "id": "0ee83d1c", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "82d6c20b", - "metadata": {}, + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_i \\leftarrow b_i-\\eta \\delta_i,\n", @@ -440,8 +523,10 @@ }, { "cell_type": "markdown", - "id": "fb058f95", - "metadata": {}, + "id": "b2746792", + "metadata": { + "editable": true + }, "source": [ "with $\\eta$ is the learning rate.\n", "\n", @@ -452,8 +537,10 @@ }, { "cell_type": "markdown", - "id": "49ee11bd", - "metadata": {}, + "id": "76e2e41a", + "metadata": { + "editable": true + }, "source": [ "## Code example\n", "\n", @@ -468,8 +555,11 @@ { "cell_type": "code", "execution_count": 1, - "id": "96f8781a", - "metadata": {}, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -538,16 +628,20 @@ }, { "cell_type": "markdown", - "id": "89957128", - "metadata": {}, + "id": "debaaadc", + "metadata": { + "editable": true + }, "source": [ "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
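To see the learning-rate dependence mentioned above in isolation, here is a small self-contained toy example (deliberately simpler than the code above): the single-input perceptron $a_1=\sigma(w_1x+b_1)$ trained on one data point with plain gradient descent for a few values of $\eta$. The numbers are arbitrary and only meant to illustrate how strongly the final error depends on $\eta$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# train a_1 = sigma(w_1 x + b_1) on one data point (x, y) for different eta
x, y = 1.0, 0.25
for eta in (0.01, 0.1, 1.0, 10.0):
    w1, b1 = 0.5, -0.5                      # same starting point for every eta
    for _ in range(100):
        a1 = sigmoid(w1 * x + b1)
        delta1 = (a1 - y) * a1 * (1 - a1)   # dC/dz1
        w1 -= eta * delta1 * x              # dC/dw1 = delta1 * x
        b1 -= eta * delta1                  # dC/db1 = delta1
    final_cost = 0.5 * (sigmoid(w1 * x + b1) - y) ** 2
    print(f"eta = {eta:5.2f}: final cost = {final_cost:.2e}")
```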
] }, { "cell_type": "markdown", - "id": "786ee004", - "metadata": {}, + "id": "7d576f19", + "metadata": { + "editable": true + }, "source": [ "## Simple neural network and the back propagation equations\n", "\n", @@ -561,8 +655,10 @@ }, { "cell_type": "markdown", - "id": "81c9e4d3", - "metadata": {}, + "id": "582b3b43", + "metadata": { + "editable": true + }, "source": [ "$$\n", "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", @@ -571,16 +667,20 @@ }, { "cell_type": "markdown", - "id": "521d11f5", - "metadata": {}, + "id": "c8eace47", + "metadata": { + "editable": true + }, "source": [ "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" ] }, { "cell_type": "markdown", - "id": "973b290c", - "metadata": {}, + "id": "81ec9945", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", @@ -589,8 +689,10 @@ }, { "cell_type": "markdown", - "id": "32390f24", - "metadata": {}, + "id": "c35e1f69", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", "\n", @@ -603,8 +705,10 @@ }, { "cell_type": "markdown", - "id": "945def24", - "metadata": {}, + "id": "05b8eea9", + "metadata": { + "editable": true + }, "source": [ "## The ouput layer\n", "\n", @@ -613,8 +717,10 @@ }, { "cell_type": "markdown", - "id": "7f0f65a4", - "metadata": {}, + "id": "7ef9cb55", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", @@ -623,8 +729,10 @@ }, { "cell_type": "markdown", - "id": "3851aa3b", - "metadata": {}, + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, "source": [ "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", "The parameters we need to optimize are given by" @@ -632,8 +740,10 @@ }, { "cell_type": "markdown", - "id": "56cf96e2", - "metadata": {}, + "id": "00492358", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", @@ -642,8 +752,10 @@ }, { "cell_type": "markdown", - "id": "a17c14e8", - "metadata": {}, + "id": "45cca5aa", + "metadata": { + "editable": true + }, "source": [ "## Compact expressions\n", "\n", @@ -653,8 +765,10 @@ }, { "cell_type": "markdown", - "id": "e62b0591", - "metadata": {}, + "id": "22cfb40b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", @@ -663,16 +777,20 @@ }, { "cell_type": "markdown", - "id": "081802b0", - "metadata": {}, + "id": "45b30d06", + "metadata": { + "editable": true + }, "source": [ "with outputs" ] }, { "cell_type": "markdown", - "id": "5d153d02", - "metadata": {}, + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ 
\\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", @@ -681,8 +799,10 @@ }, { "cell_type": "markdown", - "id": "cd1f6429", - "metadata": {}, + "id": "659dd686", + "metadata": { + "editable": true + }, "source": [ "## Output layer\n", "\n", @@ -691,8 +811,10 @@ }, { "cell_type": "markdown", - "id": "d9f4dbc5", - "metadata": {}, + "id": "34a1d4ca", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", @@ -701,16 +823,20 @@ }, { "cell_type": "markdown", - "id": "1de28add", - "metadata": {}, + "id": "34471712", + "metadata": { + "editable": true + }, "source": [ "resulting in the output" ] }, { "cell_type": "markdown", - "id": "59b0576f", - "metadata": {}, + "id": "0b3a74fd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", @@ -719,8 +845,10 @@ }, { "cell_type": "markdown", - "id": "7d33341b", - "metadata": {}, + "id": "1a5bdab3", + "metadata": { + "editable": true + }, "source": [ "## Explicit derivatives\n", "\n", @@ -733,8 +861,10 @@ }, { "cell_type": "markdown", - "id": "428a98ec", - "metadata": {}, + "id": "37f19e78", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", @@ -743,16 +873,20 @@ }, { "cell_type": "markdown", - "id": "77447fe6", - "metadata": {}, + "id": "5505aab8", + "metadata": { + "editable": true + }, "source": [ "with" ] }, { "cell_type": "markdown", - "id": "63aef148", - "metadata": {}, + "id": "d55d045c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", @@ -761,16 +895,20 @@ }, { "cell_type": "markdown", - "id": "730b31af", - "metadata": {}, + "id": "04f101e7", + "metadata": { + "editable": true + }, "source": [ "and finally" ] }, { "cell_type": "markdown", - "id": "d590fdc8", - "metadata": {}, + "id": "bfab2e91", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", @@ -779,8 +917,10 @@ }, { "cell_type": "markdown", - "id": "0923fb8e", - "metadata": {}, + "id": "77f35b7e", + "metadata": { + "editable": true + }, "source": [ "## Derivatives of the hidden layer\n", "\n", @@ -789,8 +929,10 @@ }, { "cell_type": "markdown", - "id": "74c764da", - "metadata": {}, + "id": "8cf4a606", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", @@ -800,16 +942,20 @@ }, { "cell_type": "markdown", - "id": "b384b7ef", - "metadata": {}, + "id": "86951351", + "metadata": { + "editable": true + }, "source": [ "which, noting that" ] }, { "cell_type": "markdown", - "id": "f74a8bc9", - "metadata": {}, + "id": "73414e65", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", @@ -818,16 +964,20 @@ }, { "cell_type": "markdown", - "id": "913ae0bd", - "metadata": {}, + "id": "8f0aaa15", + "metadata": { + "editable": true + }, "source": [ "allows us to rewrite" ] }, { "cell_type": "markdown", - "id": "895cb126", - "metadata": {}, + "id": "730c5415", + "metadata": { + "editable": 
true + }, "source": [ "$$\n", "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", @@ -836,8 +986,10 @@ }, { "cell_type": "markdown", - "id": "d279d84b", - "metadata": {}, + "id": "1afcb5a1", + "metadata": { + "editable": true + }, "source": [ "## Final expression\n", "Defining" @@ -845,8 +997,10 @@ }, { "cell_type": "markdown", - "id": "e646a164", - "metadata": {}, + "id": "7f30cb44", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", @@ -855,16 +1009,20 @@ }, { "cell_type": "markdown", - "id": "f46a9699", - "metadata": {}, + "id": "14c045ce", + "metadata": { + "editable": true + }, "source": [ "we have" ] }, { "cell_type": "markdown", - "id": "72a5bd14", - "metadata": {}, + "id": "0c1a2c68", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", @@ -873,16 +1031,20 @@ }, { "cell_type": "markdown", - "id": "b7389977", - "metadata": {}, + "id": "a3385222", + "metadata": { + "editable": true + }, "source": [ "Similarly, we obtain" ] }, { "cell_type": "markdown", - "id": "ecbc82bb", - "metadata": {}, + "id": "18ee3804", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", @@ -891,8 +1053,10 @@ }, { "cell_type": "markdown", - "id": "4749523a", - "metadata": {}, + "id": "ad741d56", + "metadata": { + "editable": true + }, "source": [ "## Completing the list\n", "\n", @@ -901,8 +1065,10 @@ }, { "cell_type": "markdown", - "id": "93ea8c62", - "metadata": {}, + "id": "65870a70", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", @@ -911,16 +1077,20 @@ }, { "cell_type": "markdown", - "id": "aeea971d", - "metadata": {}, + "id": "f7807fdc", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "bbaf4d20", - "metadata": {}, + "id": "9af4a759", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", @@ -929,16 +1099,20 @@ }, { "cell_type": "markdown", - "id": "4469f285", - "metadata": {}, + "id": "dc548cb7", + "metadata": { + "editable": true + }, "source": [ "where we have defined" ] }, { "cell_type": "markdown", - "id": "9735e5df", - "metadata": {}, + "id": "83b75e94", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", @@ -947,8 +1121,10 @@ }, { "cell_type": "markdown", - "id": "54d79226", - "metadata": {}, + "id": "1c2be559", + "metadata": { + "editable": true + }, "source": [ "## Final expressions for the biases of the hidden layer\n", "\n", @@ -957,8 +1133,10 @@ }, { "cell_type": "markdown", - "id": "4365ba97", - "metadata": {}, + "id": "18b85f86", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", @@ -967,16 +1145,20 @@ }, { "cell_type": "markdown", - "id": "bcd2d94c", - "metadata": {}, + "id": "63e39eb4", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "5584b15e", - "metadata": {}, + "id": "a55371c1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial 
b_{2}^{(1)}}=\\delta_2^{(1)}.\n", @@ -985,16 +1167,20 @@ }, { "cell_type": "markdown", - "id": "f0c53017", - "metadata": {}, + "id": "fa31a9b3", + "metadata": { + "editable": true + }, "source": [ "As we will see below, these expressions can be generalized in a more compact form." ] }, { "cell_type": "markdown", - "id": "2ebbdf34", - "metadata": {}, + "id": "580df891", + "metadata": { + "editable": true + }, "source": [ "## Gradient expressions\n", "\n", @@ -1004,8 +1190,10 @@ }, { "cell_type": "markdown", - "id": "b94ec668", - "metadata": {}, + "id": "c10bf2ce", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", @@ -1014,16 +1202,20 @@ }, { "cell_type": "markdown", - "id": "8bcb00fc", - "metadata": {}, + "id": "0bae11f8", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "f8725166", - "metadata": {}, + "id": "ed4a8b93", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", @@ -1032,16 +1224,20 @@ }, { "cell_type": "markdown", - "id": "bcb99786", - "metadata": {}, + "id": "2d582987", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "975fb151", - "metadata": {}, + "id": "5fa760a1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", @@ -1050,16 +1246,20 @@ }, { "cell_type": "markdown", - "id": "e17fa81d", - "metadata": {}, + "id": "bc9de8bf", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "12912a16", - "metadata": {}, + "id": "f00e3ace", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", @@ -1068,16 +1268,20 @@ }, { "cell_type": "markdown", - "id": "0c14e44b", - "metadata": {}, + "id": "7ac96362", + "metadata": { + "editable": true + }, "source": [ "where $\\eta$ is the learning rate." 
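To connect these update rules to code, the sketch below trains the $2$-$2$-$1$ network on a single input with plain gradient descent, using exactly the four rules above. We assume sigmoid activations in both layers and the cost $C=\frac{1}{2}(a^{(2)}-y)^2$ (so that $\partial C/\partial a^{(2)}=a^{(2)}-y$), together with the convention $z_i^{(1)}=\sum_j w_{ij}^{(1)}a_j^{(0)}+b_i^{(1)}$; all numerical values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))    # w_ij^(1), row i = hidden node, column j = input node
b1 = np.zeros(2)                # b_i^(1)
w2 = rng.normal(size=2)         # w_i^(2)
b2 = 0.0                        # b^(2)
a0 = np.array([0.3, 0.9])       # inputs a^(0)
y, eta = 0.5, 0.1

for _ in range(500):
    # feed forward
    z1 = W1 @ a0 + b1;  a1 = sigmoid(z1)
    z2 = w2 @ a1 + b2;  a2 = sigmoid(z2)
    # errors
    delta2 = (a2 - y) * a2 * (1 - a2)        # delta^(2) = dC/da^(2) da^(2)/dz^(2)
    delta1 = w2 * a1 * (1 - a1) * delta2     # delta_i^(1) = w_i^(2) sigma'(z_i^(1)) delta^(2)
    # the four update rules
    w2 -= eta * delta2 * a1                  # w_i^(2)  <- w_i^(2)  - eta delta^(2) a_i^(1)
    b2 -= eta * delta2                       # b^(2)    <- b^(2)    - eta delta^(2)
    W1 -= eta * np.outer(delta1, a0)         # w_ij^(1) <- w_ij^(1) - eta delta_i^(1) a_j^(0)
    b1 -= eta * delta1                       # b_i^(1)  <- b_i^(1)  - eta delta_i^(1)

output = sigmoid(w2 @ sigmoid(W1 @ a0 + b1) + b2)
print(f"network output after training: {output:.4f}  (target y = {y})")
```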
] }, { "cell_type": "markdown", - "id": "6e854ca6", - "metadata": {}, + "id": "9c46f966", + "metadata": { + "editable": true + }, "source": [ "## Setting up the equations for a neural network\n", "\n", @@ -1091,8 +1295,10 @@ }, { "cell_type": "markdown", - "id": "d10945a7", - "metadata": {}, + "id": "ea509b11", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", @@ -1101,8 +1307,10 @@ }, { "cell_type": "markdown", - "id": "44212558", - "metadata": {}, + "id": "e08ff771", + "metadata": { + "editable": true + }, "source": [ "where the $y_i$s are our $n$ targets (the values we want to\n", "reproduce), while the outputs of the network after having propagated\n", @@ -1111,10 +1319,12 @@ }, { "cell_type": "markdown", - "id": "3fd80944", - "metadata": {}, + "id": "6f476983", + "metadata": { + "editable": true + }, "source": [ - "## Layout of a neural network with three hidden layers (last later = $l=L=4$, first layer $l=0$)\n", + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", "\n", "\n", "\n", @@ -1125,8 +1335,10 @@ }, { "cell_type": "markdown", - "id": "42598707", - "metadata": {}, + "id": "0535d087", + "metadata": { + "editable": true + }, "source": [ "## Definitions\n", "\n", @@ -1140,8 +1352,10 @@ }, { "cell_type": "markdown", - "id": "21638e6e", - "metadata": {}, + "id": "5e024ec1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", @@ -1150,8 +1364,10 @@ }, { "cell_type": "markdown", - "id": "2843d78a", - "metadata": {}, + "id": "239fb4c6", + "metadata": { + "editable": true + }, "source": [ "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", "represents the total number of nodes/neurons/units of layer $l-1$. 
The\n", @@ -1161,8 +1377,10 @@ }, { "cell_type": "markdown", - "id": "152a46dd", - "metadata": {}, + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", @@ -1171,8 +1389,10 @@ }, { "cell_type": "markdown", - "id": "f888a137", - "metadata": {}, + "id": "c47cc3c6", + "metadata": { + "editable": true + }, "source": [ "## Inputs to the activation function\n", "\n", @@ -1185,8 +1405,10 @@ }, { "cell_type": "markdown", - "id": "adc3a5e4", - "metadata": {}, + "id": "4eb89f11", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", @@ -1195,8 +1417,10 @@ }, { "cell_type": "markdown", - "id": "8c598490", - "metadata": {}, + "id": "92744a90", + "metadata": { + "editable": true + }, "source": [ "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", "\n", @@ -1209,8 +1433,10 @@ }, { "cell_type": "markdown", - "id": "0ae86831", - "metadata": {}, + "id": "35424d45", + "metadata": { + "editable": true + }, "source": [ "## Derivatives and the chain rule\n", "\n", @@ -1219,8 +1445,10 @@ }, { "cell_type": "markdown", - "id": "077d65f3", - "metadata": {}, + "id": "b8502930", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", @@ -1229,16 +1457,20 @@ }, { "cell_type": "markdown", - "id": "816b7643", - "metadata": {}, + "id": "81ad45a5", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "c748d997", - "metadata": {}, + "id": "11bb8afb", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", @@ -1247,16 +1479,20 @@ }, { "cell_type": "markdown", - "id": "6186a477", - "metadata": {}, + "id": "b53ec752", + "metadata": { + "editable": true + }, "source": [ "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" ] }, { "cell_type": "markdown", - "id": "5159e465", - "metadata": {}, + "id": "b7519a84", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", @@ -1265,8 +1501,10 @@ }, { "cell_type": "markdown", - "id": "1717c046", - "metadata": {}, + "id": "c57689db", + "metadata": { + "editable": true + }, "source": [ "## Derivative of the cost function\n", "\n", @@ -1277,8 +1515,10 @@ }, { "cell_type": "markdown", - "id": "43b02473", - "metadata": {}, + "id": "a9f83b15", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", @@ -1287,16 +1527,20 @@ }, { "cell_type": "markdown", - "id": "5034b9a1", - "metadata": {}, + "id": "067c2583", + "metadata": { + "editable": true + }, "source": [ "The derivative of this function with respect to the weights is" ] }, { "cell_type": "markdown", - "id": "cd13d020", - "metadata": {}, + "id": "43545710", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", @@ -1305,16 +1549,20 @@ }, { "cell_type": "markdown", - "id": "592375f7", - "metadata": {}, + "id": "1eb33717", + "metadata": { + "editable": true + }, "source": [ "The 
last partial derivative can easily be computed and reads (by applying the chain rule)" ] }, { "cell_type": "markdown", - "id": "2bbcf893", - "metadata": {}, + "id": "e09a8734", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", @@ -1323,8 +1571,10 @@ }, { "cell_type": "markdown", - "id": "58fc5cdc", - "metadata": {}, + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, "source": [ "## The back propagation equations for a neural network\n", "\n", @@ -1333,8 +1583,10 @@ }, { "cell_type": "markdown", - "id": "9ec4e6ef", - "metadata": {}, + "id": "bb58784b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", @@ -1343,16 +1595,20 @@ }, { "cell_type": "markdown", - "id": "fcdfd63d", - "metadata": {}, + "id": "10aea094", + "metadata": { + "editable": true + }, "source": [ "Defining" ] }, { "cell_type": "markdown", - "id": "5199bd46", - "metadata": {}, + "id": "b7cc2db8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", @@ -1361,16 +1617,20 @@ }, { "cell_type": "markdown", - "id": "b255a6df", - "metadata": {}, + "id": "6cce9a62", + "metadata": { + "editable": true + }, "source": [ "and using the Hadamard product of two vectors we can write this as" ] }, { "cell_type": "markdown", - "id": "a6617bc8", - "metadata": {}, + "id": "43e5a84b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", @@ -1379,8 +1639,10 @@ }, { "cell_type": "markdown", - "id": "1ae198f0", - "metadata": {}, + "id": "d5c607a7", + "metadata": { + "editable": true + }, "source": [ "## Analyzing the last results\n", "\n", @@ -1395,8 +1657,10 @@ }, { "cell_type": "markdown", - "id": "f0333d5b", - "metadata": {}, + "id": "a51b3b58", + "metadata": { + "editable": true + }, "source": [ "## More considerations\n", "\n", @@ -1411,8 +1675,10 @@ }, { "cell_type": "markdown", - "id": "05f19c67", - "metadata": {}, + "id": "4cd9d058", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", @@ -1421,16 +1687,20 @@ }, { "cell_type": "markdown", - "id": "f35623e3", - "metadata": {}, + "id": "c80b630d", + "metadata": { + "editable": true + }, "source": [ "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" ] }, { "cell_type": "markdown", - "id": "422aabb5", - "metadata": {}, + "id": "dc0c1a06", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", @@ -1439,8 +1709,10 @@ }, { "cell_type": "markdown", - "id": "5f2f2143", - "metadata": {}, + "id": "8f2065b7", + "metadata": { + "editable": true + }, "source": [ "## Derivatives in terms of $z_j^L$\n", "\n", @@ -1449,8 +1721,10 @@ }, { "cell_type": "markdown", - "id": "e0ec6446", - "metadata": {}, + "id": "7f89b9d8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial 
a_j^L}{\\partial z_j^L},\n", @@ -1459,16 +1733,20 @@ }, { "cell_type": "markdown", - "id": "ab3e824e", - "metadata": {}, + "id": "49c2cd3f", + "metadata": { + "editable": true + }, "source": [ "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" ] }, { "cell_type": "markdown", - "id": "b63e0260", - "metadata": {}, + "id": "517b1a37", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", @@ -1477,16 +1755,20 @@ }, { "cell_type": "markdown", - "id": "8f85d588", - "metadata": {}, + "id": "65c8107f", + "metadata": { + "editable": true + }, "source": [ "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." ] }, { "cell_type": "markdown", - "id": "c53b387b", - "metadata": {}, + "id": "2a10f902", + "metadata": { + "editable": true + }, "source": [ "## Bringing it together\n", "\n", @@ -1495,8 +1777,10 @@ }, { "cell_type": "markdown", - "id": "8bcd2836", - "metadata": {}, + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1511,16 +1795,20 @@ }, { "cell_type": "markdown", - "id": "e37eab2a", - "metadata": {}, + "id": "90336322", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "e7371843", - "metadata": {}, + "id": "f25ff166", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1535,16 +1823,20 @@ }, { "cell_type": "markdown", - "id": "c6b5e4ee", - "metadata": {}, + "id": "4cf11d5e", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "d49e9c2c", - "metadata": {}, + "id": "2670748d", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1559,8 +1851,10 @@ }, { "cell_type": "markdown", - "id": "bc813237", - "metadata": {}, + "id": "18c29f71", + "metadata": { + "editable": true + }, "source": [ "## Final back propagating equation\n", "\n", @@ -1569,8 +1863,10 @@ }, { "cell_type": "markdown", - "id": "0adc23ee", - "metadata": {}, + "id": "c593470c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", @@ -1579,16 +1875,20 @@ }, { "cell_type": "markdown", - "id": "9f18ba82", - "metadata": {}, + "id": "28e8caef", + "metadata": { + "editable": true + }, "source": [ "We want to express this in terms of the equations for layer $l+1$." ] }, { "cell_type": "markdown", - "id": "b9c3658c", - "metadata": {}, + "id": "516de9d7", + "metadata": { + "editable": true + }, "source": [ "## Using the chain rule and summing over all $k$ entries\n", "\n", @@ -1597,8 +1897,10 @@ }, { "cell_type": "markdown", - "id": "3792e41e", - "metadata": {}, + "id": "004c0bf4", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", @@ -1607,16 +1909,20 @@ }, { "cell_type": "markdown", - "id": "d052d0c1", - "metadata": {}, + "id": "d62a3b1f", + "metadata": { + "editable": true + }, "source": [ "and recalling that" ] }, { "cell_type": "markdown", - "id": "cb86b10e", - "metadata": {}, + "id": "e9af770e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", @@ -1625,16 +1931,20 @@ }, { "cell_type": "markdown", - "id": "4053ba69", - "metadata": {}, + "id": "eca56f17", + "metadata": { + "editable": true + }, "source": [ "with $M_l$ being the number of nodes in layer $l$, we obtain" ] }, { "cell_type": "markdown", - "id": "444986c0", - "metadata": {}, + "id": "bb0e4414", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", @@ -1643,8 +1953,10 @@ }, { "cell_type": "markdown", - "id": "19dc83fd", - "metadata": {}, + "id": "a4b190fc", + "metadata": { + "editable": true + }, "source": [ "This is our final equation.\n", "\n", @@ -1653,8 +1965,10 @@ }, { "cell_type": "markdown", - "id": "a10386fd", - "metadata": {}, + "id": "ec0f87c0", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", "\n", @@ -1677,8 +1991,10 @@ }, { "cell_type": "markdown", - "id": "08b046a2", - "metadata": {}, + "id": "2fb45155", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm, part 1\n", "\n", @@ -1698,8 +2014,10 @@ }, { "cell_type": "markdown", - "id": "abd09f94", - "metadata": {}, + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm, part 2\n", "\n", @@ -1708,8 +2026,10 @@ }, { "cell_type": "markdown", - "id": "d2b58a05", - "metadata": {}, + "id": "9183bbd0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", @@ -1718,16 +2038,20 @@ }, { "cell_type": "markdown", - "id": "9eb706a2", - "metadata": {}, + "id": "32ece956", + "metadata": { + "editable": true + }, "source": [ "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" ] }, { "cell_type": 
"markdown", - "id": "4dd2c404", - "metadata": {}, + "id": "466d6bda", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", @@ -1736,8 +2060,10 @@ }, { "cell_type": "markdown", - "id": "10ec2807", - "metadata": {}, + "id": "9f31b228", + "metadata": { + "editable": true + }, "source": [ "## Setting up the Back propagation algorithm, part 3\n", "\n", @@ -1748,8 +2074,10 @@ }, { "cell_type": "markdown", - "id": "dc831415", - "metadata": {}, + "id": "fbeac005", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", @@ -1758,8 +2086,10 @@ }, { "cell_type": "markdown", - "id": "b6841649", - "metadata": {}, + "id": "bc6ae984", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", @@ -1768,16 +2098,20 @@ }, { "cell_type": "markdown", - "id": "355caec2", - "metadata": {}, + "id": "65f3133d", + "metadata": { + "editable": true + }, "source": [ "with $\\eta$ being the learning rate." ] }, { "cell_type": "markdown", - "id": "472bf1c5", - "metadata": {}, + "id": "5d27bbe1", + "metadata": { + "editable": true + }, "source": [ "## Updating the gradients\n", "\n", @@ -1786,8 +2120,10 @@ }, { "cell_type": "markdown", - "id": "d0a449d0", - "metadata": {}, + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", @@ -1796,16 +2132,20 @@ }, { "cell_type": "markdown", - "id": "128ab1d1", - "metadata": {}, + "id": "ea32e5bb", + "metadata": { + "editable": true + }, "source": [ "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" ] }, { "cell_type": "markdown", - "id": "bfae04eb", - "metadata": {}, + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", @@ -1814,8 +2154,10 @@ }, { "cell_type": "markdown", - "id": "4530efa3", - "metadata": {}, + "id": "9008dcf8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", @@ -1824,8 +2166,10 @@ }, { "cell_type": "markdown", - "id": "4da3f467", - "metadata": {}, + "id": "89aba7d6", + "metadata": { + "editable": true + }, "source": [ "## Activation functions\n", "\n", @@ -1845,8 +2189,10 @@ }, { "cell_type": "markdown", - "id": "a3f2750e", - "metadata": {}, + "id": "ea0cdce2", + "metadata": { + "editable": true + }, "source": [ "### Activation functions, Logistic and Hyperbolic ones\n", "\n", @@ -1862,8 +2208,10 @@ }, { "cell_type": "markdown", - "id": "a8996317", - "metadata": {}, + "id": "91342c80", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", @@ -1872,16 +2220,20 @@ }, { "cell_type": "markdown", - "id": "bb6de399", - "metadata": {}, + "id": "bd6eb22a", + "metadata": { + "editable": true + }, "source": [ "and the *hyperbolic tangent* function" ] }, { "cell_type": "markdown", - "id": "c6aa6e93", - "metadata": {}, + "id": "4e75b2ab", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma(x) = \\tanh(x)\n", @@ -1890,8 +2242,10 @@ }, { "cell_type": "markdown", - "id": "e10b7b94", - "metadata": {}, + "id": "1626d9b7", + "metadata": { + "editable": true + }, "source": [ 
"## Relevance\n", "\n", @@ -1905,8 +2259,11 @@ { "cell_type": "code", "execution_count": 2, - "id": "7255188c", - "metadata": {}, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "%matplotlib inline\n", @@ -1986,8 +2343,10 @@ }, { "cell_type": "markdown", - "id": "cbe6427f", - "metadata": {}, + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, "source": [ "## Vanishing gradients\n", "\n", @@ -2006,8 +2365,10 @@ }, { "cell_type": "markdown", - "id": "7aa9a09d", - "metadata": {}, + "id": "ea47d1d6", + "metadata": { + "editable": true + }, "source": [ "## Exploding gradients\n", "\n", @@ -2022,8 +2383,10 @@ }, { "cell_type": "markdown", - "id": "a6833949", - "metadata": {}, + "id": "1947aa95", + "metadata": { + "editable": true + }, "source": [ "## Is the Logistic activation function (Sigmoid) our choice?\n", "\n", @@ -2043,8 +2406,10 @@ }, { "cell_type": "markdown", - "id": "6c4a6dfd", - "metadata": {}, + "id": "d024119f", + "metadata": { + "editable": true + }, "source": [ "## Logistic function as the root of problems\n", "\n", @@ -2060,8 +2425,10 @@ }, { "cell_type": "markdown", - "id": "7ebf35c4", - "metadata": {}, + "id": "c9178132", + "metadata": { + "editable": true + }, "source": [ "## The derivative of the Logistic funtion\n", "\n", @@ -2086,8 +2453,10 @@ }, { "cell_type": "markdown", - "id": "81e57dfd", - "metadata": {}, + "id": "756185f5", + "metadata": { + "editable": true + }, "source": [ "## Insights from the paper by Glorot and Bengio\n", "\n", @@ -2104,8 +2473,10 @@ }, { "cell_type": "markdown", - "id": "c0ad35af", - "metadata": {}, + "id": "3d92cad4", + "metadata": { + "editable": true + }, "source": [ "## The RELU function family\n", "\n", @@ -2123,8 +2494,10 @@ }, { "cell_type": "markdown", - "id": "9abee1c8", - "metadata": {}, + "id": "cbc6f721", + "metadata": { + "editable": true + }, "source": [ "## ELU function\n", "\n", @@ -2135,8 +2508,10 @@ }, { "cell_type": "markdown", - "id": "1d1ff061", - "metadata": {}, + "id": "9249dc7b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", @@ -2145,8 +2520,10 @@ }, { "cell_type": "markdown", - "id": "a8b62fc1", - "metadata": {}, + "id": "e59de3af", + "metadata": { + "editable": true + }, "source": [ "## Which activation function should we use?\n", "\n", @@ -2165,8 +2542,10 @@ }, { "cell_type": "markdown", - "id": "a8858730", - "metadata": {}, + "id": "e2da998c", + "metadata": { + "editable": true + }, "source": [ "## More on activation functions, output layers\n", "\n", @@ -2185,8 +2564,10 @@ }, { "cell_type": "markdown", - "id": "be1e67e4", - "metadata": {}, + "id": "e1abf01e", + "metadata": { + "editable": true + }, "source": [ "## Fine-tuning neural network hyperparameters\n", "\n", @@ -2212,8 +2593,10 @@ }, { "cell_type": "markdown", - "id": "361acaff", - "metadata": {}, + "id": "a8ded7cd", + "metadata": { + "editable": true + }, "source": [ "## Hidden layers\n", "\n", @@ -2236,8 +2619,10 @@ }, { "cell_type": "markdown", - "id": "636cc811", - "metadata": {}, + "id": "96da4f48", + "metadata": { + "editable": true + }, "source": [ "## Batch Normalization\n", "\n", @@ -2260,8 +2645,10 @@ }, { "cell_type": "markdown", - "id": "a9c3c37b", - "metadata": {}, + "id": "395346a7", + "metadata": { + "editable": true + }, "source": [ "## Dropout\n", "\n", @@ -2278,8 +2665,10 @@ }, { "cell_type": "markdown", - "id": "99899732", - 
"metadata": {}, + "id": "9c712bbb", + "metadata": { + "editable": true + }, "source": [ "## Gradient Clipping\n", "\n", @@ -2296,8 +2685,10 @@ }, { "cell_type": "markdown", - "id": "d76992f2", - "metadata": {}, + "id": "2b66ea72", + "metadata": { + "editable": true + }, "source": [ "## A top-down perspective on Neural networks\n", "\n", @@ -2320,8 +2711,10 @@ }, { "cell_type": "markdown", - "id": "f3f3714c", - "metadata": {}, + "id": "5acbc082", + "metadata": { + "editable": true + }, "source": [ "## More top-down perspectives\n", "\n", @@ -2347,8 +2740,10 @@ }, { "cell_type": "markdown", - "id": "f9eea049", - "metadata": {}, + "id": "31825b65", + "metadata": { + "editable": true + }, "source": [ "## Limitations of supervised learning with deep networks\n", "\n", @@ -2363,8 +2758,10 @@ }, { "cell_type": "markdown", - "id": "ffe27b44", - "metadata": {}, + "id": "c76d9af9", + "metadata": { + "editable": true + }, "source": [ "## Limitations of NNs\n", "\n", @@ -2377,8 +2774,10 @@ }, { "cell_type": "markdown", - "id": "8a4d6517", - "metadata": {}, + "id": "bdc93363", + "metadata": { + "editable": true + }, "source": [ "## Homogeneous data\n", "\n", @@ -2387,8 +2786,10 @@ }, { "cell_type": "markdown", - "id": "322a5ffb", - "metadata": {}, + "id": "a1d6ff64", + "metadata": { + "editable": true + }, "source": [ "## More limitations\n", "\n", @@ -2399,8 +2800,10 @@ }, { "cell_type": "markdown", - "id": "d7c906f4", - "metadata": {}, + "id": "0c2e5742", + "metadata": { + "editable": true + }, "source": [ "## Setting up a Multi-layer perceptron model for classification\n", "\n", @@ -2425,8 +2828,10 @@ }, { "cell_type": "markdown", - "id": "d9b1c4a9", - "metadata": {}, + "id": "d4da3f02", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(- \\boldsymbol{x}})} ,\n", @@ -2435,16 +2840,20 @@ }, { "cell_type": "markdown", - "id": "62795b6a", - "metadata": {}, + "id": "01ea2e0b", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "a302cc7a", - "metadata": {}, + "id": "9c1c7bec", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n", @@ -2453,8 +2862,10 @@ }, { "cell_type": "markdown", - "id": "892da0f2", - "metadata": {}, + "id": "9238ff2d", + "metadata": { + "editable": true + }, "source": [ "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n", "of our network." @@ -2462,8 +2873,10 @@ }, { "cell_type": "markdown", - "id": "0702951f", - "metadata": {}, + "id": "3be74bd1", + "metadata": { + "editable": true + }, "source": [ "## Defining the cost function\n", "\n", @@ -2472,8 +2885,10 @@ }, { "cell_type": "markdown", - "id": "cb0f2050", - "metadata": {}, + "id": "2e2fd39c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", @@ -2483,8 +2898,10 @@ }, { "cell_type": "markdown", - "id": "9d6da3d7", - "metadata": {}, + "id": "42b1d26b", + "metadata": { + "editable": true + }, "source": [ "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. 
\n", @@ -2506,8 +2923,10 @@ }, { "cell_type": "markdown", - "id": "eb7f6521", - "metadata": {}, + "id": "f740a484", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", @@ -2517,8 +2936,10 @@ }, { "cell_type": "markdown", - "id": "2d3caa00", - "metadata": {}, + "id": "19189bfc", + "metadata": { + "editable": true + }, "source": [ "which reduces to the logistic function in the binary case. \n", "The likelihood of this $C$-class classifier\n", @@ -2527,8 +2948,10 @@ }, { "cell_type": "markdown", - "id": "b84b3da0", - "metadata": {}, + "id": "aeb3ef60", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", @@ -2537,16 +2960,20 @@ }, { "cell_type": "markdown", - "id": "5be6ff37", - "metadata": {}, + "id": "dbf419a1", + "metadata": { + "editable": true + }, "source": [ "Again we take the negative log-likelihood to define our cost function:" ] }, { "cell_type": "markdown", - "id": "491b47ff", - "metadata": {}, + "id": "9e345753", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", @@ -2555,8 +2982,10 @@ }, { "cell_type": "markdown", - "id": "c752e63e", - "metadata": {}, + "id": "3b13095e", + "metadata": { + "editable": true + }, "source": [ "See the logistic regression lectures for a full definition of the cost function.\n", "\n", @@ -2565,8 +2994,10 @@ }, { "cell_type": "markdown", - "id": "efe0f0c7", - "metadata": {}, + "id": "96501a91", + "metadata": { + "editable": true + }, "source": [ "## Example: binary classification problem\n", "\n", @@ -2575,8 +3006,10 @@ }, { "cell_type": "markdown", - "id": "13ee778d", - "metadata": {}, + "id": "48cf79fe", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", @@ -2585,16 +3018,20 @@ }, { "cell_type": "markdown", - "id": "6884cc1e", - "metadata": {}, + "id": "3243c0b1", + "metadata": { + "editable": true + }, "source": [ "where we had defined the logistic (sigmoid) function" ] }, { "cell_type": "markdown", - "id": "9d7a0a4e", - "metadata": {}, + "id": "bb312a09", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", @@ -2603,16 +3040,20 @@ }, { "cell_type": "markdown", - "id": "35bfbdc8", - "metadata": {}, + "id": "484cf2b4", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "631ba4a6", - "metadata": {}, + "id": "2b9c5483", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", @@ -2621,8 +3062,10 @@ }, { "cell_type": "markdown", - "id": "a37f8d69", - "metadata": {}, + "id": "5ca21f09", + "metadata": { + "editable": true + }, "source": [ "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. 
\n", "\n", @@ -2632,8 +3075,10 @@ }, { "cell_type": "markdown", - "id": "fcedfc85", - "metadata": {}, + "id": "4852e4d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", @@ -2642,16 +3087,20 @@ }, { "cell_type": "markdown", - "id": "4c1ab15e", - "metadata": {}, + "id": "e3b7cbef", + "metadata": { + "editable": true + }, "source": [ "with" ] }, { "cell_type": "markdown", - "id": "6518cab5", - "metadata": {}, + "id": "0c1e69a1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", @@ -2660,8 +3109,10 @@ }, { "cell_type": "markdown", - "id": "ccfcb38c", - "metadata": {}, + "id": "e71df7f4", + "metadata": { + "editable": true + }, "source": [ "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", "Our cost function at the final layer $l=L$ is now" @@ -2669,8 +3120,10 @@ }, { "cell_type": "markdown", - "id": "4174ce25", - "metadata": {}, + "id": "50d6fecc", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", @@ -2679,16 +3132,20 @@ }, { "cell_type": "markdown", - "id": "4602e3f2", - "metadata": {}, + "id": "e145e461", + "metadata": { + "editable": true + }, "source": [ "where we have defined the targets $t_i$. The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" ] }, { "cell_type": "markdown", - "id": "4b71d88d", - "metadata": {}, + "id": "97f13260", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", @@ -2697,16 +3154,20 @@ }, { "cell_type": "markdown", - "id": "d5f0d911", - "metadata": {}, + "id": "4361ce3b", + "metadata": { + "editable": true + }, "source": [ "In case we use another activation function than the logistic one, we need to evaluate other derivatives." 
] }, { "cell_type": "markdown", - "id": "d0c13a16", - "metadata": {}, + "id": "52a16654", + "metadata": { + "editable": true + }, "source": [ "## The Softmax function\n", "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" @@ -2714,8 +3175,10 @@ }, { "cell_type": "markdown", - "id": "7af1d556", - "metadata": {}, + "id": "3bfb321e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", @@ -2725,16 +3188,20 @@ }, { "cell_type": "markdown", - "id": "a42c51d7", - "metadata": {}, + "id": "eccac6c9", + "metadata": { + "editable": true + }, "source": [ "For the Softmax function we have" ] }, { "cell_type": "markdown", - "id": "6365d360", - "metadata": {}, + "id": "23634198", + "metadata": { + "editable": true + }, "source": [ "$$\n", "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", @@ -2743,16 +3210,20 @@ }, { "cell_type": "markdown", - "id": "557a932f", - "metadata": {}, + "id": "7a2e75ba", + "metadata": { + "editable": true + }, "source": [ "Its derivative with respect to $z_j^l$ gives" ] }, { "cell_type": "markdown", - "id": "055db867", - "metadata": {}, + "id": "2dad2d14", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", @@ -2761,16 +3232,20 @@ }, { "cell_type": "markdown", - "id": "ccd85582", - "metadata": {}, + "id": "46415917", + "metadata": { + "editable": true + }, "source": [ "which in case of the simply binary model reduces to having $i=j$." ] }, { "cell_type": "markdown", - "id": "d2d1077a", - "metadata": {}, + "id": "6adc7c1e", + "metadata": { + "editable": true + }, "source": [ "## Developing a code for doing neural networks with back propagation\n", "\n", @@ -2791,8 +3266,10 @@ }, { "cell_type": "markdown", - "id": "3dff91e3", - "metadata": {}, + "id": "4110d83e", + "metadata": { + "editable": true + }, "source": [ "## Collect and pre-process data\n", "\n", @@ -2839,8 +3316,11 @@ { "cell_type": "code", "execution_count": 3, - "id": "0527bd02", - "metadata": {}, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# import necessary packages\n", @@ -2889,8 +3369,10 @@ }, { "cell_type": "markdown", - "id": "b5238d8c", - "metadata": {}, + "id": "28bb6085", + "metadata": { + "editable": true + }, "source": [ "## Train and test datasets\n", "\n", @@ -2908,8 +3390,11 @@ { "cell_type": "code", "execution_count": 4, - "id": "bba6be21", - "metadata": {}, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", @@ -2943,8 +3428,10 @@ }, { "cell_type": "markdown", - "id": "4a77d660", - "metadata": {}, + "id": "c26d604d", + "metadata": { + "editable": true + }, "source": [ "## Define model and architecture\n", "\n", @@ -2985,8 +3472,10 @@ }, { "cell_type": "markdown", - "id": "0d276258", - "metadata": {}, + "id": "2775283b", + "metadata": { + "editable": true + }, "source": [ "## Layers\n", "\n", @@ -3023,8 +3512,10 @@ }, { "cell_type": "markdown", - "id": "f25c6fa1", - "metadata": {}, + "id": "f7455c00", + "metadata": { + "editable": true + }, "source": [ "## Weights and biases\n", "\n", @@ -3042,8 +3533,11 @@ { "cell_type": "code", "execution_count": 5, - "id": "3a4a6bba", - "metadata": {}, + "id": 
"20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# building our neural network\n", @@ -3065,8 +3559,10 @@ }, { "cell_type": "markdown", - "id": "de886e02", - "metadata": {}, + "id": "a41d9acd", + "metadata": { + "editable": true + }, "source": [ "## Feed-forward pass\n", "\n", @@ -3091,8 +3587,10 @@ }, { "cell_type": "markdown", - "id": "baeb1290", - "metadata": {}, + "id": "b2f64238", + "metadata": { + "editable": true + }, "source": [ "## Matrix multiplications\n", "\n", @@ -3126,8 +3624,11 @@ { "cell_type": "code", "execution_count": 6, - "id": "b575ea05", - "metadata": {}, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# setup the feed-forward pass, subscript h = hidden layer\n", @@ -3169,8 +3670,10 @@ }, { "cell_type": "markdown", - "id": "a981e9cf", - "metadata": {}, + "id": "4518e911", + "metadata": { + "editable": true + }, "source": [ "## Choose cost function and optimizer\n", "\n", @@ -3198,8 +3701,10 @@ }, { "cell_type": "markdown", - "id": "fcb5b7b4", - "metadata": {}, + "id": "d519516b", + "metadata": { + "editable": true + }, "source": [ "## Optimizing the cost function\n", "\n", @@ -3234,8 +3739,10 @@ }, { "cell_type": "markdown", - "id": "135edd42", - "metadata": {}, + "id": "46b71202", + "metadata": { + "editable": true + }, "source": [ "## Regularization\n", "\n", @@ -3266,8 +3773,10 @@ }, { "cell_type": "markdown", - "id": "6a94b210", - "metadata": {}, + "id": "129c39d3", + "metadata": { + "editable": true + }, "source": [ "## Matrix multiplication\n", "\n", @@ -3305,8 +3814,11 @@ { "cell_type": "code", "execution_count": 7, - "id": "10a0f4b1", - "metadata": {}, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# to categorical turns our integer vector into a onehot representation\n", @@ -3381,8 +3893,10 @@ }, { "cell_type": "markdown", - "id": "37e26c5f", - "metadata": {}, + "id": "e95c7166", + "metadata": { + "editable": true + }, "source": [ "## Improving performance\n", "\n", @@ -3400,8 +3914,10 @@ }, { "cell_type": "markdown", - "id": "3721f1b2", - "metadata": {}, + "id": "b4365471", + "metadata": { + "editable": true + }, "source": [ "## Full object-oriented implementation\n", "\n", @@ -3412,8 +3928,11 @@ { "cell_type": "code", "execution_count": 8, - "id": "a2225589", - "metadata": {}, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "class NeuralNetwork:\n", @@ -3519,8 +4038,10 @@ }, { "cell_type": "markdown", - "id": "9d915d49", - "metadata": {}, + "id": "a417307d", + "metadata": { + "editable": true + }, "source": [ "## Evaluate model performance on test data\n", "\n", @@ -3536,8 +4057,11 @@ { "cell_type": "code", "execution_count": 9, - "id": "62e979c5", - "metadata": {}, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "epochs = 100\n", @@ -3560,8 +4084,10 @@ }, { "cell_type": "markdown", - "id": "32464471", - "metadata": {}, + "id": "efcbd954", + "metadata": { + "editable": true + }, "source": [ "## Adjust hyperparameters\n", "\n", @@ -3572,8 +4098,11 @@ { "cell_type": "code", "execution_count": 10, - "id": "c3277c8f", - "metadata": {}, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "eta_vals = np.logspace(-5, 1, 7)\n", @@ -3600,8 +4129,10 @@ }, { "cell_type": "markdown", - "id": "bb09a45b", - "metadata": 
{}, + "id": "d282951d", + "metadata": { + "editable": true + }, "source": [ "## Visualization" ] @@ -3609,8 +4140,11 @@ { "cell_type": "code", "execution_count": 11, - "id": "6e2566d5", - "metadata": {}, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# visual representation of grid search\n", @@ -3650,8 +4184,10 @@ }, { "cell_type": "markdown", - "id": "eec163df", - "metadata": {}, + "id": "99f5058c", + "metadata": { + "editable": true + }, "source": [ "## scikit-learn implementation\n", "\n", @@ -3671,8 +4207,11 @@ { "cell_type": "code", "execution_count": 12, - "id": "b33a66c0", - "metadata": {}, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.neural_network import MLPClassifier\n", @@ -3695,8 +4234,10 @@ }, { "cell_type": "markdown", - "id": "e1d6d9d7", - "metadata": {}, + "id": "7ceec918", + "metadata": { + "editable": true + }, "source": [ "## Visualization" ] @@ -3704,8 +4245,11 @@ { "cell_type": "code", "execution_count": 13, - "id": "4bc9d5c7", - "metadata": {}, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# optional\n", @@ -3746,8 +4290,10 @@ }, { "cell_type": "markdown", - "id": "c1c46aeb", - "metadata": {}, + "id": "ba07c374", + "metadata": { + "editable": true + }, "source": [ "## Building neural networks in Tensorflow and Keras\n", "\n", @@ -3762,8 +4308,10 @@ }, { "cell_type": "markdown", - "id": "b5205123", - "metadata": {}, + "id": "1cf09819", + "metadata": { + "editable": true + }, "source": [ "## Tensorflow\n", "\n", @@ -3795,8 +4343,11 @@ { "cell_type": "code", "execution_count": 14, - "id": "b9c0dfe3", - "metadata": {}, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "pip3 install tensorflow" @@ -3804,8 +4355,10 @@ }, { "cell_type": "markdown", - "id": "f562b8e8", - "metadata": {}, + "id": "39d013b1", + "metadata": { + "editable": true + }, "source": [ "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", "(current release of CPU-only TensorFlow)" @@ -3814,8 +4367,11 @@ { "cell_type": "code", "execution_count": 15, - "id": "d4526899", - "metadata": {}, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda create -n tf tensorflow\n", @@ -3824,8 +4380,10 @@ }, { "cell_type": "markdown", - "id": "59617395", - "metadata": {}, + "id": "94e66380", + "metadata": { + "editable": true + }, "source": [ "To install the current release of GPU TensorFlow" ] @@ -3833,8 +4391,11 @@ { "cell_type": "code", "execution_count": 16, - "id": "9c975a67", - "metadata": {}, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda create -n tf-gpu tensorflow-gpu\n", @@ -3843,8 +4404,10 @@ }, { "cell_type": "markdown", - "id": "975357be", - "metadata": {}, + "id": "40470dbd", + "metadata": { + "editable": true + }, "source": [ "## Using Keras\n", "\n", @@ -3856,8 +4419,11 @@ { "cell_type": "code", "execution_count": 17, - "id": "e558fb1f", - "metadata": {}, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda install keras" @@ -3865,8 +4431,10 @@ }, { "cell_type": "markdown", - "id": "a6f5c4d4", - "metadata": {}, + "id": "636940c6", + "metadata": { + "editable": true + }, "source": [ "You can look up the 
[instructions here](https://keras.io/) for more information.\n", "\n", @@ -3875,8 +4443,10 @@ }, { "cell_type": "markdown", - "id": "6bb12225", - "metadata": {}, + "id": "d9f47b57", + "metadata": { + "editable": true + }, "source": [ "## Collect and pre-process data\n", "\n", @@ -3886,8 +4456,11 @@ { "cell_type": "code", "execution_count": 18, - "id": "41fd6ccf", - "metadata": {}, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# import necessary packages\n", @@ -3938,8 +4511,11 @@ { "cell_type": "code", "execution_count": 19, - "id": "c70145e7", - "metadata": {}, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from tensorflow.keras.layers import Input\n", @@ -3964,8 +4540,11 @@ { "cell_type": "code", "execution_count": 20, - "id": "f0413064", - "metadata": {}, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "\n", @@ -3991,8 +4570,11 @@ { "cell_type": "code", "execution_count": 21, - "id": "a4ba8bc0", - "metadata": {}, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", @@ -4015,8 +4597,11 @@ { "cell_type": "code", "execution_count": 22, - "id": "e38856a7", - "metadata": {}, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# optional\n", @@ -4054,191 +4639,10 @@ }, { "cell_type": "markdown", - "id": "143ff6b2", - "metadata": {}, - "source": [ - "## The Breast Cancer Data, now with Keras" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "8830114d", - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "import tensorflow as tf\n", - "from tensorflow.keras.layers import Input\n", - "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", - "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", - "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", - "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", - "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "from sklearn.model_selection import train_test_split as splitter\n", - "from sklearn.datasets import load_breast_cancer\n", - "import pickle\n", - "import os \n", - "\n", - "\n", - "\"\"\"Load breast cancer dataset\"\"\"\n", - "\n", - "np.random.seed(0) #create same seed for random number every time\n", - "\n", - "cancer=load_breast_cancer() #Download breast cancer dataset\n", - "\n", - "inputs=cancer.data #Feature matrix of 569 rows (samples) and 30 columns (parameters)\n", - "outputs=cancer.target #Label array of 569 rows (0 for benign and 1 for malignant)\n", - "labels=cancer.feature_names[0:30]\n", - "\n", - "print('The content of the breast cancer dataset is:') #Print information about the datasets\n", - "print(labels)\n", - "print('-------------------------')\n", - "print(\"inputs = \" + str(inputs.shape))\n", - "print(\"outputs = \" + str(outputs.shape))\n", - "print(\"labels = \"+ str(labels.shape))\n", - "\n", - "x=inputs #Reassign the Feature and Label matrices to 
other variables\n", - "y=outputs\n", - "\n", - "#%% \n", - "\n", - "# Visualisation of dataset (for correlation analysis)\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,0],x[:,2],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean radius',fontweight='bold')\n", - "plt.ylabel('Mean perimeter',fontweight='bold')\n", - "plt.show()\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,5],x[:,6],s=40,c=y, cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean compactness',fontweight='bold')\n", - "plt.ylabel('Mean concavity',fontweight='bold')\n", - "plt.show()\n", - "\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,0],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean radius',fontweight='bold')\n", - "plt.ylabel('Mean texture',fontweight='bold')\n", - "plt.show()\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,2],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean perimeter',fontweight='bold')\n", - "plt.ylabel('Mean compactness',fontweight='bold')\n", - "plt.show()\n", - "\n", - "\n", - "# Generate training and testing datasets\n", - "\n", - "#Select features relevant to classification (texture,perimeter,compactness and symmetery) \n", - "#and add to input matrix\n", - "\n", - "temp1=np.reshape(x[:,1],(len(x[:,1]),1))\n", - "temp2=np.reshape(x[:,2],(len(x[:,2]),1))\n", - "X=np.hstack((temp1,temp2)) \n", - "temp=np.reshape(x[:,5],(len(x[:,5]),1))\n", - "X=np.hstack((X,temp)) \n", - "temp=np.reshape(x[:,8],(len(x[:,8]),1))\n", - "X=np.hstack((X,temp)) \n", - "\n", - "X_train,X_test,y_train,y_test=splitter(X,y,test_size=0.1) #Split datasets into training and testing\n", - "\n", - "y_train=to_categorical(y_train) #Convert labels to categorical when using categorical cross entropy\n", - "y_test=to_categorical(y_test)\n", - "\n", - "del temp1,temp2,temp\n", - "\n", - "# %%\n", - "\n", - "# Define tunable parameters\"\n", - "\n", - "eta=np.logspace(-3,-1,3) #Define vector of learning rates (parameter to SGD optimiser)\n", - "lamda=0.01 #Define hyperparameter\n", - "n_layers=2 #Define number of hidden layers in the model\n", - "n_neuron=np.logspace(0,3,4,dtype=int) #Define number of neurons per layer\n", - "epochs=100 #Number of reiterations over the input data\n", - "batch_size=100 #Number of samples per gradient update\n", - "\n", - "# %%\n", - "\n", - "\"\"\"Define function to return Deep Neural Network model\"\"\"\n", - "\n", - "def NN_model(inputsize,n_layers,n_neuron,eta,lamda):\n", - " model=Sequential() \n", - " for i in range(n_layers): #Run loop to add hidden layers to the model\n", - " if (i==0): #First layer requires input dimensions\n", - " model.add(Dense(n_neuron,activation='relu',kernel_regularizer=regularizers.l2(lamda),input_dim=inputsize))\n", - " else: #Subsequent layers are capable of automatic shape inferencing\n", - " model.add(Dense(n_neuron,activation='relu',kernel_regularizer=regularizers.l2(lamda)))\n", - " model.add(Dense(2,activation='softmax')) #2 outputs - ordered and disordered (softmax for prob)\n", - " sgd=optimizers.SGD(lr=eta)\n", - " model.compile(loss='categorical_crossentropy',optimizer=sgd,metrics=['accuracy'])\n", - " return model\n", - "\n", - " \n", - "Train_accuracy=np.zeros((len(n_neuron),len(eta))) #Define matrices to store accuracy scores as a function\n", - "Test_accuracy=np.zeros((len(n_neuron),len(eta))) #of learning rate and number of hidden neurons for \n", - "\n", - "for i in range(len(n_neuron)): #run loops over hidden neurons and learning rates to calculate \n", - " for j in range(len(eta)): #accuracy scores \n", - " 
DNN_model=NN_model(X_train.shape[1],n_layers,n_neuron[i],eta[j],lamda)\n", - " DNN_model.fit(X_train,y_train,epochs=epochs,batch_size=batch_size,verbose=1)\n", - " Train_accuracy[i,j]=DNN_model.evaluate(X_train,y_train)[1]\n", - " Test_accuracy[i,j]=DNN_model.evaluate(X_test,y_test)[1]\n", - " \n", - "\n", - "def plot_data(x,y,data,title=None):\n", - "\n", - " # plot results\n", - " fontsize=16\n", - "\n", - "\n", - " fig = plt.figure()\n", - " ax = fig.add_subplot(111)\n", - " cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)\n", - " \n", - " cbar=fig.colorbar(cax)\n", - " cbar.ax.set_ylabel('accuracy (%)',rotation=90,fontsize=fontsize)\n", - " cbar.set_ticks([0,.2,.4,0.6,0.8,1.0])\n", - " cbar.set_ticklabels(['0%','20%','40%','60%','80%','100%'])\n", - "\n", - " # put text on matrix elements\n", - " for i, x_val in enumerate(np.arange(len(x))):\n", - " for j, y_val in enumerate(np.arange(len(y))):\n", - " c = \"${0:.1f}\\\\%$\".format( 100*data[j,i]) \n", - " ax.text(x_val, y_val, c, va='center', ha='center')\n", - "\n", - " # convert axis vaues to to string labels\n", - " x=[str(i) for i in x]\n", - " y=[str(i) for i in y]\n", - "\n", - "\n", - " ax.set_xticklabels(['']+x)\n", - " ax.set_yticklabels(['']+y)\n", - "\n", - " ax.set_xlabel('$\\\\mathrm{learning\\\\ rate}$',fontsize=fontsize)\n", - " ax.set_ylabel('$\\\\mathrm{hidden\\\\ neurons}$',fontsize=fontsize)\n", - " if title is not None:\n", - " ax.set_title(title)\n", - "\n", - " plt.tight_layout()\n", - "\n", - " plt.show()\n", - " \n", - "plot_data(eta,n_neuron,Train_accuracy, 'training')\n", - "plot_data(eta,n_neuron,Test_accuracy, 'testing')" - ] - }, - { - "cell_type": "markdown", - "id": "3a018087", - "metadata": {}, + "id": "53f9be79", + "metadata": { + "editable": true + }, "source": [ "## Building a neural network code\n", "\n", @@ -4254,8 +4658,10 @@ }, { "cell_type": "markdown", - "id": "e513a294", - "metadata": {}, + "id": "39bd1718", + "metadata": { + "editable": true + }, "source": [ "### Learning rate methods\n", "\n", @@ -4273,9 +4679,12 @@ }, { "cell_type": "code", - "execution_count": 24, - "id": "5a213611", - "metadata": {}, + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4412,8 +4821,10 @@ }, { "cell_type": "markdown", - "id": "961917fd", - "metadata": {}, + "id": "532aecc2", + "metadata": { + "editable": true + }, "source": [ "### Usage of the above learning rate schedulers\n", "\n", @@ -4425,9 +4836,12 @@ }, { "cell_type": "code", - "execution_count": 25, - "id": "e3745f70", - "metadata": {}, + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", @@ -4436,8 +4850,10 @@ }, { "cell_type": "markdown", - "id": "22aca85a", - "metadata": {}, + "id": "32a25c0b", + "metadata": { + "editable": true + }, "source": [ "Here is a small example for how a segment of code using schedulers\n", "could look. Switching out the schedulers is simple." 
@@ -4445,9 +4861,12 @@ }, { "cell_type": "code", - "execution_count": 26, - "id": "3c9a6d4a", - "metadata": {}, + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "weights = np.ones((3,3))\n", @@ -4465,8 +4884,10 @@ }, { "cell_type": "markdown", - "id": "6cea2036", - "metadata": {}, + "id": "d34cd45c", + "metadata": { + "editable": true + }, "source": [ "### Cost functions\n", "\n", @@ -4478,9 +4899,12 @@ }, { "cell_type": "code", - "execution_count": 27, - "id": "76f5c3ab", - "metadata": {}, + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4514,8 +4938,10 @@ }, { "cell_type": "markdown", - "id": "8cbf4208", - "metadata": {}, + "id": "baaaff79", + "metadata": { + "editable": true + }, "source": [ "Below we give a short example of how these cost function may be used\n", "to obtain results if you wish to test them out on your own using\n", @@ -4524,9 +4950,12 @@ }, { "cell_type": "code", - "execution_count": 28, - "id": "d6e082d7", - "metadata": {}, + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from autograd import grad\n", @@ -4543,8 +4972,10 @@ }, { "cell_type": "markdown", - "id": "110f03f8", - "metadata": {}, + "id": "05285af5", + "metadata": { + "editable": true + }, "source": [ "### Activation functions\n", "\n", @@ -4556,9 +4987,12 @@ }, { "cell_type": "code", - "execution_count": 29, - "id": "3c6712b7", - "metadata": {}, + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4612,8 +5046,10 @@ }, { "cell_type": "markdown", - "id": "498d3949", - "metadata": {}, + "id": "873e7caa", + "metadata": { + "editable": true + }, "source": [ "Below follows a short demonstration of how to use an activation\n", "function. 
The derivative of the activation function will be important\n", @@ -4624,9 +5060,12 @@ }, { "cell_type": "code", - "execution_count": 30, - "id": "33947583", - "metadata": {}, + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "z = np.array([[4, 5, 6]]).T\n", @@ -4643,8 +5082,10 @@ }, { "cell_type": "markdown", - "id": "731fc79c", - "metadata": {}, + "id": "3dc2175e", + "metadata": { + "editable": true + }, "source": [ "### The Neural Network\n", "\n", @@ -4664,9 +5105,12 @@ }, { "cell_type": "code", - "execution_count": 31, - "id": "f27ea6ab", - "metadata": {}, + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import math\n", @@ -5134,8 +5578,10 @@ }, { "cell_type": "markdown", - "id": "bf5cdac7", - "metadata": {}, + "id": "9596ae53", + "metadata": { + "editable": true + }, "source": [ "Before we make a model, we will quickly generate a dataset we can use\n", "for our linear regression problem as shown below" @@ -5143,9 +5589,12 @@ }, { "cell_type": "code", - "execution_count": 32, - "id": "c57cb644", - "metadata": {}, + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -5185,8 +5634,10 @@ }, { "cell_type": "markdown", - "id": "c5864d33", - "metadata": {}, + "id": "0fc39e40", + "metadata": { + "editable": true + }, "source": [ "Now that we have our dataset ready for the regression, we can create\n", "our regressor. Note that with the seed parameter, we can make sure our\n", @@ -5198,9 +5649,12 @@ }, { "cell_type": "code", - "execution_count": 33, - "id": "474c34e0", - "metadata": {}, + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5211,17 +5665,22 @@ }, { "cell_type": "markdown", - "id": "74f3bc91", - "metadata": {}, + "id": "3add8665", + "metadata": { + "editable": true + }, "source": [ "We then fit our model with our training data using the scheduler of our choice." ] }, { "cell_type": "code", - "execution_count": 34, - "id": "a47d9dc5", - "metadata": {}, + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5232,8 +5691,10 @@ }, { "cell_type": "markdown", - "id": "bb2d666b", - "metadata": {}, + "id": "4dff1871", + "metadata": { + "editable": true + }, "source": [ "Due to the progress bar we can see the MSE (train_error) throughout\n", "the FFNN's training. 
Note that the fit() function has some optional\n", @@ -5245,9 +5706,12 @@ }, { "cell_type": "code", - "execution_count": 35, - "id": "f05cdd60", - "metadata": {}, + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5257,8 +5721,10 @@ }, { "cell_type": "markdown", - "id": "0034d61c", - "metadata": {}, + "id": "43cd1e22", + "metadata": { + "editable": true + }, "source": [ "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", "\n", @@ -5269,9 +5735,12 @@ }, { "cell_type": "code", - "execution_count": 36, - "id": "67ecf987", - "metadata": {}, + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer\n", @@ -5292,9 +5761,12 @@ }, { "cell_type": "code", - "execution_count": 37, - "id": "729ba5dd", - "metadata": {}, + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5305,17 +5777,22 @@ }, { "cell_type": "markdown", - "id": "719a054a", - "metadata": {}, + "id": "e3e6fa31", + "metadata": { + "editable": true + }, "source": [ "We will now make use of our validation data by passing it into our fit function as a keyword argument" ] }, { "cell_type": "code", - "execution_count": 38, - "id": "7e58cd70", - "metadata": {}, + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5326,17 +5803,22 @@ }, { "cell_type": "markdown", - "id": "8225cc6d", - "metadata": {}, + "id": "622015f0", + "metadata": { + "editable": true + }, "source": [ "Finally, we will create a neural network with 2 hidden layers with activation functions." 
] }, { "cell_type": "code", - "execution_count": 39, - "id": "a134deda", - "metadata": {}, + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5351,9 +5833,12 @@ }, { "cell_type": "code", - "execution_count": 40, - "id": "92aeced5", - "metadata": {}, + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5364,8 +5849,10 @@ }, { "cell_type": "markdown", - "id": "3f0373a0", - "metadata": {}, + "id": "317e6e5c", + "metadata": { + "editable": true + }, "source": [ "### Multiclass classification\n", "\n", @@ -5376,9 +5863,12 @@ }, { "cell_type": "code", - "execution_count": 41, - "id": "02888bf9", - "metadata": {}, + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", @@ -5411,8 +5901,10 @@ }, { "cell_type": "markdown", - "id": "e7714a6d", - "metadata": {}, + "id": "82d61377", + "metadata": { + "editable": true + }, "source": [ "## Testing the XOR gate and other gates\n", "\n", @@ -5421,9 +5913,12 @@ }, { "cell_type": "code", - "execution_count": 42, - "id": "08a9206b", - "metadata": {}, + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", @@ -5442,32 +5937,16 @@ }, { "cell_type": "markdown", - "id": "82cfd04b", - "metadata": {}, + "id": "2d892009", + "metadata": { + "editable": true + }, "source": [ "Not bad, but the results depend strongly on the learning reate. Try different learning rates." ] } ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.15" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/_build/html/_sources/week43.ipynb b/doc/LectureNotes/_build/html/_sources/week43.ipynb new file mode 100644 index 000000000..b190102b6 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. Coding examples using Tensorflow/Keras and Pytorch examples. 
The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. \n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
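A minimal NumPy sketch of these update rules for a toy network with one hidden layer and sigmoid activations is shown below. The layer sizes, data and learning rate are invented for illustration, and the output error $a^L-t$ assumes the cross-entropy cost with a sigmoid output discussed earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # one input with 4 features
t = np.array([[1.0]])                # target
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))   # hidden layer, 3 nodes
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))   # output layer
eta = 0.1                            # learning rate, illustrative only

# feed forward
a1 = sigmoid(W1 @ x + b1)
a2 = sigmoid(W2 @ a1 + b2)

# output error: delta^L = a^L - t for cross-entropy cost with sigmoid output
delta2 = a2 - t
# back-propagated error: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
delta1 = (W2.T @ delta2) * a1 * (1 - a1)

# gradient-descent updates: w <- w - eta delta a^{l-1},  b <- b - eta delta
W2 -= eta * delta2 @ a1.T
b2 -= eta * delta2
W1 -= eta * delta1 @ x.T
b1 -= eta * delta1
```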
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
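A small sketch (toy values only) contrasting the ReLU gradient, which is exactly zero for negative inputs, with the leaky variant, whose small slope (here the common default $\alpha=0.01$) lets a dead neuron receive weight updates again:

```python
import numpy as np

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_derivative(z):
    # zero gradient for z < 0: a "dead" neuron receives no weight updates
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    # small but non-zero gradient for z < 0, so the neuron can recover
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z), relu_derivative(z))
print(leaky_relu(z), leaky_relu_derivative(z))
```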
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
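+ "        # (note) resample() above draws a bootstrap sample of X and t (sampling with\n",
+ "        # replacement by default); the mini-batches in the training loop below are then\n",
+ "        # taken as contiguous slices of this resampled data\n",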
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates." + ] + }, + { + "cell_type": "markdown", + "id": "40ffa1fb", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." + ] + }, + { + "cell_type": "markdown", + "id": "191ba3eb", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "a0be312a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000663cf", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "f5b87995", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "a166c0b6", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
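+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "hidden_layer_sketch_note",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a minimal sketch of the hidden-layer computation just described (again with hypothetical sizes, independent of the implementation further below), the products $\boldsymbol{p}_{i, \text{hidden}}^T X$ for all hidden neurons at once, followed by the sigmoid, can be written as:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "hidden_layer_sketch",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as np\n",
+ "import autograd.numpy.random as npr\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1/(1 + np.exp(-z))\n",
+ "\n",
+ "N = 5                                  # hypothetical number of input points\n",
+ "x = np.linspace(0, 1, N)               # the inputs x_1, ..., x_N\n",
+ "P_hidden = npr.randn(10, 2)            # 10 hidden neurons; first column = biases\n",
+ "\n",
+ "# X has a row of ones (for the biases) on top of the row of inputs\n",
+ "X = np.concatenate((np.ones((1, N)), x.reshape(1, N)), axis=0)\n",
+ "\n",
+ "z_hidden = np.matmul(P_hidden, X)      # row i equals p_{i,hidden}^T X\n",
+ "x_hidden = sigmoid(z_hidden)           # hidden-layer outputs, shape (10, N)\n",
+ "print(x_hidden.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "hidden_layer_sketch_followup",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The output layer, described next, treats the hidden-layer outputs in the same way, using the weights and bias stored in $P_{\text{output} }$."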
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
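+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the programs below, the backpropagation step itself is delegated to Autograd: grad(cost_function, 0) returns a function that evaluates the gradient of the cost with respect to the parameter set $P$. The toy example below uses a stand-in quadratic cost and arbitrary parameter shapes, not the ODE cost itself; it is only a minimal sketch of the constant-step-size update that the next section states more precisely.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as np\n",
+ "from autograd import grad\n",
+ "\n",
+ "# Stand-in cost: the sum of squared parameters (x is unused, kept only for the call signature)\n",
+ "def toy_cost(P, x):\n",
+ "    return np.sum(P[0]**2) + np.sum(P[1]**2)\n",
+ "\n",
+ "# Gradient descent with a constant step size on a parameter set P given as a list of arrays\n",
+ "def gradient_descent(P, x, cost_function, lmb=0.1, num_iter=100):\n",
+ "    grad_cost = grad(cost_function, 0)      # gradient w.r.t. the 0-th argument, i.e. P\n",
+ "    for _ in range(num_iter):\n",
+ "        gradients = grad_cost(P, x)         # one gradient array per array in P\n",
+ "        P = [p - lmb*g for p, g in zip(P, gradients)]\n",
+ "    return P\n",
+ "\n",
+ "P = [np.ones((2, 2)), np.ones((1, 3))]\n",
+ "P = gradient_descent(P, None, toy_cost)\n",
+ "print(toy_cost(P, None))                    # close to zero after the updates"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The full solver below does exactly this, with the ODE cost function and the network parameters in place of the toy ingredients."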
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
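+ "    # (This concatenation implements the augmented input (1, x_j)^T from the matrix\n",
+ "    # formulation above, so each row of w_hidden stores both the bias b_i and the weight w_i.)\n",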
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l 
in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get 
the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
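+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Following the splitting into $h_1$ and $h_2$ discussed above, one possible trial solution for this set of conditions (written here only for illustration; the form used in the implementation may be organized slightly differently) is\n",
+ "\n",
+ "$$\n",
+ "g_t(x,t) = (1-t)u(x) + x(1-x)t N(x,t,P)\n",
+ "$$\n",
+ "\n",
+ "At $t = 0$ the second term vanishes and $g_t(x,0) = u(x)$, while at $x = 0$ and $x = 1$ both terms vanish provided $u(0) = u(1) = 0$. A quick numerical check of these properties, with a stand-in constant network output, could look like this:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Toy check of the trial-solution form above, with a stand-in network output N = 1\n",
+ "# and u(x) = sin(pi*x), which satisfies u(0) = u(1) = 0\n",
+ "def u(x):\n",
+ "    return np.sin(np.pi*x)\n",
+ "\n",
+ "def g_trial(x, t, N=1.0):\n",
+ "    return (1 - t)*u(x) + x*(1 - x)*t*N\n",
+ "\n",
+ "x = np.linspace(0, 1, 5)\n",
+ "print(np.allclose(g_trial(x, 0.0), u(x)))      # initial condition g_t(x, 0) = u(x)\n",
+ "print(g_trial(0.0, 0.7), g_trial(1.0, 0.7))    # boundary values vanish (up to round-off) for any t"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With such a trial solution in hand, the network only has to learn the behaviour in the interior of the domain, while the given conditions are satisfied automatically."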
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+ "\n",
+ "The cost function must iterate through the given arrays of $x$- and $t$-values, define the point $(x,t)$ at which the deep\n",
+ "neural network and the trial solution are evaluated, and then compute\n",
+ "the Jacobian of the trial solution at that point.\n",
+ "\n",
+ "A possible trial solution for this PDE is\n",
+ "\n",
+ "$$\n",
+ "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+ "$$\n",
+ "\n",
+ "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+ "\n",
+ "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+ "\n",
+ "$$\n",
+ "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n",
+ "$$\n",
+ "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "533f4e84",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Why the Jacobian?\n",
+ "\n",
+ "The Jacobian is used because the program must find the derivative of\n",
+ "the trial solution with respect to $x$ and $t$.\n",
+ "\n",
+ "This gives the necessity of computing the Jacobian matrix, as we want\n",
+ "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+ "Jacobian of a scalar-valued multivariate function is simply its\n",
+ "gradient).\n",
+ "\n",
+ "In Autograd, the differentiation is by default done with respect to\n",
+ "the first input argument of your Python function. Since the point is\n",
+ "an array holding the values of $x$ and $t$, the Jacobian is calculated\n",
+ "with respect to both $x$ and $t$.\n",
+ "\n",
+ "To find the second derivatives with respect to $x$ and $t$, the\n",
+ "Jacobian can be computed a second time. The result is the Hessian\n",
+ "matrix, which contains all the possible second-order\n",
+ "mixed derivatives of $g(x,t)$."
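+ ,
+ "\n",
+ "Before the full program, here is a minimal stand-alone sketch of this idea. The function `g_simple` below is only a hypothetical stand-in for the trial solution (it happens to solve the diffusion equation), used to show how `jacobian` and `hessian` from Autograd expose the derivatives we need:\n",
+ "\n",
+ "```python\n",
+ "import autograd.numpy as np\n",
+ "from autograd import jacobian, hessian\n",
+ "\n",
+ "def g_simple(point):\n",
+ "    # scalar function of point = [x, t], standing in for the trial solution\n",
+ "    x, t = point\n",
+ "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+ "\n",
+ "point = np.array([0.3, 0.1])\n",
+ "gradient = jacobian(g_simple)(point)  # array [dg/dx, dg/dt], since g_simple is scalar valued\n",
+ "second = hessian(g_simple)(point)     # 2 x 2 matrix of second derivatives\n",
+ "dg_dt = gradient[1]\n",
+ "d2g_dx2 = second[0][0]\n",
+ "print(dg_dt, d2g_dx2)  # equal here, since this particular g_simple solves the diffusion equation\n",
+ "```"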
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
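+ ,
+ "\n",
+ "As a quick sanity check of the stated analytical solution: differentiating $g(x,t) = \exp(-\pi^2 t)\sin(\pi x)$ gives\n",
+ "\n",
+ "$$\n",
+ "\frac{\partial g}{\partial t} = -\pi^2 \exp(-\pi^2 t)\sin(\pi x) = \frac{\partial^2 g}{\partial x^2},\n",
+ "$$\n",
+ "\n",
+ "while $g(0,t) = g(1,t) = 0$ and $g(x,0) = \sin(\pi x) = u(x)$, so both ([18](#diffonedim)) and the given conditions are satisfied."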
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", 
+ " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Solution from the deep neural network w/ %d layers'%len(num_hidden_neurons))\n",
+ "    s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Analytical solution')\n",
+ "    s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Difference')\n",
+ "    s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "    ## Take some slices of the 3D plots just to see the solutions at particular times\n",
+ "    indx1 = 0\n",
+ "    indx2 = int(Nt/2)\n",
+ "    indx3 = Nt-1\n",
+ "\n",
+ "    t1 = t[indx1]\n",
+ "    t2 = t[indx2]\n",
+ "    t3 = t[indx3]\n",
+ "\n",
+ "    # Slice the results from the DNN\n",
+ "    res1 = res[:,indx1]\n",
+ "    res2 = res[:,indx2]\n",
+ "    res3 = res[:,indx3]\n",
+ "\n",
+ "    # Slice the analytical results\n",
+ "    res_analytical1 = res_analytical[:,indx1]\n",
+ "    res_analytical2 = res_analytical[:,indx2]\n",
+ "    res_analytical3 = res_analytical[:,indx3]\n",
+ "\n",
+ "    # Plot the slices\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t1)\n",
+ "    plt.plot(x, res1)\n",
+ "    plt.plot(x,res_analytical1)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t2)\n",
+ "    plt.plot(x, res2)\n",
+ "    plt.plot(x,res_analytical2)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t3)\n",
+ "    plt.plot(x, res3)\n",
+ "    plt.plot(x,res_analytical3)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "95dea76f",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Resources on differential equations and deep learning\n",
+ "\n",
+ "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n",
+ "\n",
+ "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n",
+ "\n",
+ "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n",
+ "\n",
+ "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week44.ipynb b/doc/LectureNotes/_build/html/_sources/week44.ipynb new file mode 100644 index 000000000..6193b11ee --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
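+ ,
+ "\n",
+ "A small sketch (with made-up sizes) of this hidden-layer computation, using the row of ones to absorb the biases into a single matrix product; the output-layer weighting described next works in exactly the same way:\n",
+ "\n",
+ "```python\n",
+ "import autograd.numpy as np\n",
+ "import autograd.numpy.random as npr\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1/(1 + np.exp(-z))\n",
+ "\n",
+ "x = np.linspace(0, 1, 5)                                          # N = 5 input points\n",
+ "X = np.concatenate((np.ones((1, 5)), x.reshape(1, -1)), axis=0)   # first row of ones gives the bias term\n",
+ "P_hidden = npr.randn(10, 2)                                       # 10 hidden neurons, each row is [bias, weight]\n",
+ "x_hidden = sigmoid(np.matmul(P_hidden, X))                        # shape (10, 5): every hidden neuron at every x\n",
+ "```"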
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
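+ ,
+ "\n",
+ "As a toy illustration of this choice, here is gradient descent with a constant step size on a made-up one-parameter cost, using Autograd's `grad`; the full program below applies the same update to every array in $P$:\n",
+ "\n",
+ "```python\n",
+ "from autograd import grad\n",
+ "\n",
+ "def C(w):\n",
+ "    # stand-in cost with minimum at w = 3\n",
+ "    return (w - 3.0)**2\n",
+ "\n",
+ "dC = grad(C)       # derivative of C with respect to w\n",
+ "w = 0.0            # initial guess\n",
+ "lmb = 0.1          # constant step size\n",
+ "for _ in range(200):\n",
+ "    w = w - lmb*dC(w)\n",
+ "print(w)           # close to 3.0\n",
+ "```"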
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n",
+    "$$\n",
+    "\\begin{equation} \\label{log} \\tag{10}\n",
+    "\tg'(t) = \\alpha g(t)(A - g(t))\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64a97256",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ is the growth rate and $A > 0$ is the maximum population number in the environment.\n",
+    "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n",
+    "\n",
+    "In this example, we use a network similar to the one used for the exponential decay, again implemented with Autograd. However, since such an implementation may suffer from, e.g., numerical instability\n",
+    "and long execution times (this becomes more apparent in the examples solving PDEs),\n",
+    "using a library like TensorFlow is recommended for larger problems.\n",
+    "Here, we stay with the simpler approach and, for comparison, also implement the forward Euler method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94bb8aaa",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Setting up the problem\n",
+    "\n",
+    "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n",
+    "The population follows the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29ead54b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in 
range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta x}(x)\n", 
+    "$$\n",
+    "\n",
+    "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ is the error term.\n",
+    "\n",
+    "Neglecting the error term gives an approximation to the second derivative:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cea098e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get 
the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
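To make the connection to the general cost function above concrete (this is only a restatement for the present example, with the grid sizes $N_x$ and $N_t$ as notation of our choosing): for the diffusion equation the expression $f$ inside the square is simply the residual $\frac{\partial g_t}{\partial t} - \frac{\partial^2 g_t}{\partial x^2}$, evaluated at chosen grid points $(x_i, t_j)$. The network is therefore trained to make

$$
C\left(\{x_i, t_j\}, P\right) = \sum_{i=1}^{N_x}\sum_{j=1}^{N_t}\left(\frac{\partial g_t(x_i,t_j)}{\partial t} - \frac{\partial^2 g_t(x_i,t_j)}{\partial x^2}\right)^2
$$

as small as possible; dividing by the number of grid points, as the implementation further below does, turns this into a mean squared residual and does not change the minimizer.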
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\\Big(u(x) - \\big((1-x)u(0) + x u(1)\\big)\\Big) = (1-t)u(x) = (1-t)\\sin(\\pi x)\n",
+    "$$\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \\sin(\\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50cfea81",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivative of\n",
+    "the trial solution with respect to both $x$ and $t$.\n",
+    "\n",
+    "This requires computing the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivative with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is a Hessian\n",
+    "matrix, which is the matrix containing all the possible second order\n",
+    "mixed derivatives of $g(x,t)$."
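As a minimal, self-contained illustration of how the Jacobian and Hessian returned by Autograd are indexed (the function `h` and the evaluation point below are made up purely for this demonstration and are not part of the PDE solver), consider a scalar function of a point $(x,t)$:

```python
import autograd.numpy as np
from autograd import jacobian, hessian

def h(point):
    # A simple scalar function of a point (x, t), chosen only for illustration
    x, t = point
    return x**2*t + np.sin(t)

point = np.array([1.0, 2.0])

jac = jacobian(h)(point)   # the gradient: [dh/dx, dh/dt]
hes = hessian(h)(point)    # the matrix of all second derivatives

print(jac[0], 2*point[0]*point[1])             # dh/dx     = 2xt
print(jac[1], point[0]**2 + np.cos(point[1]))  # dh/dt     = x^2 + cos(t)
print(hes[0][0], 2*point[1])                   # d2h/dx2   = 2t
print(hes[0][1], 2*point[0])                   # d2h/dxdt  = 2x
print(hes[1][1], -np.sin(point[1]))            # d2h/dt2   = -sin(t)
```

This is exactly the indexing used in the cost function below: `g_t_jacobian[1]` picks out $\partial g_t/\partial t$ and `g_t_hessian[0][0]` picks out $\partial^2 g_t/\partial x^2$.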
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
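Before running the full program, a quick sanity check can be useful: by construction, the trial solution should satisfy the boundary and initial conditions for any value of the network parameters, trained or not. The short sketch below assumes that the cells above defining `u`, the point-version `deep_neural_network` and `g_trial` have been executed; the parameter shapes, random seed and tolerance are illustrative choices only.

```python
import autograd.numpy as np
import autograd.numpy.random as npr

npr.seed(0)
# Untrained, random parameters: 2 inputs (x and t) + 1 bias, one hidden layer with 10 neurons
P_random = [npr.randn(10, 2 + 1), npr.randn(1, 10 + 1)]

# Boundary conditions: g_trial(0, t) = g_trial(1, t) = 0 for all t
for t_ in [0.0, 0.3, 0.7, 1.0]:
    assert abs(g_trial(np.array([0.0, t_]), P_random)) < 1e-12
    assert abs(g_trial(np.array([1.0, t_]), P_random)) < 1e-12

# Initial condition: g_trial(x, 0) = u(x) = sin(pi*x) for all x
for x_ in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(g_trial(np.array([x_, 0.0]), P_random) - np.sin(np.pi*x_)) < 1e-12

print('The trial solution satisfies the boundary and initial conditions for random parameters')
```

Since the conditions are built into $g_t(x,t)$ itself, the training only has to drive the PDE residual towards zero, which is what makes this construction convenient.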
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + 
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.
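To make the scaling argument above concrete, here is a minimal back-of-the-envelope check in Python (plain arithmetic, using only the image sizes quoted above):

```python
# Weights needed by ONE fully-connected neuron attached to a flattened image
def weights_per_neuron(width, height, channels):
    return width * height * channels

print(weights_per_neuron(32, 32, 3))     # 3072
print(weights_per_neuron(200, 200, 3))   # 120000
```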

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
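As a small illustration of the three-dimensional layout discussed above, a color image is simply an array with two spatial axes and a channel axis; flattening it, as a dense network would, discards that structure. A minimal sketch with a random stand-in for an image:

```python
import numpy as np

rng = np.random.default_rng(2025)
image = rng.random((32, 32, 3))        # height, width, RGB channels
print(image.shape)                     # (32, 32, 3)
print(image.reshape(-1).shape)         # (3072,) -- the flattened vector a dense layer would see
```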

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
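The INPUT-CONV-RELU-POOL-FC stack listed above can be written down in a few lines. The following is only a sketch (it assumes TensorFlow/Keras is installed and reuses the $32\times 32\times 3$ example with 12 filters and 10 classes from the text); implementations in both Keras and PyTorch are discussed in more detail next week:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# INPUT -> CONV (+ RELU) -> POOL -> FC, following the simple architecture above
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                                       # INPUT volume
    layers.Conv2D(12, kernel_size=3, padding="same", activation="relu"),   # CONV + RELU, 32x32x12
    layers.MaxPooling2D(pool_size=2),                                      # POOL, 16x16x12
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                # FC class scores, 1x1x10
])
model.summary()
```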

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
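Numerically, the polynomial product above is exactly what numpy's discrete convolution computes, and the Toeplitz matrix-vector form gives the same coefficients. A short check with hypothetical coefficient values (scipy's `toeplitz` builds the matrix from its first column and first row):

```python
import numpy as np
from scipy.linalg import toeplitz

alpha = np.array([1.0, 2.0, 3.0])         # coefficients of p(t), hypothetical values
beta  = np.array([4.0, 5.0, 6.0, 7.0])    # coefficients of s(t), hypothetical values

# Coefficients of the product polynomial via discrete convolution
delta = np.convolve(alpha, beta)

# The same product as a Toeplitz matrix acting on beta
A = toeplitz(np.concatenate([alpha, np.zeros(len(beta) - 1)]),  # first column (alpha_0, alpha_1, alpha_2, 0, 0, 0)
             np.zeros(len(beta)))                               # first row    (alpha_0, 0, 0, 0)
print(np.allclose(delta, A @ beta))   # True
```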
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
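As a quick numerical check of the padded one-dimensional sum derived above, here is a minimal sketch with hypothetical numbers, compared against numpy's built-in convolution:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])          # filter of length m = 3 (the alphas above)
x = np.array([4.0, 5.0, 6.0, 7.0])     # input of length n = 4 (the betas above)
m, n = len(w), len(x)

# Pad the input with m - 1 zeros on each side, as described above
x_pad = np.pad(x, (m - 1, m - 1))

# y(i) = sum_k w(k) x(i + (m-1) - k) for i = 0, ..., n + m - 2
y = np.array([sum(w[k] * x_pad[i + (m - 1) - k] for k in range(m))
              for i in range(n + m - 1)])

print(np.allclose(y, np.convolve(x, w)))   # True
```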
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
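This sliding operation is easy to check numerically; the sketch below (with hypothetical entries) uses scipy's cross-correlation, which is the operation deep-learning libraries call convolution, and matches the explicit $2\times 2$ expression written out next:

```python
import numpy as np
from scipy.signal import correlate2d

X = np.arange(1.0, 10.0).reshape(3, 3)     # hypothetical 3x3 input
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])                 # hypothetical 2x2 filter

# Stride S = 1, no padding ('valid'): the filter visits every 2x2 patch of X
Y = correlate2d(X, W, mode="valid")
print(Y)

# Manual check of the upper-left output element: elementwise product over the first patch
print(np.isclose(Y[0, 0], (X[:2, :2] * W).sum()))   # True
```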
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
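The bookkeeping formulas above are easy to wrap in a small helper; the numbers below reproduce the $32\times 32\times 3$ example with $K=10$ filters of size $F=5$, stride $S=1$ and padding $P=0$ (a sketch, assuming the division is exact):

```python
def conv_layer_summary(W1, H1, D1, K, F, S, P):
    # Output volume from W2 = (W1 - F + 2P)/S + 1 (and the same for H2), with D2 = K
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1 + 1) * K        # F*F*D1 weights plus one bias per filter
    return (W2, H2, K), params

shape, params = conv_layer_summary(W1=32, H1=32, D1=3, K=10, F=5, S=1, P=0)
print(shape)    # (28, 28, 10)
print(params)   # 760, i.e. 76 parameters per filter
```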

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
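Max pooling over non-overlapping $2\times 2$ patches can be written compactly with a reshape; a minimal numpy sketch with hypothetical input values:

```python
import numpy as np

X = np.array([[ 1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.],
              [ 9., 10., 11., 12.],
              [13., 14., 15., 16.]])

# Split the 4x4 input into non-overlapping 2x2 patches and take the maximum of each
pooled = X.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[ 6.  8.]
#  [14. 16.]]
```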

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week45.ipynb b/doc/LectureNotes/_build/html/_sources/week45.ipynb new file mode 100644 index 000000000..c5336e2ab --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
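Before writing out the algebra, a small numerical sketch (with made-up values standing in for $x_{ij}$ and $w_{ij}$) of this sliding dot product may be useful; it computes the same $2\times 2$ output that the explicit matrix below writes out, using the cross-correlation form that the libraries implement.

```python
import numpy as np

# Made-up values standing in for x_ij and w_ij (illustration only)
X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[1., 0.],
              [0., -1.]])

S = 1                                              # stride
H_out = (X.shape[0] - W.shape[0]) // S + 1
W_out = (X.shape[1] - W.shape[1]) // S + 1
Y = np.zeros((H_out, W_out))
for i in range(H_out):
    for j in range(W_out):
        patch = X[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
        Y[i, j] = np.sum(patch * W)                # dot product of patch and filter
print(Y)                                           # a 2x2 output, as in the matrix below
```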
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Performing a general discrete convolution
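To make the arithmetic above concrete, here is a minimal sketch with made-up input values; the helper functions `conv_output_size`, `conv_parameters` and `cross_correlate2d` are introduced here for illustration only and implement the output-size and parameter-count formulas from the previous sections together with a padded, strided cross-correlation.

```python
import numpy as np

def conv_output_size(W1, F, P, S):
    # W2 = (W1 - F + 2P)/S + 1, as in the formulas above
    return (W1 - F + 2 * P) // S + 1

def conv_parameters(F, D1, K):
    # (F x F x D1) weights per filter plus one bias, times K filters
    return (F * F * D1 + 1) * K

def cross_correlate2d(X, W, P=0, S=1):
    # Slide the filter W over the zero-padded input X with stride S
    Xp = np.pad(X, P)
    H_out = (Xp.shape[0] - W.shape[0]) // S + 1
    W_out = (Xp.shape[1] - W.shape[1]) // S + 1
    Y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = Xp[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
            Y[i, j] = np.sum(patch * W)
    return Y

# The 32x32x3 example above: ten 5x5 filters with S=1 and P=0
print(conv_output_size(32, F=5, P=0, S=1))   # 28
print(conv_parameters(F=5, D1=3, K=10))      # 760 parameters in total

# A small made-up input to exercise padding and stride
X = np.arange(16, dtype=float).reshape(4, 4)
W = np.ones((3, 3)) / 9.0                    # a simple averaging filter
print(cross_correlate2d(X, W, P=1, S=2))     # output size (4-3+2)/2+1 = 2
```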

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Pooling types
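As a minimal sketch of the pooling operations just described (the $4\times 4$ feature map and the helper `pool2d` are made up for illustration), the function below slides a window over the input and summarizes each patch; passing `np.max` gives max pooling and `np.mean` average pooling.

```python
import numpy as np

# Hypothetical 4x4 feature map
A = np.array([[1., 3., 2., 0.],
              [4., 6., 1., 1.],
              [0., 2., 5., 7.],
              [1., 2., 3., 4.]])

def pool2d(A, size=2, stride=2, op=np.max):
    # Slide a size x size window over A and summarize each patch with op
    H_out = (A.shape[0] - size) // stride + 1
    W_out = (A.shape[1] - size) // stride + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            out[i, j] = op(A[i*stride:i*stride + size, j*stride:j*stride + size])
    return out

print(pool2d(A, op=np.max))    # max pooling:     [[6. 2.] [2. 7.]]
print(pool2d(A, op=np.mean))   # average pooling: [[3.5 1.] [1.25 4.75]]
```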

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
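Note that in the code cell that follows the softmax is left implicit: the last layer is `Dense(10)`, so the model outputs raw logits, and the compile step later uses `SparseCategoricalCrossentropy(from_logits=True)`. A hedged sketch of an equivalent head with an explicit softmax, assuming the (4, 4, 64) output volume mentioned above, could look as follows; it would then pair with `from_logits=False`.

```python
# A sketch only, not the cell below: the same classification head with an
# explicit softmax on the assumed (4, 4, 64) output volume of the conv base.
import tensorflow as tf
from tensorflow.keras import layers, models

head = models.Sequential([
    layers.Flatten(input_shape=(4, 4, 64)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
head.summary()
```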
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/chapter1.html b/doc/LectureNotes/_build/html/chapter1.html index c323aa723..ca7455c5a 100644 --- a/doc/LectureNotes/_build/html/chapter1.html +++ b/doc/LectureNotes/_build/html/chapter1.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter10.html b/doc/LectureNotes/_build/html/chapter10.html index be40b4d4a..612413b38 100644 --- a/doc/LectureNotes/_build/html/chapter10.html +++ b/doc/LectureNotes/_build/html/chapter10.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter11.html b/doc/LectureNotes/_build/html/chapter11.html index 21280b666..25c63e7e2 100644 --- a/doc/LectureNotes/_build/html/chapter11.html +++ b/doc/LectureNotes/_build/html/chapter11.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter12.html b/doc/LectureNotes/_build/html/chapter12.html index b3454b65b..e3ff223d2 100644 --- a/doc/LectureNotes/_build/html/chapter12.html +++ b/doc/LectureNotes/_build/html/chapter12.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter13.html b/doc/LectureNotes/_build/html/chapter13.html index c8c0990c6..c366b8259 100644 --- a/doc/LectureNotes/_build/html/chapter13.html +++ b/doc/LectureNotes/_build/html/chapter13.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter2.html b/doc/LectureNotes/_build/html/chapter2.html index 2b3176071..814f23a7c 100644 --- a/doc/LectureNotes/_build/html/chapter2.html +++ b/doc/LectureNotes/_build/html/chapter2.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter3.html b/doc/LectureNotes/_build/html/chapter3.html index a06cc0460..8accd5d6d 100644 --- a/doc/LectureNotes/_build/html/chapter3.html +++ b/doc/LectureNotes/_build/html/chapter3.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter4.html b/doc/LectureNotes/_build/html/chapter4.html index fdb544a13..90002fb38 100644 --- a/doc/LectureNotes/_build/html/chapter4.html +++ b/doc/LectureNotes/_build/html/chapter4.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter5.html b/doc/LectureNotes/_build/html/chapter5.html index 7e4464070..a264f255a 100644 --- a/doc/LectureNotes/_build/html/chapter5.html +++ b/doc/LectureNotes/_build/html/chapter5.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter6.html b/doc/LectureNotes/_build/html/chapter6.html index 8f9aec998..178cf9a00 100644 --- a/doc/LectureNotes/_build/html/chapter6.html +++ b/doc/LectureNotes/_build/html/chapter6.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter7.html b/doc/LectureNotes/_build/html/chapter7.html index 53f16bfdf..e9af01fc7 100644 --- a/doc/LectureNotes/_build/html/chapter7.html +++ b/doc/LectureNotes/_build/html/chapter7.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter8.html b/doc/LectureNotes/_build/html/chapter8.html index 47a013e0c..a068a54b0 100644 --- a/doc/LectureNotes/_build/html/chapter8.html +++ b/doc/LectureNotes/_build/html/chapter8.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter9.html b/doc/LectureNotes/_build/html/chapter9.html index 39622eb94..d98e219ce 100644 --- a/doc/LectureNotes/_build/html/chapter9.html +++ b/doc/LectureNotes/_build/html/chapter9.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapteroptimization.html b/doc/LectureNotes/_build/html/chapteroptimization.html index 44829966d..7c1d042b7 100644 --- a/doc/LectureNotes/_build/html/chapteroptimization.html +++ b/doc/LectureNotes/_build/html/chapteroptimization.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/clustering.html b/doc/LectureNotes/_build/html/clustering.html index 7175fdd80..cb49d05d2 100644 --- a/doc/LectureNotes/_build/html/clustering.html +++ b/doc/LectureNotes/_build/html/clustering.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/exercisesweek34.html b/doc/LectureNotes/_build/html/exercisesweek34.html index bc02792c6..67011b8c8 100644 --- a/doc/LectureNotes/_build/html/exercisesweek34.html +++ b/doc/LectureNotes/_build/html/exercisesweek34.html @@ -228,10 +228,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/exercisesweek35.html b/doc/LectureNotes/_build/html/exercisesweek35.html index ddaf0e727..a3f715f2c 100644 --- a/doc/LectureNotes/_build/html/exercisesweek35.html +++ b/doc/LectureNotes/_build/html/exercisesweek35.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -557,7 +592,7 @@

    Exercise 4 - Fitting a polynomial
    n = 100
     x = np.linspace(-3, 3, n)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)
+y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0, n)
     
    diff --git a/doc/LectureNotes/_build/html/exercisesweek36.html b/doc/LectureNotes/_build/html/exercisesweek36.html index 8a389f299..275e30f1e 100644 --- a/doc/LectureNotes/_build/html/exercisesweek36.html +++ b/doc/LectureNotes/_build/html/exercisesweek36.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -475,7 +510,7 @@

    Exercise 3 - Scaling data
    n = 100
     x = np.linspace(-3, 3, n)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)
+y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, n)
     
    diff --git a/doc/LectureNotes/_build/html/exercisesweek37.html b/doc/LectureNotes/_build/html/exercisesweek37.html index a366a3c71..2595944b9 100644 --- a/doc/LectureNotes/_build/html/exercisesweek37.html +++ b/doc/LectureNotes/_build/html/exercisesweek37.html @@ -62,7 +62,7 @@ - + @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -413,7 +448,8 @@

    Contents

    -
    + +

    Exercises week 37#

    Implementing gradient descent for Ridge and ordinary Least Squares Regression

    Date: September 8-12, 2025

    @@ -454,23 +490,23 @@

    1a)#

    term, the data is shifted such that the intercept is effectively 0 . (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) -Choose \(n=100\) data points and set up \(\boldsymbol{x}, \)\boldsymbol{y}\( and the design matrix \)\boldsymbol{X}$.

    +Choose \(n=100\) data points and set up \(\boldsymbol{x}\), \(\boldsymbol{y}\) and the design matrix \(\boldsymbol{X}\).

    -
    # Standardize features (zero mean, unit variance for each feature)
    -X_mean = X.mean(axis=0)
    -X_std = X.std(axis=0)
    -X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    -X_norm = (X - X_mean) / X_std
    -
    -# Center the target to zero mean (optional, to simplify intercept handling)
    -y_mean = ?
    -y_centered = ?
    +
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = ?
    +y_centered = ?
     
    -

    Fill in the necessary details.

    +

    Fill in the necessary details. Do we need to center the \(y\)-values?

    After this preprocessing, each column of \(\boldsymbol{X}_{\mathrm{norm}}\) has mean zero and standard deviation \(1\) and \(\boldsymbol{y}_{\mathrm{centered}}\) has mean 0. This makes the optimization landscape nicer and ensures the regularization penalty \(\lambda \sum_j @@ -486,16 +522,18 @@

    Exercise 2, calculate the gradients\(\boldsymbol{\theta}\)#

    -
    # Set regularization parameter, either a single value or a vector of values
    -lambda = ?
    +
    # Set regularization parameter, either a single value or a vector of values
    +# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called "anonymous functions" or "lambda functions."
    +lam = ?
    +
     
    -# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y
    -I = np.eye(n_features)
    -theta_closed_formRidge = ?
    -theta_closed_formOLS = ?
    +# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y
    +I = np.eye(n_features)
    +theta_closed_formRidge = ?
    +theta_closed_formOLS = ?
     
    -print("Closed-form Ridge coefficients:", theta_closed_form)
    -print("Closed-form OLS coefficients:", theta_closed_form)
    +print("Closed-form Ridge coefficients:", theta_closed_form)
    +print("Closed-form OLS coefficients:", theta_closed_form)
     
    @@ -503,7 +541,7 @@

    Exercise 3, using the analytical formulae for OLS and Ridge regression to fi

    This computes the Ridge and OLS regression coefficients directly. The identity matrix \(I\) has the same size as \(X^T X\). It adds \(\lambda\) to the diagonal of \(X^T X\) for Ridge regression. We then invert this matrix and multiply by \(X^T y\). The result -for \(\boldsymbol{\theta}\) is a NumPy array of shape (n\(\_\)features,) containing the +for \(\boldsymbol{\theta}\) is a NumPy array of shape (n\(\_\)features,) containing the fitted parameters \(\boldsymbol{\theta}\).
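For reference, one possible way to fill in the closed-form template above is sketched below. This is only a sketch, not the official solution: it assumes numpy is imported as np and that X_norm, y_centered, n_features and the regularization parameter lam have already been defined as in the previous cells; the pseudo-inverse is used for the OLS part for numerical robustness.

I = np.eye(n_features)
# OLS: theta = (X^T X)^{-1} X^T y
theta_closed_formOLS = np.linalg.pinv(X_norm.T @ X_norm) @ X_norm.T @ y_centered
# Ridge: theta = (X^T X + lambda * I)^{-1} X^T y
theta_closed_formRidge = np.linalg.inv(X_norm.T @ X_norm + lam * I) @ X_norm.T @ y_centered

print("Closed-form OLS coefficients:", theta_closed_formOLS)
print("Closed-form Ridge coefficients:", theta_closed_formRidge)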

    3a)#

    @@ -521,54 +559,45 @@

    Exercise 4, Implementing the simplest form for gradient descent\(n\) and \(p\) are so large that the closed-form might be too slow or memory-intensive. We derive the gradients from the cost functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to -the parameters \(\boldsymbol{\theta}\) and set up (using the template below) your own gradient descent code for OLS and Ridge regression.

    +the parameters \(\boldsymbol{\theta}\) and set up (using the template below) your own gradient descent code for OLS and Ridge regression.

    Below is a template code for gradient descent implementation of ridge:

    -
    # Gradient descent parameters, learning rate eta first
    -eta = 0.1
    -# Then number of iterations
    -num_iters = 1000
    -
    -# Initialize weights for gradient descent
    -theta = np.zeros(n_features)
    -
    -# Arrays to store history for plotting
    -cost_history = np.zeros(num_iters)
    -
    -# Gradient descent loop
    -m = n_samples  # number of data points
    -for t in range(num_iters):
    -    # Compute prediction error
    -    error = X_norm.dot(theta) - y_centered 
    -    # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring
    -    cost_OLS = ?
    -    cost_Ridge = ?
    -    # You could add a history for both methods (optional)
    -    cost_history[t] = ?
    -    # Compute gradients for OSL and Ridge
    -    grad_OLS = ?
    -    grad_Ridge = ?
    -    # Update parameters theta
    -    theta_gdOLS = ?
    -    theta_gdRidge = ? 
    -
    -# After the loop, theta contains the fitted coefficients
    -theta_gdOLS = ?
    -theta_gdRidge = ?
    -print("Gradient Descent OLS coefficients:", theta_gdOLS)
    -print("Gradient Descent Ridge coefficients:", theta_gdRidge)
    +
    # Gradient descent parameters, learning rate eta first
    +eta = 0.1
    +# Then number of iterations
    +num_iters = 1000
    +
    +# Initialize weights for gradient descent
    +theta = np.zeros(n_features)
    +
    +# Gradient descent loop
    +for t in range(num_iters):
    +    # Compute gradients for OSL and Ridge
    +    grad_OLS = ?
    +    grad_Ridge = ?
    +    # Update parameters theta
    +    theta_gdOLS = ?
    +    theta_gdRidge = ? 
    +
    +# After the loop, theta contains the fitted coefficients
    +theta_gdOLS = ?
    +theta_gdRidge = ?
    +print("Gradient Descent OLS coefficients:", theta_gdOLS)
    +print("Gradient Descent Ridge coefficients:", theta_gdRidge)
     

    4a)#

    -

    Discuss the results as function of the learning rate parameters and the number of iterations.

    +

Write first a gradient descent code for OLS only using the above template. +Discuss the results as a function of the learning rate parameter and the number of iterations.

    4b)#

    -

    Try to add a stopping parameter as function of the number iterations and the difference between the new and old \(\theta\) values. How would you define a stopping criterion?

    +

Write then a similar code for Ridge regression using the above template. +Try to add a stopping parameter as a function of the number of iterations and the difference between the new and old \(\theta\) values. How would you define a stopping criterion?
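To give an idea of what the filled-in loop can look like, here is a minimal sketch (not the required solution). It assumes numpy is imported as np and that X_norm, y_centered, n_samples, n_features, lam, eta and num_iters are defined as in the cells above, and it keeps separate parameter vectors for OLS and Ridge.

theta_gdOLS = np.zeros(n_features)
theta_gdRidge = np.zeros(n_features)

for t in range(num_iters):
    # Gradient of the cost (1/n)*||y - X theta||^2 for OLS
    grad_OLS = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_gdOLS - y_centered)
    # Ridge adds the derivative of the penalty term lam*||theta||^2
    grad_Ridge = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_gdRidge - y_centered) + 2.0 * lam * theta_gdRidge
    # Plain gradient descent updates
    theta_gdOLS = theta_gdOLS - eta * grad_OLS
    theta_gdRidge = theta_gdRidge - eta * grad_Ridge

print("Gradient Descent OLS coefficients:", theta_gdOLS)
print("Gradient Descent Ridge coefficients:", theta_gdRidge)

A simple stopping criterion could be to break out of the loop once np.linalg.norm(eta * grad_Ridge) falls below a small tolerance, or after a chosen maximum number of iterations.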

    @@ -586,24 +615,24 @@

    Exercise 5, Ridge regression and a new Synthetic Dataset
    -
    import numpy as np
    +
    import numpy as np
     
    -# Set random seed for reproducibility
    -np.random.seed(0)
    +# Set random seed for reproducibility
    +np.random.seed(0)
     
    -# Define dataset size
    -n_samples = 100
    -n_features = 10
    +# Define dataset size
    +n_samples = 100
    +n_features = 10
     
    -# Define true coefficients (sparse linear relationship)
    -theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    +# Define true coefficients (sparse linear relationship)
    +theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
     
    -# Generate feature matrix X (n_samples x n_features) with random values
    -X = np.random.randn(n_samples, n_features)  # standard normal distribution
    +# Generate feature matrix X (n_samples x n_features) with random values
    +X = np.random.randn(n_samples, n_features)  # standard normal distribution
     
    -# Generate target values y with a linear combination of X and theta_true, plus noise
    -noise = 0.5 * np.random.randn(n_samples)    # Gaussian noise
    -y = X.dot @ theta_true + noise
    +# Generate target values y with a linear combination of X and theta_true, plus noise
    +noise = 0.5 * np.random.randn(n_samples)  # Gaussian noise
+y = X @ theta_true + noise
     
    @@ -623,7 +652,7 @@

    Exercise 5, Ridge regression and a new Synthetic Dataset

    next

    -

    Project 1 on Machine Learning, deadline October 6 (midnight), 2025

    +

    Week 37: Gradient descent methods

    diff --git a/doc/LectureNotes/_build/html/exercisesweek38.html b/doc/LectureNotes/_build/html/exercisesweek38.html new file mode 100644 index 000000000..5f9730b95 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek38.html @@ -0,0 +1,779 @@ + + + + + + + + + + + Exercises week 38 — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + +
    + +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + + + + + + +
    + +
    +

    Exercises week 38#

    +
    +

    September 15-19#

    +
    +
    +

    Resampling and the Bias-Variance Trade-off#

    +
    +

    Learning goals#

    +

    After completing these exercises, you will know how to

    +
      +
  • Derive expectation values and variances related to linear regression

    • +
  • Compute expectation values and variances related to linear regression

    • +
    • Compute and evaluate the trade-off between bias and variance of a model

    • +
    +
    +
    +

    Deliverables#

    +

    Complete the following exercises while working in a jupyter notebook. Then, in canvas, include

    + +
    +
    +
    +

    Use the books!#

    +

    This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer).

    +

    For more discussions on Ridge regression and calculation of expectation values, Wessel van Wieringen’s article is highly recommended.

    +

    The exercises this week are also a part of project 1 and can be reused in the theory part of the project.

    +
    +

    Definitions#

    +

We assume that there exists a continuous function \(f(\boldsymbol{x})\) and a normally distributed error \(\boldsymbol{\varepsilon}\sim N(0, \sigma^2)\) which together describe our data

    +
    +\[ +\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon} +\]
    +

We further assume that this continuous function can be modeled with a linear model \(\mathbf{\tilde{y}}\) of some features \(\mathbf{X}\).

    +
    +\[ +\boldsymbol{y} = \boldsymbol{\tilde{y}} + \boldsymbol{\varepsilon} = \boldsymbol{X}\boldsymbol{\beta} +\boldsymbol{\varepsilon} +\]
    +

    We therefore get that our data \(\boldsymbol{y}\) has an expectation value \(\boldsymbol{X}\boldsymbol{\beta}\) and variance \(\sigma^2\), that is \(\boldsymbol{y}\) follows a normal distribution with mean value \(\boldsymbol{X}\boldsymbol{\beta}\) and variance \(\sigma^2\).

    +
    +
    +
    +

    Exercise 1: Expectation values for ordinary least squares expressions#

    +

    a) With the expressions for the optimal parameters \(\boldsymbol{\hat{\beta}_{OLS}}\) show that

    +
    +\[ +\mathbb{E}(\boldsymbol{\hat{\beta}_{OLS}}) = \boldsymbol{\beta}. +\]
    +

    b) Show that the variance of \(\boldsymbol{\hat{\beta}_{OLS}}\) is

    +
    +\[ +\mathbf{Var}(\boldsymbol{\hat{\beta}_{OLS}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1}. +\]
    +

We can use the last expression when we define a confidence interval for the parameters \(\boldsymbol{\hat{\beta}_{OLS}}\). +The variance of a given parameter \({\boldsymbol{\hat{\beta}_{OLS}}}_j\) is given by the corresponding diagonal element of the above matrix.

    +
    +
    +

    Exercise 2: Expectation values for Ridge regression#

    +

    a) With the expressions for the optimal parameters \(\boldsymbol{\hat{\beta}_{Ridge}}\) show that

    +
    +\[ +\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\beta} +\]
    +

    We see that \(\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}\big ]\) for any \(\lambda > 0\).

    +

    b) Show that the variance is

    +
    +\[ +\mathbf{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T}\mathbf{X} \{ [ \mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T} +\]
    +

    We see that if the parameter \(\lambda\) goes to infinity then the variance of the Ridge parameters \(\boldsymbol{\beta}\) goes to zero.

    +
    +
    +

    Exercise 3: Deriving the expression for the Bias-Variance Trade-off#

    +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.

    +

    The parameters \(\boldsymbol{\hat{\beta}_{OLS}}\) are found by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right] +\]
    +

    a) Show that you can rewrite this into an expression which contains

    +
      +
    • the variance of the model (the variance term)

    • +
    • the expected deviation of the mean of the model from the true data (the bias term)

    • +
    • the variance of the noise

    • +
    +

    In other words, show that:

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +\]
    +

    with

    +
    +\[ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +\]
    +

    and

    +
    +\[ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +\]
    +

    In order to arrive at the equation for the bias, we have to approximate the unknown function \(f\) with the output/target values \(y\).

    +

    b) Explain what the terms mean and discuss their interpretations.

    +
    +
    +

    Exercise 4: Computing the Bias and Variance#

    +

    Before you compute the bias and variance of a real model for different complexities, let’s for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.

    +

    a) Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.

    +
    +
    +
    import numpy as np
    +
    +n = 100
    +bootstraps = 1000
    +
    +predictions = np.random.rand(bootstraps, n) * 10 + 10
    +# The definition of targets has been updated, and was wrong earlier in the week.
    +targets = np.random.rand(1, n)
    +
    +mse = ...
    +bias = ...
    +variance = ...
    +
    +
    +
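One possible way to fill in these quantities, following the definitions in Exercise 3, is sketched below (a sketch only; it assumes predictions and targets keep the shapes generated above and treats the targets as the true values):

mean_pred = np.mean(predictions, axis=0)            # E[y_tilde] for each data point
mse = np.mean((predictions - targets) ** 2)         # average over bootstraps and data points
bias = np.mean((targets - mean_pred) ** 2)          # (squared) bias term
variance = np.mean((predictions - mean_pred) ** 2)  # variance term
print(mse, bias + variance)                         # the two numbers should agree closely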
    +
    +

    b) Change the prediction values in some way to increase the bias while decreasing the variance.

    +

    c) Change the prediction values in some way to increase the variance while decreasing the bias.

    +

d) Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variance values as a function of the polynomial degree of your model.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.preprocessing import (
    +    PolynomialFeatures,
    +)  # use the fit_transform method of the created object!
    +from sklearn.linear_model import LinearRegression
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +
    +
    +
    +
    +
    +
    +
    n = 100
    +bootstraps = 1000
    +
    +x = np.linspace(-3, 3, n)
+y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1, n)
    +
    +biases = []
    +variances = []
    +mses = []
    +
    +# for p in range(1, 5):
    +#    predictions = ...
    +#    targets = ...
    +#
    +#    X = ...
    +#    X_train, X_test, y_train, y_test = ...
    +#    for b in range(bootstraps):
    +#        X_train_re, y_train_re = ...
    +#
    +#        # fit your model on the sampled data
    +#
    +#        # make predictions on the test data
    +#        predictions[b, :] =
    +#        targets[b, :] =
    +#
    +#    biases.append(...)
    +#    variances.append(...)
    +#    mses.append(...)
    +
    +
    +
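As a starting point for d), the body of the degree loop could look roughly like the sketch below for a single polynomial degree p. This is an assumption-laden sketch, not the required solution: it relies on the imports in the cell above, assumes x, y, n and bootstraps are defined, fits the model with fit_intercept=False because PolynomialFeatures already adds a bias column, and keeps the test set fixed while resampling only the training data.

p = 5
X = PolynomialFeatures(p).fit_transform(x.reshape(-1, 1))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2024)

predictions = np.empty((bootstraps, y_test.shape[0]))
targets = y_test.reshape(1, -1)
for b in range(bootstraps):
    # resample the training data with replacement
    X_train_re, y_train_re = resample(X_train, y_train)
    # fit the model on the resampled training data
    model = LinearRegression(fit_intercept=False).fit(X_train_re, y_train_re)
    # make predictions on the fixed test data
    predictions[b, :] = model.predict(X_test)

mean_pred = np.mean(predictions, axis=0)
biases.append(np.mean((targets - mean_pred) ** 2))
variances.append(np.mean((predictions - mean_pred) ** 2))
mses.append(np.mean((predictions - targets) ** 2))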
    +
    +

    e) Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).

    +

    f) Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).

    +
    +
    +

    Exercise 5: Interpretation of scaling and metrics#

    +

In this course, we often ask you to scale data and compute various metrics. Although these practices are “standard” in the field, we will require you to demonstrate an understanding of why you need to scale data and use these metrics, both so that you can make better arguments about your results and so that you will hopefully make fewer mistakes.

    +

    First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.
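As a concrete illustration of what is meant by scaling here (a small self-contained sketch with a made-up feature matrix):

import numpy as np

X = np.random.rand(5, 3)                          # example feature matrix, samples as rows
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # each column gets zero mean and unit standard deviation
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))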

    +

    Briefly answer the following:

    +

    a) Why do we scale data?

    +

b) Why does the OLS method give practically equivalent models on scaled and unscaled data?

    +

c) Why does the Ridge method not give practically equivalent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?

    +

    d) Why do we say that the Ridge method gives a biased model?

    +

    e) Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?

    +

    f) Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?

    +

    g) Give interpretations of the following R2 scores: 0, 0.5, 1.

    +

    h) What is an advantage of the R2 score over the MSE?

    +
    +
    + + + + +
    + + + + + + + + +
    + + + + + + +
    +
    + + +
    + + +
    +
    +
    + + + + + +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek39.html b/doc/LectureNotes/_build/html/exercisesweek39.html new file mode 100644 index 000000000..e0b03141c --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek39.html @@ -0,0 +1,639 @@ + + + + + + + + + + + Exercises week 39 — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + +
    + +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + + + + + + +
    + +
    +

    Exercises week 39#

    +
    +

    Getting started with project 1#

    +

    The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.

    +

Short feedback on this exercise will be available before the project deadline, and you can reuse these elements in your final report.

    +
    +

    Learning goals#

    +

    After completing these exercises, you will know how to

    +
      +
    • Create a properly formatted report in Overleaf

    • +
    • Select and present graphs for a scientific report

    • +
    • Write an abstract and introduction for a scientific report

    • +
    +
    +
    +

    Deliverables#

    +

    Complete the following exercises while working in an Overleaf project. Then, in canvas, include

    +
      +
    • An exported PDF of the report draft you have been working on.

    • +
    • A comment linking to the github repository used in exercise 4.

    • +
    +
    +
    +
    +

    Exercise 1: Creating the report document#

    +

    We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.

    +

    a) Create an account on Overleaf.com, or log in using SSO with your UiO email.

    +

    b) Download this template project.

    +

    c) Create a new Overleaf project with the correct formatting by uploading the template project.

    +

    d) Read the general guideline for writing a report, which can be found at CompPhysics/MachineLearning.

    +

    e) Look at the provided example of an earlier project, found at CompPhysics/MachineLearning

    +
    +
    +

    Exercise 2: Adding good figures#

    +

    a) Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.

    +

    b) Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.

    +

    c) Refer to the figure in your text using \ref.

    +

    d) Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.

    +

    e) Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.
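For the heatmap in 2d), one possible structure is sketched below. It is only a sketch under assumptions: it generates data in the same style as earlier weekly exercises, scales the polynomial features, uses scikit-learn's Ridge, and plots the test MSE with matplotlib; adapt the degrees, the lambda values, the labels and the styling to your own report.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(2025)
n = 100
x = np.linspace(-3, 3, n)
y = np.exp(-x**2) + 1.5 * np.exp(-(x - 2)**2) + np.random.normal(0, 0.1, n)

degrees = list(range(1, 11))
lambdas = np.logspace(-5, 1, 7)
mse = np.zeros((len(degrees), len(lambdas)))

for i, p in enumerate(degrees):
    X = PolynomialFeatures(p, include_bias=False).fit_transform(x.reshape(-1, 1))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2025)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    for j, lam in enumerate(lambdas):
        model = Ridge(alpha=lam).fit(X_train, y_train)
        mse[i, j] = mean_squared_error(y_test, model.predict(X_test))

fig, ax = plt.subplots()
im = ax.imshow(mse, cmap="viridis", aspect="auto")
ax.set_xticks(range(len(lambdas)))
ax.set_xticklabels([f"{lam:.0e}" for lam in lambdas])
ax.set_yticks(range(len(degrees)))
ax.set_yticklabels(degrees)
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel("polynomial degree")
fig.colorbar(im, label="test MSE")
plt.show()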

    +
    +
    +

    Exercise 3: Writing an abstract and introduction#

    +

Although most of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write much of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, and it saves a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.

    +

    a) Read the guidelines on abstract and introduction before you start.

    +

    b) Write an abstract for project 1 in your report.

    +

    c) Write an introduction for project 1 in your report.

    +
    +
    +

    Exercise 4: Making the code available and presentable#

    +

A central part of the report is the code you write to implement the methods and generate the results. To get points for the code part of the project, you need to make your code available and presentable.

    +

    a) Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.

    +

    b) Add a PDF of the report to this repository, after completing exercises 1-3

    +

    c) Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.

    +

    d) Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.

    +

    e) Create a README file in the repository or project folder with

    +
      +
    • the name of the group members

    • +
    • a short description of the project

    • +
    • a description of how to install the required packages to run your code from a requirements.txt file

    • +
    • names and descriptions of the various notebooks in the Code folder and the results they produce

    • +
    +
    +
    +

    Exercise 5: Referencing#

    +

    a) Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.

    +

    b) Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn

    +

    c) Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.

    +

    d) At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.

    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek41.html b/doc/LectureNotes/_build/html/exercisesweek41.html new file mode 100644 index 000000000..9ede978a5 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek41.html @@ -0,0 +1,975 @@

    Exercises week 41#

    +

    October 6-10, 2025

    +

    Date: Deadline is Friday October 10 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!

    +

    We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.

    +

    If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.

    +

    First, here are some functions you are going to need, don’t change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.

    +
    +
    +
    import autograd.numpy as np  # We need to use this numpy wrapper to make automatic differentiation work later
    +from sklearn import datasets
    +import matplotlib.pyplot as plt
    +from sklearn.metrics import accuracy_score
    +
    +
    +# Defining some activation functions
    +def ReLU(z):
    +    return np.where(z > 0, z, 0)
    +
    +
    +def sigmoid(z):
    +    return 1 / (1 + np.exp(-z))
    +
    +
    +def softmax(z):
    +    """Compute softmax values for each set of scores in the rows of the matrix z.
    +    Used with batched input data."""
    +    e_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtract the row-wise max for numerical stability
    +    return e_z / np.sum(e_z, axis=1, keepdims=True)
    +
    +
    +def softmax_vec(z):
    +    """Compute softmax values for each set of scores in the vector z.
    +    Use this function when you use the activation function on one vector at a time"""
    +    e_z = np.exp(z - np.max(z))
    +    return e_z / np.sum(e_z)
    +
    +
    +
    +
    +
    +
    +
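
    As a quick sanity check (not part of the exercises), you can verify that the batched softmax above maps each row of a matrix to a probability distribution; the array shape below is just an example:

    probs = softmax(np.random.randn(5, 3))  # 5 samples, 3 classes
    print(probs.shape)                      # (5, 3)
    print(probs.sum(axis=1))                # each row should sum to (numerically) 1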

    Exercise 1#

    +

    In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!

    +
    +
    +
    np.random.seed(2024)
    +
    +x = np.random.randn(2)  # network input. This is a single input with two features
    +W1 = np.random.randn(4, 2)  # first layer weights
    +
    +
    +
    +
    +

    a) Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?

    +

    b) Define the bias of the first layer, b1, with the correct shape. (Run the next cell right after the previous one so that the randomly generated values line up with the test solution below.)

    +
    +
    +
    b1 = ...
    +
    +
    +
    +
    +

    c) Compute the intermediary z1 for the first layer

    +
    +
    +
    z1 = ...
    +
    +
    +
    +
    +

    d) Compute the activation a1 for the first layer using the ReLU activation function defined earlier.

    +
    +
    +
    a1 = ...
    +
    +
    +
    +
    +

    Confirm that you got the correct activation with the test below. Make sure that you define b1 with the randn function right after you define W1.

    +
    +
    +
    sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])
    +
    +print(np.allclose(a1, sol1))
    +
    +
    +
    +
    +
    +
    +

    Exercise 2#

    +

    Now we will add a layer to the network with an output of length 8 and ReLU activation.

    +

    a) What is the input of the second layer? What is its shape?

    +

    b) Define the weight and bias of the second layer with the right shapes.

    +
    +
    +
    W2 = ...
    +b2 = ...
    +
    +
    +
    +
    +

    c) Compute the intermediary z2 and activation a2 for the second layer.

    +
    +
    +
    z2 = ...
    +a2 = ...
    +
    +
    +
    +
    +

    Confirm that you got the correct activation shape with the test below.

    +
    +
    +
    print(
    +    np.allclose(np.exp(len(a2)), 2980.9579870417283)
    +)  # This should evaluate to True if a2 has the correct shape :)
    +
    +
    +
    +
    +
    +
    +

    Exercise 3#

    +

    We often want our neural networks to have many layers of varying sizes. To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.

    +

    a) Complete the function below so that it returns a list layers of weight and bias tuples (W, b) for each layer, in order, with the correct shapes that we can use later as our network parameters.

    +
    +
    +
    def create_layers(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = ...
    +        b = ...
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +
    +
    +

    b) Complete the function below so that it evaluates the intermediary z and activation a for each layer, with ReLU activation, and returns the final activation a. This is the complete feed-forward pass, a full neural network!

    +
    +
    +
    def feed_forward_all_relu(layers, input):
    +    a = input
    +    for W, b in layers:
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    c) Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.

    +
    +
    +
    input_size = ...
    +layer_output_sizes = [...]
    +
    +x = np.random.rand(input_size)
    +layers = ...
    +predict = ...
    +print(predict)
    +
    +
    +
    +
    +

    d) Why is a neural network with no activation functions mathematically equivalent to (i.e., reducible to) a neural network with only one layer?
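
    As a hint for d), consider two layers with no activation functions; the composition is itself just one affine map (a sketch of the algebra, assuming weights \(W_1, W_2\) and biases \(b_1, b_2\)):

    \[
    W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2) = \tilde{W} x + \tilde{b}.
    \]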

    +
    +
    +

    Exercise 4 - Custom activation for each layer#

    +

    So far, every layer has used the same activation, ReLU. We often want to use other types of activation functions, however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.

    +

    a) Complete the feed_forward function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.

    +
    +
    +
    def feed_forward(input, layers, activation_funcs):
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    b) You are now given a list with three activation functions, two ReLU and one sigmoid. (Don’t call them yet! You can make a list with function names as elements, and then call these elements of the list later. If you add functions other than the ones defined at the start of the notebook, make sure everything is defined using autograd’s numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)

    +

    Evaluate a network with three layers and these activation functions.

    +
    +
    +
    network_input_size = ...
    +layer_output_sizes = [...]
    +activation_funcs = [ReLU, ReLU, sigmoid]
    +layers = ...
    +
    +x = np.random.randn(network_input_size)
    +feed_forward(x, layers, activation_funcs)
    +
    +
    +
    +
    +

    c) How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?

    +
    +
    +

    Exercise 5 - Processing multiple inputs at once#

    +

    So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.

    +

    To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.

    +

    a) Complete the function create_layers_batch so that the weight matrix is the transpose of what it was when you only sent in one input at a time.

    +
    +
    +
    def create_layers_batch(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = ...
    +        b = ...
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +
    +
    +

    b) Make a matrix of inputs with the shape (number of inputs, number of features); you choose the number of inputs and features per input. Then complete the function feed_forward_batch so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. (Hint: You will only need to swap two variables around from your previous implementation, but remember to test that you get the same results for equivalent inputs!)

    +
    +
    +
    inputs = np.random.rand(1000, 4)
    +
    +
    +def feed_forward_batch(inputs, layers, activation_funcs):
    +    a = inputs
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    c) Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.

    +
    +
    +
    network_input_size = ...
    +layer_output_sizes = [...]
    +activation_funcs = [...]
    +layers = create_layers_batch(network_input_size, layer_output_sizes)
    +
    +x = np.random.randn(network_input_size)
    +feed_forward_batch(inputs, layers, activation_funcs)
    +
    +
    +
    +
    +

    You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.
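
    To make the shape bookkeeping concrete, here is a small sketch (the batch size 3, input size 4 and layer size 5 are arbitrary) contrasting the single-vector and batched conventions:

    import numpy as np

    x = np.random.rand(4)             # one input vector with 4 features
    X = np.random.rand(3, 4)          # a batch of 3 such inputs

    # Single-vector convention: W has shape (outputs, inputs)
    W_single = np.random.rand(5, 4)
    b = np.random.rand(5)
    print((W_single @ x + b).shape)   # (5,)

    # Batched convention: W is the transpose, shape (inputs, outputs)
    W_batch = W_single.T
    print((X @ W_batch + b).shape)    # (3, 5); b is broadcast over the batch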

    +
    +
    +

    Exercise 6 - Predicting on real data#

    +

    You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).

    +

    This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower: the widths and lengths of their petals and sepals. You will later train your network to actually make good predictions.

    +
    +
    +
    iris = datasets.load_iris()
    +
    +_, ax = plt.subplots()
    +scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
    +ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
    +_ = ax.legend(
    +    scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
    +)
    +
    +
    +
    +
    +
    +
    +
    inputs = iris.data
    +
    +# Since each prediction is a vector with a score for each of the three types of flowers,
    +# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.
    +targets = np.zeros((len(iris.data), 3))
    +for i, t in enumerate(iris.target):
    +    targets[i, t] = 1
    +
    +
    +def accuracy(predictions, targets):
    +    one_hot_predictions = np.zeros(predictions.shape)
    +
    +    for i, prediction in enumerate(predictions):
    +        one_hot_predictions[i, np.argmax(prediction)] = 1
    +    return accuracy_score(one_hot_predictions, targets)
    +
    +
    +
    +
    +

    a) What should the input size for the network be with this dataset? What should the output size of the last layer be?

    +

    b) Create a network with two hidden layers, the first with sigmoid activation and the last with softmax. The first layer should have 8 “nodes”, and the second should have the number of nodes you found in exercise a). Softmax returns a “probability distribution”, in the sense that the numbers in the output are positive and add up to 1, and their magnitudes reflect the relative magnitudes of the values before they went through the softmax function. Remember to use the batched versions of the create_layers and feed forward functions.

    +
    +
    +
    ...
    +layers = ...
    +
    +
    +
    +
    +

    c) Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.

    +
    +
    +
    predictions = feed_forward_batch(inputs, layers, activation_funcs)
    +
    +
    +
    +
    +

    d) Compute the accuracy of your model using the accuracy function defined above. Recreate your model a couple times and see how the accuracy changes.

    +
    +
    +
    print(accuracy(predictions, targets))
    +
    +
    +
    +
    +
    +
    +

    Exercise 7 - Training on real data (Optional)#

    +

    To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like Adam if you finish everything.
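
    Schematically, the parameter update you will implement later in train_network is, for each weight matrix \(W\) and bias \(b\) and with learning rate \(\eta\),

    \[
    W \leftarrow W - \eta \frac{\partial C}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}.
    \]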

    +

    Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which evaluates performance on classification tasks by checking whether your prediction places the most “certainty” on the correct target.

    +
    +
    +
    def cross_entropy(predict, target):
    +    return np.sum(-target * np.log(predict))
    +
    +
    +def cost(input, layers, activation_funcs, target):
    +    predict = feed_forward_batch(input, layers, activation_funcs)
    +    return cross_entropy(predict, target)
    +
    +
    +
    +
    +

    To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these

    +
    +\[ +\frac{\partial C}{\partial W}, \frac{\partial C}{\partial b} +\]
    +

    Now we need to compute these gradients. This is pretty hard to do by hand for a neural network, and we will spend most of next week on it, but we can also use autograd to just do it for us, which is what we typically do in practice. With the code cell below, we create a function which computes all of these gradients for us.

    +
    +
    +
    from autograd import grad
    +
    +
    +gradient_func = grad(
    +    cost, 1
    +)  # Taking the gradient wrt. the second input to the cost function, i.e. the layers
    +
    +
    +
    +
    +

    a) What shape should the gradient of the cost function wrt. weights and biases be?

    +

    b) Use the gradient_func function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what’s inside. What does the grad func from autograd actually do?

    +
    +
    +
    layers_grad = gradient_func(
    +    inputs, layers, activation_funcs, targets
    +)  # Don't change this
    +
    +
    +
    +
    +

    c) Finish the train_network function.

    +
    +
    +
    def train_network(
    +    inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100
    +):
    +    for i in range(epochs):
    +        layers_grad = gradient_func(inputs, layers, activation_funcs, targets)
    +        for (W, b), (W_g, b_g) in zip(layers, layers_grad):
    +            W -= ...
    +            b -= ...
    +
    +
    +
    +
    +

    d) What do we call the gradient method used above?

    +

    e) Train your network and see how the accuracy changes! Make a plot if you want.

    +
    +
    +
    ...
    +
    +
    +
    +
    +

    f) How high an accuracy is it possible to achieve with a neural network on this dataset, if we use the whole thing as training data?

    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek42.html b/doc/LectureNotes/_build/html/exercisesweek42.html new file mode 100644 index 000000000..b9a532931 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek42.html @@ -0,0 +1,1031 @@

    Exercises week 42#

    +

    October 13-17, 2025

    +

    Date: Deadline is Friday October 17 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    The aim of the exercises this week is to train the neural network you implemented last week.

    +

    To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.

    +

    You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. Keeping track of the shapes and doing things step by step will be very important this week.

    +

    We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.

    +

    First, some setup code that you will need.

    +
    +
    +
    import autograd.numpy as np  # We need to use this numpy wrapper to make automatic differentiation work later
    +from autograd import grad, elementwise_grad
    +from sklearn import datasets
    +import matplotlib.pyplot as plt
    +from sklearn.metrics import accuracy_score
    +
    +
    +# Defining some activation functions
    +def ReLU(z):
    +    return np.where(z > 0, z, 0)
    +
    +
    +# Derivative of the ReLU function
    +def ReLU_der(z):
    +    return np.where(z > 0, 1, 0)
    +
    +
    +def sigmoid(z):
    +    return 1 / (1 + np.exp(-z))
    +
    +
    +def mse(predict, target):
    +    return np.mean((predict - target) ** 2)
    +
    +
    +
    +
    +
    +
    +

    Exercise 1 - Understand the feed forward pass#

    +

    a) Complete last week’s exercises if you haven’t already (recommended).

    +
    +
    +

    Exercise 2 - Gradient with one layer using autograd#

    +

    For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.

    +

    In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!

    +

    a) If the weights and bias of a layer have shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?

    +

    b) Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weight and bias with the correct shapes.

    +
    +
    +
    def feed_forward_one_layer(W, b, x):
    +    z = ...
    +    a = ...
    +    return a
    +
    +
    +def cost_one_layer(W, b, x, target):
    +    predict = feed_forward_one_layer(W, b, x)
    +    return mse(predict, target)
    +
    +
    +x = np.random.rand(2)
    +target = np.random.rand(3)
    +
    +W = ...
    +b = ...
    +
    +
    +
    +
    +

    c) Compute the gradient of the cost function wrt. the weight and bias by running the cell below. You will not need to change anything; just make sure it runs by defining things correctly in the cell above. This code uses the autograd package, which uses backpropagation to compute the gradient!

    +
    +
    +
    autograd_one_layer = grad(cost_one_layer, [0, 1])
    +W_g, b_g = autograd_one_layer(W, b, x, target)
    +print(W_g, b_g)
    +
    +
    +
    +
    +
    +
    +

    Exercise 3 - Gradient with one layer writing backpropagation by hand#

    +

    Before you use the gradient you found using autograd, you will have to find the gradient “manually”, to better understand how the backpropagation computation works. To do backpropagation “manually”, you will need to write out expressions for many derivatives along the computation.

    +

    We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.

    +
    +\[ +\frac{dC}{dW} = \frac{dC}{da}\frac{da}{dz}\frac{dz}{dW} +\]
    +
    +\[ +\frac{dC}{db} = \frac{dC}{da}\frac{da}{dz}\frac{dz}{db} +\]
    +

    a) Which intermediary results can be reused between the two expressions?

    +

    b) What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.

    +
    +
    +
    z = W @ x + b
    +a = sigmoid(z)
    +
    +predict = a
    +
    +
    +def mse_der(predict, target):
    +    return ...
    +
    +
    +print(mse_der(predict, target))
    +
    +cost_autograd = grad(mse, 0)
    +print(cost_autograd(predict, target))
    +
    +
    +
    +
    +

    c) What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.

    +
    +
    +
    def sigmoid_der(z):
    +    return ...
    +
    +
    +print(sigmoid_der(z))
    +
    +sigmoid_autograd = elementwise_grad(sigmoid, 0)
    +print(sigmoid_autograd(z))
    +
    +
    +
    +
    +

    d) Using the two derivatives you just computed, compute this intermediary gradient you will use later:

    +
    +\[ +\frac{dC}{dz} = \frac{dC}{da}\frac{da}{dz} +\]
    +
    +
    +
    dC_da = ...
    +dC_dz = ...
    +
    +
    +
    +
    +

    e) What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky; it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.

    +

    f) Now combine the expressions you have worked with so far to compute the gradients! Note that you always need to do a feed forward pass while saving the z’s and a’s before you do backpropagation, as they are used in the derivative expressions.

    +
    +
    +
    dC_da = ...
    +dC_dz = ...
    +dC_dW = ...
    +dC_db = ...
    +
    +print(dC_dW, dC_db)
    +
    +
    +
    +
    +

    You should get the same results as with autograd.

    +
    +
    +
    W_g, b_g = autograd_one_layer(W, b, x, target)
    +print(W_g, b_g)
    +
    +
    +
    +
    +
    +
    +

    Exercise 4 - Gradient with two layers writing backpropagation by hand#

    +

    Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let’s move up to two layers.

    +
    +
    +
    x = np.random.rand(2)
    +target = np.random.rand(4)
    +
    +W1 = np.random.rand(3, 2)
    +b1 = np.random.rand(3)
    +
    +W2 = np.random.rand(4, 3)
    +b2 = np.random.rand(4)
    +
    +layers = [(W1, b1), (W2, b2)]
    +
    +
    +
    +
    +
    +
    +
    z1 = W1 @ x + b1
    +a1 = sigmoid(z1)
    +z2 = W2 @ a1 + b2
    +a2 = sigmoid(z2)
    +
    +
    +
    +
    +

    We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.

    +

    a) Compute the gradients of the last layer, just like you did the single layer in the previous exercise.

    +
    +
    +
    dC_da2 = ...
    +dC_dz2 = ...
    +dC_dW2 = ...
    +dC_db2 = ...
    +
    +
    +
    +
    +

    To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.

    +
    +\[ +\frac{dC}{da_1} = \frac{dC}{dz_2}\frac{dz_2}{da_1} +\]
    +

    b) What is the derivative of the second layer intermediate wrt. the first layer activation? (First recall how you compute \(z_2\).)

    +
    +\[ +\frac{dz_2}{da_1} +\]
    +

    c) Use this expression, together with expressions equivalent to the ones for the last layer, to compute all the derivatives of the first layer.

    +
    +\[ +\frac{dC}{dW_1} = \frac{dC}{da_1}\frac{da_1}{dz_1}\frac{dz_1}{dW_1} +\]
    +
    +\[ +\frac{dC}{db_1} = \frac{dC}{da_1}\frac{da_1}{dz_1}\frac{dz_1}{db_1} +\]
    +
    +
    +
    dC_da1 = ...
    +dC_dz1 = ...
    +dC_dW1 = ...
    +dC_db1 = ...
    +
    +
    +
    +
    +
    +
    +
    print(dC_dW1, dC_db1)
    +print(dC_dW2, dC_db2)
    +
    +
    +
    +
    +

    d) Make sure you got the same gradient as the following code which uses autograd to do backpropagation.

    +
    +
    +
    def feed_forward_two_layers(layers, x):
    +    W1, b1 = layers[0]
    +    z1 = W1 @ x + b1
    +    a1 = sigmoid(z1)
    +
    +    W2, b2 = layers[1]
    +    z2 = W2 @ a1 + b2
    +    a2 = sigmoid(z2)
    +
    +    return a2
    +
    +
    +
    +
    +
    +
    +
    def cost_two_layers(layers, x, target):
    +    predict = feed_forward_two_layers(layers, x)
    +    return mse(predict, target)
    +
    +
    +grad_two_layers = grad(cost_two_layers, 0)
    +grad_two_layers(layers, x, target)
    +
    +
    +
    +
    +

    e) How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?

    +
    +
    +

    Exercise 5 - Gradient with any number of layers writing backpropagation by hand#

    +

    Well done on getting this far! Now it’s time to compute the gradient with any number of layers.

    +

    First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.

    +
    +
    +
    def create_layers(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = np.random.randn(layer_output_size, i_size)
    +        b = np.random.randn(layer_output_size)
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +def feed_forward(input, layers, activation_funcs):
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = W @ a + b
    +        a = activation_func(z)
    +    return a
    +
    +
    +def cost(layers, input, activation_funcs, target):
    +    predict = feed_forward(input, layers, activation_funcs)
    +    return mse(predict, target)
    +
    +
    +
    +
    +

    You might already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.

    +

    Here is a function which does that for you.

    +
    +
    +
    def feed_forward_saver(input, layers, activation_funcs):
    +    layer_inputs = []
    +    zs = []
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        layer_inputs.append(a)
    +        z = W @ a + b
    +        a = activation_func(z)
    +
    +        zs.append(z)
    +
    +    return layer_inputs, zs, a
    +
    +
    +
    +
    +

    a) Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weights and biases. Use the autograd calculation below to make sure you get the correct answer.

    +
    +
    +
    def backpropagation(
    +    input, layers, activation_funcs, target, activation_ders, cost_der=mse_der
    +):
    +    layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)
    +
    +    layer_grads = [() for layer in layers]
    +
    +    # We loop over the layers, from the last to the first
    +    for i in reversed(range(len(layers))):
    +        layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]
    +
    +        if i == len(layers) - 1:
    +            # For last layer we use cost derivative as dC_da(L) can be computed directly
    +            dC_da = ...
    +        else:
    +            # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)
    +            (W, b) = layers[i + 1]
    +            dC_da = ...
    +
    +        dC_dz = ...
    +        dC_dW = ...
    +        dC_db = ...
    +
    +        layer_grads[i] = (dC_dW, dC_db)
    +
    +    return layer_grads
    +
    +
    +
    +
    +
    +
    +
    network_input_size = 2
    +layer_output_sizes = [3, 4]
    +activation_funcs = [sigmoid, ReLU]
    +activation_ders = [sigmoid_der, ReLU_der]
    +
    +layers = create_layers(network_input_size, layer_output_sizes)
    +
    +x = np.random.rand(network_input_size)
    +target = np.random.rand(4)
    +
    +
    +
    +
    +
    +
    +
    layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)
    +print(layer_grads)
    +
    +
    +
    +
    +
    +
    +
    cost_grad = grad(cost, 0)
    +cost_grad(layers, x, [sigmoid, ReLU], target)
    +
    +
    +
    +
    +
    +
    +

    Exercise 6 - Batched inputs#

    +

    Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last week’s exercise 5 for details on how to batch inputs to neural networks. You will also need to update the backpropagation function.

    +
    +
    +

    Exercise 7 - Training#

    +

    a) Complete exercises 6 and 7 from last week, but use your own backpropagation implementation to compute the gradient.

    +
      +
    • IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!

    • +
    • Instead, use the fact that the derivatives multiplied together simplify to prediction - target (see source1, source2, and the short sketch after this list)

    • +
    +
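
    A short sketch of the simplification mentioned in the list above: for a softmax output layer with one-hot targets and the cross-entropy cost, the combined derivative for the last layer is simply

    \[
    \frac{dC}{dz_L} = a_L - y,
    \]

    where \(a_L\) is the softmax output (the prediction) and \(y\) is the one-hot target; from there you proceed exactly as for the other layers.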

    b) Use stochastic gradient descent with momentum when you train your network.
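
    A minimal sketch of a momentum update for a single parameter array; the hyperparameter values and the stand-in gradient W_g are only placeholders for the gradients your backpropagation produces:

    import numpy as np

    W = np.random.randn(4, 3)          # some parameter of the network
    velocity = np.zeros_like(W)        # keep one velocity array per parameter
    learning_rate = 0.01
    momentum = 0.9

    # inside the training loop, after computing the mini-batch gradient W_g:
    W_g = np.random.randn(4, 3)        # stand-in for a real gradient
    velocity = momentum * velocity - learning_rate * W_g
    W = W + velocity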

    +
    +
    +

    Exercise 8 (Optional) - Object orientation#

    +

    Passing the layers, activation functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isolation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.

    +

    a) Write a neural network class. You are free to implement it how you see fit, though we strongly recommend not saving any input or output values as class attributes, nor letting the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate from the neural network.

    +

    We provide here a skeleton structure which should get you started.

    +
    +
    +
    class NeuralNetwork:
    +    def __init__(
    +        self,
    +        network_input_size,
    +        layer_output_sizes,
    +        activation_funcs,
    +        activation_ders,
    +        cost_fun,
    +        cost_der,
    +    ):
    +        pass
    +
    +    def predict(self, inputs):
    +        # Simple feed forward pass
    +        pass
    +
    +    def cost(self, inputs, targets):
    +        pass
    +
    +    def _feed_forward_saver(self, inputs):
    +        pass
    +
    +    def compute_gradient(self, inputs, targets):
    +        pass
    +
    +    def update_weights(self, layer_grads):
    +        pass
    +
    +    # These last two methods are not needed in the project, but they can be nice to have! The first one has a layers parameter so that you can use autograd on it
    +    def autograd_compliant_predict(self, layers, inputs):
    +        pass
    +
    +    def autograd_gradient(self, inputs, targets):
    +        pass
    +
    +
    +
    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek43.html b/doc/LectureNotes/_build/html/exercisesweek43.html new file mode 100644 index 000000000..037da43fb --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek43.html @@ -0,0 +1,850 @@

    Exercises week 43#

    +

    October 20-24, 2025

    +

    Date: Deadline Friday October 24 at midnight

    +
    +
    +

    Overarching aims of the exercises for week 43#

    +

    The aim of the exercises this week is to gain some confidence with ways to visualize the results of a classification problem. We will target three ways of setting up the analysis. The first and simplest one is the

    1. so-called confusion matrix. The next one is the so-called

    2. ROC curve. Finally we have the

    3. Cumulative gain curve.

    We will use Logistic Regression as the method for classification in this exercise. You can compare these results with those obtained with your neural network code from project 2 without a hidden layer.

    +

    In these exercises we will use binary and multi-class data sets +(the Iris data set from week 41).

    +

    The underlying mathematics is described here.

    +
    +

    Confusion Matrix#

    +

    A confusion matrix summarizes a classifier’s performance by +tabulating predictions versus true labels. For binary classification, +it is a \(2\times2\) table whose entries are counts of outcomes:

    +
    +\[\begin{split} +\begin{array}{l|cc} & \text{Predicted Positive} & \text{Predicted Negative} \\ \hline \text{Actual Positive} & TP & FN \\ \text{Actual Negative} & FP & TN \end{array}. +\end{split}\]
    +

    Here TP (true positives) is the number of cases correctly predicted as +positive, FP (false positives) is the number incorrectly predicted as +positive, TN (true negatives) is correctly predicted negative, and FN +(false negatives) is incorrectly predicted negative . In other words, +“positive” means class 1 and “negative” means class 0; for example, TP +occurs when the prediction and actual are both positive. Formally:

    +
    +\[ +\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \quad \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}, +\]
    +

    where TPR and FPR are the true and false positive rates defined below.

    +

    In multiclass classification with \(K\) classes, the confusion matrix +generalizes to a \(K\times K\) table. Entry \(N_{ij}\) in the table is +the count of instances whose true class is \(i\) and whose predicted +class is \(j\). For example, a three-class confusion matrix can be written +as:

    +
    +\[\begin{split} +\begin{array}{c|ccc} & \text{Pred Class 1} & \text{Pred Class 2} & \text{Pred Class 3} \\ \hline \text{Act Class 1} & N_{11} & N_{12} & N_{13} \\ \text{Act Class 2} & N_{21} & N_{22} & N_{23} \\ \text{Act Class 3} & N_{31} & N_{32} & N_{33} \end{array}. +\end{split}\]
    +

    Here the diagonal entries \(N_{ii}\) are the true positives for each +class, and off-diagonal entries are misclassifications. This matrix +allows computation of per-class metrics: e.g. for class \(i\), +\(\mathrm{TP}_i=N_{ii}\), \(\mathrm{FN}_i=\sum_{j\neq i}N_{ij}\), +\(\mathrm{FP}_i=\sum_{j\neq i}N_{ji}\), and \(\mathrm{TN}_i\) is the sum of +all remaining entries.

    +

    As defined above, TPR and FPR come from the binary case. In binary +terms with \(P\) actual positives and \(N\) actual negatives, one has

    +
    +\[ +\text{TPR} = \frac{TP}{P} = \frac{TP}{TP+FN}, \quad \text{FPR} = +\frac{FP}{N} = \frac{FP}{FP+TN}, +\]
    +

    as used in standard confusion-matrix +formulations. These rates will be used in constructing ROC curves.
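
    If you want to inspect these counts directly, scikit-learn provides a ready-made function; a small example with made-up labels (rows are actual classes, columns are predicted classes):

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    print(confusion_matrix(y_true, y_pred))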

    +
    +
    +

    ROC Curve#

    +

    The Receiver Operating Characteristic (ROC) curve plots the trade-off +between true positives and false positives as a discrimination +threshold varies. Specifically, for a binary classifier that outputs +a score or probability, one varies the threshold \(t\) for declaring +positive, and computes at each \(t\) the true positive rate +\(\mathrm{TPR}(t)\) and false positive rate \(\mathrm{FPR}(t)\) using the +confusion matrix at that threshold. The ROC curve is then the graph +of TPR versus FPR. By definition,

    +
    +\[ +\mathrm{TPR} = \frac{TP}{TP+FN}, \qquad \mathrm{FPR} = \frac{FP}{FP+TN}, +\]
    +

    where \(TP,FP,TN,FN\) are counts determined by threshold \(t\). A perfect +classifier would reach the point (FPR=0, TPR=1) at some threshold.

    +

    Formally, the ROC curve is obtained by plotting +\((\mathrm{FPR}(t),\mathrm{TPR}(t))\) for all \(t\in[0,1]\) (or as \(t\) +sweeps through the sorted scores). The Area Under the ROC Curve (AUC) +quantifies the average performance over all thresholds. It can be +interpreted probabilistically: \(\mathrm{AUC} = +\Pr\bigl(s(X^+)>s(X^-)\bigr)\), the probability that a random positive +instance \(X^+\) receives a higher score \(s\) than a random negative +instance \(X^-\) . Equivalently, the AUC is the integral under the ROC +curve:

    +
    +\[ +\mathrm{AUC} \;=\; \int_{0}^{1} \mathrm{TPR}(f)\,df, +\]
    +

    where \(f\) ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0.
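
    For a binary classifier that outputs probabilities, the curve and its area can be obtained directly from scikit-learn; a sketch, assuming a fitted model clf with a predict_proba method and test data X_test, y_test:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    y_score = clf.predict_proba(X_test)[:, 1]   # scores for the positive class

    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    print("AUC =", roc_auc_score(y_test, y_score))

    plt.plot(fpr, tpr, label="model")
    plt.plot([0, 1], [0, 1], "--", label="random guess")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()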

    +
    +
    +

    Cumulative Gain#

    +

    The cumulative gain curve (or gains chart) evaluates how many +positives are captured as one targets an increasing fraction of the +population, sorted by model confidence. To construct it, sort all +instances by decreasing predicted probability of the positive class. +Then, for the top \(\alpha\) fraction of instances, compute the fraction +of all actual positives that fall in this subset. In formula form, if +\(P\) is the total number of positive instances and \(P(\alpha)\) is the +number of positives among the top \(\alpha\) of the data, the cumulative +gain at level \(\alpha\) is

    +
    +\[ +\mathrm{Gain}(\alpha) \;=\; \frac{P(\alpha)}{P}. +\]
    +

    For example, cutting off at the top 10% of predictions yields a gain +equal to (positives in top 10%) divided by (total positives) . +Plotting \(\mathrm{Gain}(\alpha)\) versus \(\alpha\) (often in percent) +gives the gain curve. The baseline (random) curve is the diagonal +\(\mathrm{Gain}(\alpha)=\alpha\), while an ideal model has a steep climb +toward 1.

    +

    A related measure is the lift, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently,

    +
    +\[ +\mathrm{Lift}(\alpha) \;=\; \frac{\mathrm{Gain}(\alpha)}{\alpha}. +\]
    +

    A lift \(>1\) indicates better-than-random targeting. In practice, gain and lift charts (used e.g. in marketing or imbalanced classification) show how many positives can be “gained” by focusing on a fraction of the population.
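
    The gain curve itself is straightforward to compute by hand; a sketch with numpy, assuming binary labels y_test and positive-class scores y_score as in the ROC example above:

    import numpy as np
    import matplotlib.pyplot as plt

    order = np.argsort(y_score)[::-1]             # sort by decreasing predicted probability
    sorted_targets = np.asarray(y_test)[order]

    gain = np.cumsum(sorted_targets) / np.sum(sorted_targets)   # fraction of positives captured
    alpha = np.arange(1, len(gain) + 1) / len(gain)             # fraction of population targeted

    plt.plot(alpha, gain, label="model")
    plt.plot([0, 1], [0, 1], "--", label="baseline")
    plt.xlabel("Fraction of population targeted")
    plt.ylabel("Fraction of positives captured (gain)")
    plt.legend()
    plt.show()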

    +
    +
    +

    Other measures: Precision, Recall, and the F\(_1\) Measure#

    +

    Precision and recall (sensitivity) quantify binary classification +accuracy in terms of positive predictions. They are defined from the +confusion matrix as:

    +
    +\[ +\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}. +\]
    +

    Precision is the fraction of predicted positives that are correct, and +recall is the fraction of actual positives that are correctly +identified . A high-precision classifier makes few false-positive +errors, while a high-recall classifier makes few false-negative +errors.

    +

    The F\(_1\) score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:

    +
    +\[ +F_1 =2\frac{\text{Precision}\times\text{Recall}}{\text{Precision} + \text{Recall}}. +\]
    +

    This can be shown to equal

    +
    +\[ +\frac{2\,TP}{2\,TP + FP + FN}. +\]
    +

    The F\(_1\) score ranges from 0 (worst) to 1 (best), and balances the +trade-off between precision and recall.

    +

    For multi-class classification, one computes per-class +precision/recall/F\(_1\) (treating each class as “positive” in a +one-vs-rest manner) and then averages. Common averaging methods are:

    +

    • Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F\(_1\) from these totals.

    • Macro-averaging: Compute the F\(_1\) score \(F_{1,i}\) for each class \(i\) separately, then take the unweighted mean: \(F_{1,\mathrm{macro}} = \frac{1}{K}\sum_{i=1}^K F_{1,i}\). This treats all classes equally regardless of size.

    • Weighted-averaging: Like the macro-average, but weight each class’s \(F_{1,i}\) by its support \(n_i\) (true count): \(F_{1,\mathrm{weighted}} = \frac{1}{N}\sum_{i=1}^K n_i F_{1,i}\), where \(N=\sum_i n_i\). This accounts for class imbalance by giving more weight to larger classes.

    +

    Each of these averages has different use-cases. Micro-average is +dominated by common classes, macro-average highlights performance on +rare classes, and weighted-average is a compromise. These formulas +and concepts allow rigorous evaluation of classifier performance in +both binary and multi-class settings.
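
    In practice all of these can be computed with scikit-learn; a short example with made-up multiclass labels illustrating the different averaging modes:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [0, 1, 2, 2, 1, 0, 2, 1]
    y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

    print("precision (macro):", precision_score(y_true, y_pred, average="macro"))
    print("recall (macro):   ", recall_score(y_true, y_pred, average="macro"))
    print("F1 (micro):       ", f1_score(y_true, y_pred, average="micro"))
    print("F1 (macro):       ", f1_score(y_true, y_pred, average="macro"))
    print("F1 (weighted):    ", f1_score(y_true, y_pred, average="weighted"))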

    +
    +
    +

    Exercises#

    +

    Here is a simple code example which uses the Logistic Regression machinery from scikit-learn. At the end it sets up the confusion matrix and the ROC and cumulative gain curves. Feel free to use these functionalities (we don’t expect you to write your own code for, say, the confusion matrix).

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +# from sklearn.datasets import fill in the data set
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data, fill inn
    +mydata.data = ?
    +
    +X_train, X_test, y_train, y_test = train_test_split(mydata.data, mydata.target, random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +# define which type of problem, binary or multiclass
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +
    +from sklearn.preprocessing import LabelEncoder
    +from sklearn.model_selection import cross_validate
    +#Cross validation
    +accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']
    +print(accuracy)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +import scikitplot as skplt
    +y_pred = logreg.predict(X_test)
    +skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
    +plt.show()
    +y_probas = logreg.predict_proba(X_test)
    +skplt.metrics.plot_roc(y_test, y_probas)
    +plt.show()
    +skplt.metrics.plot_cumulative_gain(y_test, y_probas)
    +plt.show()
    +
    +
    +
    +
    +
    +

    Exercise a)#

    +

    Convince yourself about the mathematics behind the confusion matrix, the ROC and the cumulative gain curves for both a binary and a multiclass classification problem.

    +
    +
    +

    Exercise b)#

    +

    Use a binary classification data set available from scikit-learn. As an example you can use the MNIST data set and just specialize to two numbers. To do so you can use the following code lines

    +
    +
    +
    from sklearn.datasets import load_digits
    +digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1
    +X, y = digits.data, digits.target
    +
    +
    +
    +
    +

    Alternatively, you can use the make_classification functionality. This function generates a random \(n\)-class classification dataset, which can be configured for binary classification by setting n_classes=2. You can also control the number of samples, features, informative features, redundant features, and more.

    +
    +
    +
    from sklearn.datasets import make_classification
    +X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)
    +
    +
    +
    +
    +

    You can use this option for the multiclass case as well; see the next exercise. If you prefer to study other binary classification datasets, feel free to replace the above suggestions with your own dataset.

    +

    Make plots of the confusion matrix, the ROC curve and the cumulative gain curve.

    +
    +
    +

    Exercise c) week 43#

    +

    As a multiclass problem, we will use the Iris data set discussed in +the exercises from weeks 41 and 42. This is a three-class data set and +you can set it up using scikit-learn,

    +
    +
    +
    from sklearn.datasets import load_iris
    +iris = load_iris()
    +X = iris.data  # Features
    +y = iris.target # Target labels
    +
    +
    +
    +
    +

    Make plots of the confusion matrix, the ROC curve and the cumulative +gain curve for this (or other) multiclass data set.

    +
    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek44.html b/doc/LectureNotes/_build/html/exercisesweek44.html new file mode 100644 index 000000000..97dee6175 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek44.html @@ -0,0 +1,639 @@

    Exercises week 44#

    +

    October 27-31, 2025

    +

    Date: Deadline is Friday October 31 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    The exercise set this week has two parts.

    +
      +
    1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and short feedback on this exercise will be available before the project deadline. You can also reuse these elements in your final report.

    2. +
    3. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. You don’t need to answer all the questions, but you should be able to answer them by the end of working on project 2.

    4. +
    +
    +

    Deliverables#

    +

    First, join a group in canvas with your group partners. Pick an available group for Project 2 on the “People” page. If you don’t have a group, you should really consider joining one!

    +

    Complete exercise 1 while working in an Overleaf project. Then, in canvas, include

    +
      +
    • An exported PDF of the report draft you have been working on.

    • +
    • A comment linking to the github repository used in exercise 1d)

    • +
    +
    +
    +

    Exercise 1:#

    +

    Following the same directions as in the weekly exercises for week 39:

    +

    a) Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.

    +

    b) Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.
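
    One possible way to produce such a heatmap is sketched below, using scikit-learn's MLPClassifier and LogisticRegression only as stand-ins; in the project you would use your own neural network implementation, and the dataset, scaling and training settings here are placeholders.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2025)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    n_layers = [0, 1, 2, 3]
    n_nodes = [5, 10, 25, 50]
    accuracy = np.zeros((len(n_layers), len(n_nodes)))

    for i, layers in enumerate(n_layers):
        for j, nodes in enumerate(n_nodes):
            if layers == 0:
                # no hidden layers: plain logistic regression as the stand-in
                clf = LogisticRegression(max_iter=2000)
            else:
                clf = MLPClassifier(hidden_layer_sizes=(nodes,) * layers, max_iter=2000, random_state=2025)
            clf.fit(X_train, y_train)
            accuracy[i, j] = clf.score(X_test, y_test)

    fig, ax = plt.subplots()
    im = ax.imshow(accuracy, origin="lower")
    ax.set_xticks(range(len(n_nodes)))
    ax.set_xticklabels(n_nodes)
    ax.set_yticks(range(len(n_layers)))
    ax.set_yticklabels(n_layers)
    ax.set_xlabel("nodes per hidden layer")
    ax.set_ylabel("number of hidden layers")
    fig.colorbar(im, ax=ax, label="Test accuracy")
    ax.set_title("Neural network test accuracy")
    plt.show()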

    +

    c) Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)

    +

    d) Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.

    +

    e) If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.

    +
    +
    +

    Exercise 2:#

    +

    a) Linear and logistic regression methods

    +
      +
    1. What is the main difference between ordinary least squares and Ridge regression?

    2. +
    3. Which kind of data set would you use logistic regression for?

    4. +
    5. In linear regression you assume that your output is described by a continuous non-stochastic function \(f(x)\). Which is the equivalent function in logistic regression?

    6. +
    7. Can you find an analytic solution to a logistic regression type of problem?

    8. +
    9. What kind of cost function would you use in logistic regression?

    10. +
    +

    b) Deep learning

    +
      +
    1. What is an activation function, and why do we use one? Explain three different types of activation functions.

    2. +
    3. Describe the architecture of a typical feed forward Neural Network (NN).

    4. +
    5. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?

    6. +
    7. How would you know if your model is suffering from the problem of exploding gradients?

    8. +
    9. Can you name and explain a few hyperparameters used for training a neural network?

    10. +
    11. Describe the architecture of a typical Convolutional Neural Network (CNN)

    12. +
    13. What is the vanishing gradient problem in Neural Networks and how to fix it?

    14. +
    15. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn’t decrease in a few epochs?

    16. +
    17. How does L1/L2 regularization affect a neural network?

    18. +
    19. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?

    20. +
    +

    c) Optimization part

    +
      +
    1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches (stochastic and non-stochastic)?

    2. +
    3. And why don’t we use it? Or stated differently, why do we introduce the learning rate as a parameter?

    4. +
    5. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?

    6. +
    7. Why should we use stochastic gradient descent instead of plain gradient descent?

    8. +
    9. Which parameters would you need to tune when using a stochastic gradient descent approach?

    10. +
    +

    d) Analysis of results

    +
      +
    1. How do you assess overfitting and underfitting?

    2. +
    3. Why do we divide the data into train and test and/or possibly validation sets?

    4. +
    5. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.

    6. +
    7. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?

    8. +
    +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/figslides/RNN1.png b/doc/LectureNotes/_build/html/figslides/RNN1.png new file mode 100644 index 000000000..6174bee40 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN1.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN10.png b/doc/LectureNotes/_build/html/figslides/RNN10.png new file mode 100644 index 000000000..259fc5c22 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN10.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN11.png b/doc/LectureNotes/_build/html/figslides/RNN11.png new file mode 100644 index 000000000..04423c850 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN11.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN12.png b/doc/LectureNotes/_build/html/figslides/RNN12.png new file mode 100644 index 000000000..f0c1fc40b Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN12.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN13.png b/doc/LectureNotes/_build/html/figslides/RNN13.png new file mode 100644 index 000000000..f0f83c0d1 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN13.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN14.png b/doc/LectureNotes/_build/html/figslides/RNN14.png new file mode 100644 index 000000000..cead8a2fa Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN14.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN15.png b/doc/LectureNotes/_build/html/figslides/RNN15.png new file mode 100644 index 000000000..2d894680e Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN15.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN16.png b/doc/LectureNotes/_build/html/figslides/RNN16.png new file mode 100644 index 000000000..10bc64a05 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN16.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN17.png b/doc/LectureNotes/_build/html/figslides/RNN17.png new file mode 100644 index 000000000..095e0df92 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN17.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN18.png b/doc/LectureNotes/_build/html/figslides/RNN18.png new file mode 100644 index 000000000..aa5cfee07 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN18.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN19.png b/doc/LectureNotes/_build/html/figslides/RNN19.png new file mode 100644 index 000000000..37ac76e53 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN19.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN2.png b/doc/LectureNotes/_build/html/figslides/RNN2.png new file mode 100644 index 000000000..39bc7147c Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN2.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN20.png b/doc/LectureNotes/_build/html/figslides/RNN20.png new file mode 100644 index 000000000..12635c4c8 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN20.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN21.png b/doc/LectureNotes/_build/html/figslides/RNN21.png new file mode 100644 index 000000000..3e55cd33b Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN21.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN22.png 
b/doc/LectureNotes/_build/html/figslides/RNN22.png new file mode 100644 index 000000000..fa4611af1 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN22.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN3.png b/doc/LectureNotes/_build/html/figslides/RNN3.png new file mode 100644 index 000000000..07ca1d7d4 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN3.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN4.png b/doc/LectureNotes/_build/html/figslides/RNN4.png new file mode 100644 index 000000000..5b204a801 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN4.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN5.png b/doc/LectureNotes/_build/html/figslides/RNN5.png new file mode 100644 index 000000000..bc4d8e6ca Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN5.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN6.png b/doc/LectureNotes/_build/html/figslides/RNN6.png new file mode 100644 index 000000000..11faa4239 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN6.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN7.png b/doc/LectureNotes/_build/html/figslides/RNN7.png new file mode 100644 index 000000000..6f9489814 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN7.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN8.png b/doc/LectureNotes/_build/html/figslides/RNN8.png new file mode 100644 index 000000000..9ea7d412c Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN8.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN9.png b/doc/LectureNotes/_build/html/figslides/RNN9.png new file mode 100644 index 000000000..bd537ad0a Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN9.png differ diff --git a/doc/LectureNotes/_build/html/figslides/cnn.jpeg b/doc/LectureNotes/_build/html/figslides/cnn.jpeg new file mode 100644 index 000000000..67bf3ced7 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/cnn.jpeg differ diff --git a/doc/LectureNotes/_build/html/figslides/deepcnn.png b/doc/LectureNotes/_build/html/figslides/deepcnn.png new file mode 100644 index 000000000..a6c023d72 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/deepcnn.png differ diff --git a/doc/LectureNotes/_build/html/figslides/discreteconv.png b/doc/LectureNotes/_build/html/figslides/discreteconv.png new file mode 100644 index 000000000..3d40abfcb Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/discreteconv.png differ diff --git a/doc/LectureNotes/_build/html/figslides/discreteconv1.png b/doc/LectureNotes/_build/html/figslides/discreteconv1.png new file mode 100644 index 000000000..4d57c1e99 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/discreteconv1.png differ diff --git a/doc/LectureNotes/_build/html/figslides/lstm.pdf b/doc/LectureNotes/_build/html/figslides/lstm.pdf new file mode 100644 index 000000000..31ca2c643 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/lstm.pdf differ diff --git a/doc/LectureNotes/_build/html/figslides/lstm.png b/doc/LectureNotes/_build/html/figslides/lstm.png new file mode 100644 index 000000000..fc543251a Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/lstm.png differ diff --git a/doc/LectureNotes/_build/html/figslides/maxpooling.png b/doc/LectureNotes/_build/html/figslides/maxpooling.png new file mode 100644 index 
000000000..752651f85 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/maxpooling.png differ diff --git a/doc/LectureNotes/_build/html/figslides/nn.jpeg b/doc/LectureNotes/_build/html/figslides/nn.jpeg new file mode 100644 index 000000000..0a495cfe4 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/nn.jpeg differ diff --git a/doc/LectureNotes/_build/html/figslides/photo.jpg b/doc/LectureNotes/_build/html/figslides/photo.jpg new file mode 100755 index 000000000..426220598 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/photo.jpg differ diff --git a/doc/LectureNotes/_build/html/figslides/photo1.jpg b/doc/LectureNotes/_build/html/figslides/photo1.jpg new file mode 100755 index 000000000..2989b6347 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/photo1.jpg differ diff --git a/doc/LectureNotes/_build/html/figures/adagrad.png b/doc/LectureNotes/_build/html/figures/adagrad.png new file mode 100644 index 000000000..97a9cf908 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/adagrad.png differ diff --git a/doc/LectureNotes/_build/html/figures/adam.png b/doc/LectureNotes/_build/html/figures/adam.png new file mode 100644 index 000000000..a3a39f025 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/adam.png differ diff --git a/doc/LectureNotes/_build/html/figures/generativelearning.png b/doc/LectureNotes/_build/html/figures/generativelearning.png new file mode 100644 index 000000000..78168b7a0 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/generativelearning.png differ diff --git a/doc/LectureNotes/_build/html/figures/nn1.pdf b/doc/LectureNotes/_build/html/figures/nn1.pdf new file mode 100644 index 000000000..bebe5cabd Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn1.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/nn1.png b/doc/LectureNotes/_build/html/figures/nn1.png new file mode 100644 index 000000000..05c359481 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn1.png differ diff --git a/doc/LectureNotes/_build/html/figures/nn2.pdf b/doc/LectureNotes/_build/html/figures/nn2.pdf new file mode 100644 index 000000000..7b62d8ff7 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn2.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/nn2.png b/doc/LectureNotes/_build/html/figures/nn2.png new file mode 100644 index 000000000..c402d795b Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn2.png differ diff --git a/doc/LectureNotes/_build/html/figures/nns.png b/doc/LectureNotes/_build/html/figures/nns.png new file mode 100644 index 000000000..19e31ef05 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nns.png differ diff --git a/doc/LectureNotes/_build/html/figures/rmsprop.png b/doc/LectureNotes/_build/html/figures/rmsprop.png new file mode 100644 index 000000000..9f336d033 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/rmsprop.png differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn1.png b/doc/LectureNotes/_build/html/figures/simplenn1.png new file mode 100644 index 000000000..3c87aa3ee Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn1.png differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn2.png b/doc/LectureNotes/_build/html/figures/simplenn2.png new file mode 100644 index 000000000..2ce83dd53 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn2.png differ diff --git 
a/doc/LectureNotes/_build/html/figures/simplenn3.pdf b/doc/LectureNotes/_build/html/figures/simplenn3.pdf new file mode 100644 index 000000000..c27014f4a Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn3.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn3.png b/doc/LectureNotes/_build/html/figures/simplenn3.png new file mode 100644 index 000000000..a377fad3c Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn3.png differ diff --git a/doc/LectureNotes/_build/html/figures/standarddeeplearning.png b/doc/LectureNotes/_build/html/figures/standarddeeplearning.png new file mode 100644 index 000000000..f21133ff9 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/standarddeeplearning.png differ diff --git a/doc/LectureNotes/_build/html/figures/structure.pdf b/doc/LectureNotes/_build/html/figures/structure.pdf new file mode 100644 index 000000000..d21e6d3d9 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/structure.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/structure.png b/doc/LectureNotes/_build/html/figures/structure.png new file mode 100644 index 000000000..bf82679e3 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/structure.png differ diff --git a/doc/LectureNotes/_build/html/genindex.html b/doc/LectureNotes/_build/html/genindex.html index 5583cc411..161d3d12f 100644 --- a/doc/LectureNotes/_build/html/genindex.html +++ b/doc/LectureNotes/_build/html/genindex.html @@ -227,10 +227,45 @@
[Navigation sidebar in genindex.html; entries marked "+" are added by this commit:]
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
+ • Week 37: Gradient descent methods
+ • Exercises week 38
+ • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
+ • Exercises week 39
+ • Week 39: Resampling methods and logistic regression
+ • Week 40: Gradient descent methods (continued) and start Neural networks
+ • Week 41 Neural networks and constructing a neural network code
+ • Exercises week 41
+ • Week 42 Constructing a Neural Network code with examples
+ • Exercises week 42
+ • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
+ • Exercises week 43
+ • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
+ • Exercises week 44
+ • Week 45, Convolutional Neural Networks (CNNs)
  • Projects
    diff --git a/doc/LectureNotes/_build/html/intro.html b/doc/LectureNotes/_build/html/intro.html index 01ecdb733..ae09bc9c6 100644 --- a/doc/LectureNotes/_build/html/intro.html +++ b/doc/LectureNotes/_build/html/intro.html @@ -231,10 +231,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/linalg.html b/doc/LectureNotes/_build/html/linalg.html index 52c213f10..f7d1bcb07 100644 --- a/doc/LectureNotes/_build/html/linalg.html +++ b/doc/LectureNotes/_build/html/linalg.html @@ -230,10 +230,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/objects.inv b/doc/LectureNotes/_build/html/objects.inv index 26f7645df..3220aedd5 100644 Binary files a/doc/LectureNotes/_build/html/objects.inv and b/doc/LectureNotes/_build/html/objects.inv differ diff --git a/doc/LectureNotes/_build/html/project1.html b/doc/LectureNotes/_build/html/project1.html index 44d08f6bb..b7ae745b1 100644 --- a/doc/LectureNotes/_build/html/project1.html +++ b/doc/LectureNotes/_build/html/project1.html @@ -62,7 +62,8 @@ - + + @@ -229,10 +230,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
@@ -381,16 +417,17 @@ [hunk touching the in-page "Contents" navigation]
@@ -774,16 +848,17 @@ [hunk touching the "Software and needed installations" and "Projects" sections]
    diff --git a/doc/LectureNotes/_build/html/search.html b/doc/LectureNotes/_build/html/search.html index be23d8ca2..109651ed7 100644 --- a/doc/LectureNotes/_build/html/search.html +++ b/doc/LectureNotes/_build/html/search.html @@ -229,10 +229,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/searchindex.js b/doc/LectureNotes/_build/html/searchindex.js index 4c7aa149b..30d44a50d 100644 --- a/doc/LectureNotes/_build/html/searchindex.js +++ b/doc/LectureNotes/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({"alltitles": {"1a)": [[18, "a"]], "3a)": [[18, "id1"]], "3b)": [[18, "b"]], "4a)": [[18, "id2"]], "4b)": [[18, "id3"]], "A Classification Tree": [[9, "a-classification-tree"]], "A Frequentist approach to data analysis": [[0, "a-frequentist-approach-to-data-analysis"], [26, "a-frequentist-approach-to-data-analysis"]], "A better approach": [[8, "a-better-approach"]], "A first summary": [[26, "a-first-summary"]], "A quick Reminder on Lagrangian Multipliers": [[8, "a-quick-reminder-on-lagrangian-multipliers"]], "A simple example": [[4, "a-simple-example"]], "A soft classifier": [[8, "a-soft-classifier"]], "A top-down perspective on Neural networks": [[1, "a-top-down-perspective-on-neural-networks"]], "ADAM optimizer": [[13, "adam-optimizer"]], "Activation functions": [[12, "activation-functions"]], "Adaptive boosting: AdaBoost, Basic Algorithm": [[10, "adaptive-boosting-adaboost-basic-algorithm"]], "Adding error analysis and training set up": [[26, "adding-error-analysis-and-training-set-up"], [27, "adding-error-analysis-and-training-set-up"]], "Adjust hyperparameters": [[1, "adjust-hyperparameters"]], "Algorithms for Setting up Decision Trees": [[9, "algorithms-for-setting-up-decision-trees"]], "An Overview of Ensemble Methods": [[10, "an-overview-of-ensemble-methods"]], "An extrapolation example": [[4, "an-extrapolation-example"]], "An optimization/minimization problem": [[26, "an-optimization-minimization-problem"]], "And finally \\boldsymbol{X}\\boldsymbol{X}^T": [[27, "and-finally-boldsymbol-x-boldsymbol-x-t"]], "And what about using neural networks?": [[26, "and-what-about-using-neural-networks"]], "Another Example, now with a polynomial fit": [[28, "another-example-now-with-a-polynomial-fit"]], "Another example, the moons again": [[9, "another-example-the-moons-again"]], "Applied Data Analysis and Machine Learning": [[19, null]], "Autocorrelation function": [[23, "autocorrelation-function"]], "Automatic differentiation": [[13, "automatic-differentiation"]], "Back to Ridge and LASSO Regression": [[27, "back-to-ridge-and-lasso-regression"], [28, "back-to-ridge-and-lasso-regression"]], "Back to the Cancer Data": [[11, "back-to-the-cancer-data"]], "Background literature": [[21, "background-literature"]], "Bagging": [[10, "bagging"]], "Bagging Examples": [[10, "bagging-examples"]], "Basic Matrix Features": [[20, "basic-matrix-features"]], "Basic ideas of the Principal Component Analysis (PCA)": [[11, null]], "Basic math of the SVD": [[5, "basic-math-of-the-svd"], [27, "basic-math-of-the-svd"], [28, "basic-math-of-the-svd"]], "Basics": [[7, "basics"]], "Basics of a tree": [[9, "basics-of-a-tree"]], "Batch Normalization": [[1, "batch-normalization"]], "Bayes\u2019 Theorem and Ridge and Lasso Regression": [[5, "bayes-theorem-and-ridge-and-lasso-regression"]], "Boosting, a Bird\u2019s Eye View": [[10, "boosting-a-bird-s-eye-view"]], "Bootstrap": [[6, "bootstrap"]], "Bringing it together, first back propagation equation": [[12, "bringing-it-together-first-back-propagation-equation"]], "Building a Feed Forward Neural Network": [[1, null]], "Building a tree, regression": [[9, "building-a-tree-regression"]], "Building neural networks in Tensorflow and Keras": [[1, "building-neural-networks-in-tensorflow-and-keras"]], "CNNs in more 
detail, building convolutional neural networks in Tensorflow and Keras": [[3, "cnns-in-more-detail-building-convolutional-neural-networks-in-tensorflow-and-keras"]], "Cancer Data again now with Decision Trees and other Methods": [[9, "cancer-data-again-now-with-decision-trees-and-other-methods"]], "Choose cost function and optimizer": [[1, "choose-cost-function-and-optimizer"]], "Classical PCA Theorem": [[11, "classical-pca-theorem"]], "Clustering and Unsupervised Learning": [[14, null]], "Code for SVD and Inversion of Matrices": [[5, "code-for-svd-and-inversion-of-matrices"]], "Codes and Approaches": [[14, "codes-and-approaches"]], "Codes for the SVD": [[5, "codes-for-the-svd"], [27, "codes-for-the-svd"], [28, "codes-for-the-svd"]], "Coding Setup and Linear Regression": [[15, "coding-setup-and-linear-regression"]], "Collect and pre-process data": [[1, "collect-and-pre-process-data"]], "Communication channels": [[26, "communication-channels"]], "Compare Bagging on Trees with Random Forests": [[10, "compare-bagging-on-trees-with-random-forests"]], "Comparing with a numerical scheme": [[2, "comparing-with-a-numerical-scheme"]], "Comparison with OLS": [[28, "comparison-with-ols"]], "Computing the Gini index": [[9, "computing-the-gini-index"]], "Conditions on convex functions": [[28, "conditions-on-convex-functions"]], "Conjugate gradient method": [[13, "conjugate-gradient-method"]], "Convex function": [[28, "convex-function"]], "Convex functions": [[13, "convex-functions"], [28, "convex-functions"]], "Convolution Examples: Polynomial multiplication": [[3, "convolution-examples-polynomial-multiplication"]], "Convolution Examples: Principle of Superposition and Periodic Forces (Fourier Transforms)": [[3, "convolution-examples-principle-of-superposition-and-periodic-forces-fourier-transforms"]], "Convolutional Neural Network": [[12, "convolutional-neural-network"]], "Convolutional Neural Networks": [[3, null]], "Correlation Function and Design/Feature Matrix": [[27, "correlation-function-and-design-feature-matrix"]], "Correlation Matrix": [[11, "correlation-matrix"], [27, "correlation-matrix"]], "Correlation Matrix with Pandas": [[27, "correlation-matrix-with-pandas"]], "Course Format": [[26, "course-format"]], "Course setting": [[22, null]], "Covariance Matrix Examples": [[27, "covariance-matrix-examples"]], "Covariance and Correlation Matrix": [[27, "covariance-and-correlation-matrix"]], "Cross-validation": [[6, "cross-validation"]], "Deadlines for projects (tentative)": [[26, "deadlines-for-projects-tentative"]], "Decision trees, overarching aims": [[9, null]], "Deep learning methods": [[26, "deep-learning-methods"]], "Define model and architecture": [[1, "define-model-and-architecture"]], "Defining the cost function": [[1, "defining-the-cost-function"]], "Deliverables": [[15, "deliverables"], [16, "deliverables"]], "Derivatives and the chain rule": [[12, "derivatives-and-the-chain-rule"]], "Derivatives, example 1": [[27, "derivatives-example-1"]], "Deriving OLS from a probability distribution": [[5, "deriving-ols-from-a-probability-distribution"]], "Deriving and Implementing Ordinary Least Squares": [[16, "deriving-and-implementing-ordinary-least-squares"]], "Deriving and Implementing Ridge Regression": [[17, "deriving-and-implementing-ridge-regression"]], "Deriving the Lasso Regression Equations": [[27, "deriving-the-lasso-regression-equations"], [28, "deriving-the-lasso-regression-equations"], [28, "id6"]], "Deriving the Ridge Regression Equations": [[27, 
"deriving-the-ridge-regression-equations"], [28, "deriving-the-ridge-regression-equations"], [28, "id3"]], "Deriving the back propagation code for a multilayer perceptron model": [[12, "deriving-the-back-propagation-code-for-a-multilayer-perceptron-model"]], "Developing a code for doing neural networks with back propagation": [[1, "developing-a-code-for-doing-neural-networks-with-back-propagation"]], "Diagonalize the sample covariance matrix to obtain the principal components": [[11, "diagonalize-the-sample-covariance-matrix-to-obtain-the-principal-components"]], "Different kernels and Mercer\u2019s theorem": [[8, "different-kernels-and-mercer-s-theorem"]], "Disadvantages": [[9, "disadvantages"]], "Discriminative Modeling": [[26, "discriminative-modeling"]], "Domains and probabilities": [[23, "domains-and-probabilities"]], "Dropout": [[1, "dropout"]], "Economy-size SVD": [[27, "economy-size-svd"], [28, "economy-size-svd"]], "Elements of Probability Theory and Statistical Data Analysis": [[23, null]], "Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods": [[10, null]], "Entropy and the ID3 algorithm": [[9, "entropy-and-the-id3-algorithm"]], "Essential elements of ML": [[26, "essential-elements-of-ml"]], "Evaluate model performance on test data": [[1, "evaluate-model-performance-on-test-data"]], "Example 2": [[27, "example-2"]], "Example 3": [[27, "example-3"]], "Example 4": [[27, "example-4"]], "Example Matrix": [[27, "example-matrix"], [28, "example-matrix"]], "Example of discriminative modeling, taken from Generative Deep Learning by David Foster": [[26, "example-of-discriminative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of generative modeling, taken from Generative Deep Learning by David Foster": [[26, "example-of-generative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of own Standard scaling": [[27, "example-of-own-standard-scaling"]], "Example relevant for the exercises": [[27, "example-relevant-for-the-exercises"]], "Example: Exponential decay": [[2, "example-exponential-decay"]], "Example: Population growth": [[2, "example-population-growth"]], "Example: The diffusion equation": [[2, "example-the-diffusion-equation"]], "Example: binary classification problem": [[1, "example-binary-classification-problem"]], "Examples": [[26, "examples"]], "Examples of likelihood functions used in logistic regression and neural networks": [[7, "examples-of-likelihood-functions-used-in-logistic-regression-and-neural-networks"]], "Exercise 1 - Choice of model and degrees of freedom": [[17, "exercise-1-choice-of-model-and-degrees-of-freedom"]], "Exercise 1 - Finding the derivative of Matrix-Vector expressions": [[16, "exercise-1-finding-the-derivative-of-matrix-vector-expressions"]], "Exercise 1 - Github Setup": [[15, "exercise-1-github-setup"]], "Exercise 1, scale your data": [[18, "exercise-1-scale-your-data"]], "Exercise 1: Setting up various Python environments": [[0, "exercise-1-setting-up-various-python-environments"]], "Exercise 2 - Deriving the expression for OLS": [[16, "exercise-2-deriving-the-expression-for-ols"]], "Exercise 2 - Deriving the expression for Ridge Regression": [[17, "exercise-2-deriving-the-expression-for-ridge-regression"]], "Exercise 2 - Setting up a Github repository": [[15, "exercise-2-setting-up-a-github-repository"]], "Exercise 2, calculate the gradients": [[18, "exercise-2-calculate-the-gradients"]], "Exercise 2: making your own data and exploring scikit-learn": 
[[0, "exercise-2-making-your-own-data-and-exploring-scikit-learn"]], "Exercise 3 - Creating feature matrix and implementing OLS using the analytical expression": [[16, "exercise-3-creating-feature-matrix-and-implementing-ols-using-the-analytical-expression"]], "Exercise 3 - Fitting an OLS model to data": [[15, "exercise-3-fitting-an-ols-model-to-data"]], "Exercise 3 - Scaling data": [[17, "exercise-3-scaling-data"]], "Exercise 3 - Setting up a Python virtual environment": [[15, "exercise-3-setting-up-a-python-virtual-environment"]], "Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters \\boldsymbol{\\theta}": [[18, "exercise-3-using-the-analytical-formulae-for-ols-and-ridge-regression-to-find-the-optimal-paramters-boldsymbol-theta"]], "Exercise 3: Normalizing our data": [[0, "exercise-3-normalizing-our-data"]], "Exercise 4 - Fitting a polynomial": [[16, "exercise-4-fitting-a-polynomial"]], "Exercise 4 - Implementing Ridge Regression": [[17, "exercise-4-implementing-ridge-regression"]], "Exercise 4 - Testing multiple hyperparameters": [[17, "exercise-4-testing-multiple-hyperparameters"]], "Exercise 4 - The train-test split": [[15, "exercise-4-the-train-test-split"]], "Exercise 4, Implementing the simplest form for gradient descent": [[18, "exercise-4-implementing-the-simplest-form-for-gradient-descent"]], "Exercise 4: Adding Ridge Regression": [[0, "exercise-4-adding-ridge-regression"]], "Exercise 5 - Comparing your code with sklearn": [[16, "exercise-5-comparing-your-code-with-sklearn"]], "Exercise 5, Ridge regression and a new Synthetic Dataset": [[18, "exercise-5-ridge-regression-and-a-new-synthetic-dataset"]], "Exercise 5: Analytical exercises": [[0, "exercise-5-analytical-exercises"]], "Exercise: Cross-validation as resampling techniques, adding more complexity": [[6, "exercise-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Exercise: Analysis of real data": [[6, "exercise-analysis-of-real-data"]], "Exercise: Bias-variance trade-off and resampling techniques": [[6, "exercise-bias-variance-trade-off-and-resampling-techniques"]], "Exercise: Lasso Regression on the Franke function with resampling": [[6, "exercise-lasso-regression-on-the-franke-function-with-resampling"]], "Exercise: Ordinary Least Square (OLS) on the Franke function": [[6, "exercise-ordinary-least-square-ols-on-the-franke-function"]], "Exercise: Ridge Regression on the Franke function with resampling": [[6, "exercise-ridge-regression-on-the-franke-function-with-resampling"]], "Exercises": [[0, "exercises"]], "Exercises and Projects": [[6, "exercises-and-projects"]], "Exercises week 34": [[15, null]], "Exercises week 35": [[16, null]], "Exercises week 36": [[17, null]], "Exercises week 37": [[18, null]], "Expectation values": [[23, "expectation-values"]], "Extending to more than one variable": [[28, "extending-to-more-than-one-variable"]], "Extremely useful tools, strongly recommended": [[26, "extremely-useful-tools-strongly-recommended"]], "Feed-forward neural networks": [[12, "feed-forward-neural-networks"]], "Feed-forward pass": [[1, "feed-forward-pass"]], "Final back propagating equation": [[12, "final-back-propagating-equation"]], "Fine-tuning neural network hyperparameters": [[1, "fine-tuning-neural-network-hyperparameters"]], "Fitting an Equation of State for Dense Nuclear Matter": [[0, "fitting-an-equation-of-state-for-dense-nuclear-matter"]], "Fixing the singularity": [[27, "fixing-the-singularity"], [28, "fixing-the-singularity"]], "Format for 
electronic delivery of report and programs": [[21, "format-for-electronic-delivery-of-report-and-programs"]], "Frequently used scaling functions": [[27, "frequently-used-scaling-functions"]], "From OLS to Ridge and Lasso": [[28, "from-ols-to-ridge-and-lasso"]], "From one to many layers, the universal approximation theorem": [[12, "from-one-to-many-layers-the-universal-approximation-theorem"]], "Functionality in Scikit-Learn": [[27, "functionality-in-scikit-learn"]], "Further Dimensionality Remarks": [[3, "further-dimensionality-remarks"]], "Further properties (important for our analyses later)": [[5, "further-properties-important-for-our-analyses-later"], [27, "further-properties-important-for-our-analyses-later"], [28, "further-properties-important-for-our-analyses-later"]], "Gaussian Elimination": [[20, "gaussian-elimination"]], "General Features": [[9, "general-features"]], "General linear models and linear algebra": [[26, "general-linear-models-and-linear-algebra"]], "Generalizing the fitting procedure as a linear algebra problem": [[26, "generalizing-the-fitting-procedure-as-a-linear-algebra-problem"], [26, "id1"]], "Generative Adversarial Networks": [[4, "generative-adversarial-networks"]], "Generative Models": [[4, "generative-models"]], "Generative Versus Discriminative Modeling": [[26, "generative-versus-discriminative-modeling"]], "Geometric Interpretation and link with Singular Value Decomposition": [[11, "geometric-interpretation-and-link-with-singular-value-decomposition"]], "Gradient Boosting, Classification Example": [[10, "gradient-boosting-classification-example"]], "Gradient Boosting, Examples of Regression": [[10, "gradient-boosting-examples-of-regression"]], "Gradient Clipping": [[1, "gradient-clipping"]], "Gradient Descent Example": [[28, "id1"]], "Gradient boosting: Basics with Steepest Descent/Functional Gradient Descent": [[10, "gradient-boosting-basics-with-steepest-descent-functional-gradient-descent"]], "Gradient descent": [[2, "gradient-descent"]], "Gradient descent and Ridge": [[28, "gradient-descent-and-ridge"]], "Gradient descent example": [[28, "gradient-descent-example"]], "Grading": [[24, "grading"], [24, "id2"], [26, "grading"]], "How to take derivatives of Matrix-Vector expressions": [[16, "how-to-take-derivatives-of-matrix-vector-expressions"]], "Hyperplanes and all that": [[8, "hyperplanes-and-all-that"]], "Important Matrix and vector handling packages": [[20, "important-matrix-and-vector-handling-packages"]], "Important technicalities: More on Rescaling data": [[27, "important-technicalities-more-on-rescaling-data"]], "Improving performance": [[1, "improving-performance"]], "In summary": [[24, "in-summary"]], "Including Stochastic Gradient Descent with Autograd": [[13, "including-stochastic-gradient-descent-with-autograd"]], "Incremental PCA": [[11, "incremental-pca"]], "Installing R, C++, cython or Julia": [[26, "installing-r-c-cython-or-julia"]], "Installing R, C++, cython, Numba etc": [[26, "installing-r-c-cython-numba-etc"]], "Instructor information": [[24, "instructor-information"]], "Interpretations and optimizing our parameters": [[26, "interpretations-and-optimizing-our-parameters"], [26, "id2"], [26, "id3"], [27, "interpretations-and-optimizing-our-parameters"], [27, "id1"], [27, "id2"]], "Interpreting the Ridge results": [[27, "interpreting-the-ridge-results"], [28, "interpreting-the-ridge-results"], [28, "id4"]], "Introducing JAX": [[13, "introducing-jax"]], "Introducing the Covariance and Correlation functions": [[11, 
"introducing-the-covariance-and-correlation-functions"], [27, "introducing-the-covariance-and-correlation-functions"]], "Introduction": [[0, "introduction"], [6, "introduction"], [19, "introduction"], [20, "introduction"]], "Introduction to numerical projects": [[21, "introduction-to-numerical-projects"]], "Iterative Fitting, Classification and AdaBoost": [[10, "iterative-fitting-classification-and-adaboost"]], "Iterative Fitting, Regression and Squared-error Cost Function": [[10, "iterative-fitting-regression-and-squared-error-cost-function"]], "Kernel PCA": [[11, "kernel-pca"]], "Kernels and non-linearity": [[8, "kernels-and-non-linearity"]], "LU Decomposition, the inverse of a matrix": [[20, "lu-decomposition-the-inverse-of-a-matrix"]], "Lasso Regression": [[28, "lasso-regression"]], "Lasso case": [[28, "lasso-case"]], "Layers": [[1, "layers"]], "Layers used to build CNNs": [[3, "layers-used-to-build-cnns"]], "Learning goals": [[15, "learning-goals"], [16, "learning-goals"], [17, "learning-goals"], [18, "learning-goals"]], "Learning outcomes": [[19, "learning-outcomes"], [26, "learning-outcomes"]], "Lectures and ComputerLab": [[26, "lectures-and-computerlab"]], "Limitations of supervised learning with deep networks": [[1, "limitations-of-supervised-learning-with-deep-networks"]], "Linear Algebra, Handling of Arrays and more Python Features": [[20, null]], "Linear Regression": [[0, null]], "Linear Regression Problems": [[27, "linear-regression-problems"], [28, "linear-regression-problems"]], "Linear Regression and the SVD": [[28, "linear-regression-and-the-svd"]], "Linear Regression, basic elements": [[0, "linear-regression-basic-elements"]], "Linking Bayes\u2019 Theorem with Ridge and Lasso Regression": [[5, "linking-bayes-theorem-with-ridge-and-lasso-regression"]], "Linking the regression analysis with a statistical interpretation": [[5, "linking-the-regression-analysis-with-a-statistical-interpretation"]], "Linking with the SVD": [[5, "linking-with-the-svd"], [27, "linking-with-the-svd"]], "Links to relevant courses at the University of Oslo": [[25, "links-to-relevant-courses-at-the-university-of-oslo"]], "Logistic Regression": [[7, null], [7, "id1"]], "MNIST and GANs": [[4, "mnist-and-gans"]], "Machine Learning": [[26, "machine-learning"]], "Machine learning": [[19, "machine-learning"]], "Main textbooks": [[26, "main-textbooks"]], "Making a tree": [[9, "making-a-tree"]], "Making your own Bootstrap: Changing the Level of the Decision Tree": [[10, "making-your-own-bootstrap-changing-the-level-of-the-decision-tree"]], "Making your own test-train splitting": [[27, "making-your-own-test-train-splitting"]], "Material for exercises week 35": [[27, "material-for-exercises-week-35"]], "Material for lab sessions sessions Tuesday and Wednesday": [[28, "material-for-lab-sessions-sessions-tuesday-and-wednesday"]], "Material for lecture Monday September 2": [[28, "material-for-lecture-monday-september-2"]], "Mathematical Interpretation of Ordinary Least Squares": [[5, "mathematical-interpretation-of-ordinary-least-squares"], [27, "mathematical-interpretation-of-ordinary-least-squares"], [28, "mathematical-interpretation-of-ordinary-least-squares"]], "Mathematical optimization of convex functions": [[8, "mathematical-optimization-of-convex-functions"]], "Mathematics of CNNs": [[3, "mathematics-of-cnns"]], "Mathematics of the SVD and implications": [[5, "mathematics-of-the-svd-and-implications"], [27, "mathematics-of-the-svd-and-implications"], [28, "mathematics-of-the-svd-and-implications"]], 
"Matrices in Python": [[26, "matrices-in-python"]], "Matrix multiplication": [[1, "matrix-multiplication"]], "Matrix-vector notation and activation": [[12, "matrix-vector-notation-and-activation"]], "Meet the covariance!": [[23, "meet-the-covariance"]], "Meet the Covariance Matrix": [[5, "meet-the-covariance-matrix"], [27, "meet-the-covariance-matrix"]], "Meet the Hessian Matrix": [[27, "meet-the-hessian-matrix"]], "Meet the Pandas": [[26, "meet-the-pandas"]], "Min-Max Scaling": [[27, "min-max-scaling"]], "Momentum based GD": [[13, "momentum-based-gd"]], "More complicated Example: The Ising model": [[6, "more-complicated-example-the-ising-model"]], "More interpretations": [[27, "more-interpretations"], [28, "more-interpretations"], [28, "id5"]], "More on Dimensionalities": [[3, "more-on-dimensionalities"]], "More on Rescaling data": [[6, "more-on-rescaling-data"]], "More on Steepest descent": [[28, "more-on-steepest-descent"]], "More on convex functions": [[28, "more-on-convex-functions"]], "More preprocessing": [[27, "more-preprocessing"]], "Multilayer perceptrons": [[12, "multilayer-perceptrons"]], "Network requirements": [[2, "network-requirements"]], "Neural Networks vs CNNs": [[3, "neural-networks-vs-cnns"]], "Neural networks": [[12, null]], "Note about SVD Calculations": [[27, "note-about-svd-calculations"], [28, "note-about-svd-calculations"]], "Note on Scikit-Learn": [[28, "note-on-scikit-learn"]], "Numerical experiments and the covariance, central limit theorem": [[23, "numerical-experiments-and-the-covariance-central-limit-theorem"]], "Numpy and arrays": [[20, "numpy-and-arrays"], [26, "numpy-and-arrays"]], "Numpy examples and Important Matrix and vector handling packages": [[26, "numpy-examples-and-important-matrix-and-vector-handling-packages"]], "Optimization and gradient descent, the central part of any Machine Learning algortithm": [[28, "optimization-and-gradient-descent-the-central-part-of-any-machine-learning-algortithm"]], "Optimization, the central part of any Machine Learning algortithm": [[13, null]], "Optimizing our parameters": [[26, "optimizing-our-parameters"]], "Optimizing our parameters, more details": [[26, "optimizing-our-parameters-more-details"]], "Optimizing the cost function": [[1, "optimizing-the-cost-function"]], "Organizing our data": [[0, "organizing-our-data"], [26, "organizing-our-data"]], "Other Matrix and Vector Operations": [[20, "other-matrix-and-vector-operations"]], "Other Types of Recurrent Neural Networks": [[4, "other-types-of-recurrent-neural-networks"]], "Other courses on Data science and Machine Learning at UiO": [[26, "other-courses-on-data-science-and-machine-learning-at-uio"]], "Other courses on Data science and Machine Learning at UiO, contn": [[26, "other-courses-on-data-science-and-machine-learning-at-uio-contn"]], "Other popular texts": [[26, "other-popular-texts"]], "Other techniques": [[11, "other-techniques"]], "Other types of networks": [[12, "other-types-of-networks"]], "Other ways of visualizing the trees": [[9, "other-ways-of-visualizing-the-trees"]], "Our model for the nuclear binding energies": [[26, "our-model-for-the-nuclear-binding-energies"]], "Overview of first week": [[26, "overview-of-first-week"]], "Own code for Ordinary Least Squares": [[26, "own-code-for-ordinary-least-squares"], [27, "own-code-for-ordinary-least-squares"]], "PCA and scikit-learn": [[11, "pca-and-scikit-learn"]], "Pandas AI": [[26, "pandas-ai"]], "Part a : Ordinary Least Square (OLS) for the Runge function": [[21, 
"part-a-ordinary-least-square-ols-for-the-runge-function"]], "Part b: Adding Ridge regression for the Runge function": [[21, "part-b-adding-ridge-regression-for-the-runge-function"]], "Part c: Writing your own gradient descent code": [[21, "part-c-writing-your-own-gradient-descent-code"]], "Part d: Including momentum and more advanced ways to update the learning the rate": [[21, "part-d-including-momentum-and-more-advanced-ways-to-update-the-learning-the-rate"]], "Part e: Writing our own code for Lasso regression": [[21, "part-e-writing-our-own-code-for-lasso-regression"]], "Part f: Stochastic gradient descent": [[21, "part-f-stochastic-gradient-descent"]], "Part g: Bias-variance trade-off and resampling techniques": [[21, "part-g-bias-variance-trade-off-and-resampling-techniques"]], "Part h): Cross-validation as resampling techniques, adding more complexity": [[21, "part-h-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Partial Differential Equations": [[2, "partial-differential-equations"]], "Plans for week 35": [[27, "plans-for-week-35"]], "Plans for week 36": [[28, "plans-for-week-36"]], "Practical tips": [[13, "practical-tips"]], "Practicalities": [[24, "practicalities"], [24, "id1"]], "Preamble: Note on writing reports, using reference material, AI and other tools": [[21, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"]], "Predicting New Points With A Trained Recurrent Neural Network": [[4, "predicting-new-points-with-a-trained-recurrent-neural-network"]], "Preprocessing our data": [[27, "preprocessing-our-data"]], "Prerequisites": [[26, "prerequisites"]], "Prerequisites and background": [[19, "prerequisites-and-background"]], "Prerequisites: Collect and pre-process data": [[3, "prerequisites-collect-and-pre-process-data"]], "Probability Distribution Functions": [[23, "probability-distribution-functions"]], "Program example for gradient descent with Ridge Regression": [[28, "program-example-for-gradient-descent-with-ridge-regression"]], "Program for stochastic gradient": [[13, "program-for-stochastic-gradient"]], "Project 1 on Machine Learning, deadline October 6 (midnight), 2025": [[21, null]], "Properties of PDFs": [[23, "properties-of-pdfs"]], "Pros and cons of trees, pros": [[9, "pros-and-cons-of-trees-pros"]], "Python installers": [[19, "python-installers"], [26, "python-installers"]], "RMS prop": [[13, "rms-prop"]], "Random Numbers": [[23, "random-numbers"]], "Random forests": [[10, "random-forests"]], "Randomized PCA": [[11, "randomized-pca"]], "Reading material": [[26, "reading-material"]], "Reading recommendations:": [[27, "reading-recommendations"]], "Reading suggestions week 34": [[26, "reading-suggestions-week-34"]], "Recurrent neural networks": [[12, "recurrent-neural-networks"]], "Recurrent neural networks: Overarching view": [[4, null]], "Reducing the number of degrees of freedom, overarching view": [[0, "reducing-the-number-of-degrees-of-freedom-overarching-view"], [27, "reducing-the-number-of-degrees-of-freedom-overarching-view"]], "Reformulating the problem": [[2, "reformulating-the-problem"]], "Regression Case": [[10, "regression-case"]], "Regression analysis and resampling methods": [[21, "regression-analysis-and-resampling-methods"]], "Regression analysis, overarching aims": [[26, "regression-analysis-overarching-aims"]], "Regression analysis, overarching aims II": [[26, "regression-analysis-overarching-aims-ii"]], "Regularization": [[1, "regularization"]], "Reminder from last week": [[27, 
"reminder-from-last-week"]], "Reminder on Newton-Raphson\u2019s method": [[28, "reminder-on-newton-raphson-s-method"]], "Reminder on Statistics": [[6, "reminder-on-statistics"]], "Replace or not": [[13, "replace-or-not"]], "Required Technologies": [[19, "required-technologies"]], "Resampling Methods": [[6, null]], "Resampling methods": [[6, "id1"]], "Residual Error": [[27, "residual-error"], [28, "residual-error"]], "Resources on differential equations and deep learning": [[2, "resources-on-differential-equations-and-deep-learning"]], "Revisiting Ordinary Least Squares": [[28, "revisiting-ordinary-least-squares"]], "Revisiting our Linear Regression Solvers": [[13, "revisiting-our-linear-regression-solvers"]], "Rewriting the Covariance and/or Correlation Matrix": [[27, "rewriting-the-covariance-and-or-correlation-matrix"]], "Rewriting the fitting procedure as a linear algebra problem": [[26, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem"]], "Rewriting the fitting procedure as a linear algebra problem, more details": [[26, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem-more-details"]], "Ridge Regression": [[28, "ridge-regression"]], "Ridge and LASSO Regression": [[27, "ridge-and-lasso-regression"], [28, "ridge-and-lasso-regression"], [28, "id2"]], "Ridge and Lasso Regression": [[5, null], [5, "id1"]], "SVD analysis": [[28, "svd-analysis"]], "Same code but now with momentum gradient descent": [[13, "same-code-but-now-with-momentum-gradient-descent"]], "Schedule first week": [[26, "schedule-first-week"]], "Schematic Regression Procedure": [[9, "schematic-regression-procedure"]], "Setting up the Back propagation algorithm": [[12, "setting-up-the-back-propagation-algorithm"]], "Setting up the Matrix to be inverted": [[27, "setting-up-the-matrix-to-be-inverted"], [28, "setting-up-the-matrix-to-be-inverted"]], "Setting up the network using Autograd; The full program": [[2, "setting-up-the-network-using-autograd-the-full-program"]], "Similar (second order function now) problem but now with AdaGrad": [[13, "similar-second-order-function-now-problem-but-now-with-adagrad"]], "Simple Python Code to read in Data and perform Classification": [[9, "simple-python-code-to-read-in-data-and-perform-classification"]], "Simple case": [[27, "simple-case"], [28, "simple-case"]], "Simple code for solving the above problem": [[28, "simple-code-for-solving-the-above-problem"]], "Simple example to illustrate Ordinary Least Squares, Ridge and Lasso Regression": [[28, "simple-example-to-illustrate-ordinary-least-squares-ridge-and-lasso-regression"]], "Simple geometric interpretation": [[28, "simple-geometric-interpretation"]], "Simple linear regression model using scikit-learn": [[0, "simple-linear-regression-model-using-scikit-learn"], [26, "simple-linear-regression-model-using-scikit-learn"]], "Simple one-dimensional second-order polynomial": [[18, "simple-one-dimensional-second-order-polynomial"]], "Simple program": [[28, "simple-program"]], "Software and needed installations": [[21, "software-and-needed-installations"], [26, "software-and-needed-installations"]], "Solving Differential Equations with Deep Learning": [[2, null]], "Solving the one dimensional Poisson equation": [[2, "solving-the-one-dimensional-poisson-equation"]], "Solving the wave equation with Neural Networks": [[2, "solving-the-wave-equation-with-neural-networks"]], "Some famous Matrices": [[20, "some-famous-matrices"]], "Some simple problems": [[13, "some-simple-problems"], [28, "some-simple-problems"]], "Some useful 
matrix and vector expressions": [[27, "some-useful-matrix-and-vector-expressions"]], "Splitting our Data in Training and Test data": [[0, "splitting-our-data-in-training-and-test-data"], [27, "splitting-our-data-in-training-and-test-data"]], "Standard steepest descent": [[13, "standard-steepest-descent"]], "Statistical analysis and optimization of data": [[19, "statistical-analysis-and-optimization-of-data"], [26, "statistical-analysis-and-optimization-of-data"]], "Steepest descent": [[13, "steepest-descent"], [28, "steepest-descent"]], "Stochastic Gradient Descent (SGD)": [[13, "stochastic-gradient-descent-sgd"]], "Stochastic variables and the main concepts, the discrete case": [[23, "stochastic-variables-and-the-main-concepts-the-discrete-case"]], "Support Vector Machines, overarching aims": [[8, null]], "Systematic reduction": [[3, "systematic-reduction"]], "Teachers": [[26, "teachers"]], "Teachers and Grading": [[24, null]], "Teaching Assistants Fall semester 2023": [[24, "teaching-assistants-fall-semester-2023"]], "Tentative deadllines for projects": [[24, "tentative-deadllines-for-projects"]], "Testing the Means Squared Error as function of Complexity": [[0, "testing-the-means-squared-error-as-function-of-complexity"], [27, "testing-the-means-squared-error-as-function-of-complexity"]], "Textbooks": [[25, null]], "The Algorithm before theorem": [[11, "the-algorithm-before-theorem"]], "The Breast Cancer Data, now with Keras": [[1, "the-breast-cancer-data-now-with-keras"]], "The CART algorithm for Classification": [[9, "the-cart-algorithm-for-classification"]], "The CART algorithm for Regression": [[9, "the-cart-algorithm-for-regression"]], "The CIFAR01 data set": [[3, "the-cifar01-data-set"]], "The Hessian matrix": [[28, "the-hessian-matrix"]], "The Hessian matrix for Ridge Regression": [[28, "the-hessian-matrix-for-ridge-regression"]], "The Jacobian": [[27, "the-jacobian"]], "The MNIST dataset again": [[3, "the-mnist-dataset-again"]], "The OLS case": [[28, "the-ols-case"]], "The RELU function family": [[1, "the-relu-function-family"]], "The Ridge case": [[28, "the-ridge-case"]], "The SVD, a Fantastic Algorithm": [[27, "the-svd-a-fantastic-algorithm"], [28, "the-svd-a-fantastic-algorithm"]], "The Softmax function": [[1, "the-softmax-function"]], "The \\chi^2 function": [[0, "the-chi-2-function"], [26, "the-chi-2-function"], [26, "id4"], [26, "id5"], [26, "id6"], [26, "id7"], [26, "id8"]], "The bias-variance tradeoff": [[6, "the-bias-variance-tradeoff"]], "The code for solving the ODE": [[2, "the-code-for-solving-the-ode"]], "The complete code with a simple data set": [[27, "the-complete-code-with-a-simple-data-set"]], "The cost/loss function": [[27, "the-cost-loss-function"]], "The course has two central parts": [[19, "the-course-has-two-central-parts"]], "The derivative of the cost/loss function": [[28, "the-derivative-of-the-cost-loss-function"]], "The equations": [[28, "the-equations"]], "The equations for ordinary least squares": [[27, "the-equations-for-ordinary-least-squares"]], "The first Case": [[28, "the-first-case"]], "The ideal": [[28, "the-ideal"]], "The logistic function": [[7, "the-logistic-function"]], "The mean squared error and its derivative": [[27, "the-mean-squared-error-and-its-derivative"]], "The moons example": [[8, "the-moons-example"]], "The multilayer perceptron (MLP)": [[12, "the-multilayer-perceptron-mlp"]], "The network with one input layer, specified number of hidden layers, and one output layer": [[2, 
"the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"]], "The plethora of machine learning algorithms/methods": [[26, "the-plethora-of-machine-learning-algorithms-methods"]], "The sensitiveness of the gradient descent": [[28, "the-sensitiveness-of-the-gradient-descent"]], "The singular value decomposition": [[5, "the-singular-value-decomposition"], [27, "the-singular-value-decomposition"], [28, "the-singular-value-decomposition"]], "The two-dimensional case": [[8, "the-two-dimensional-case"]], "To our real data: nuclear binding energies. Brief reminder on masses and binding energies": [[26, "to-our-real-data-nuclear-binding-energies-brief-reminder-on-masses-and-binding-energies"]], "Topics covered in this course: Statistical analysis and optimization of data": [[26, "topics-covered-in-this-course-statistical-analysis-and-optimization-of-data"]], "Towards the PCA theorem": [[11, "towards-the-pca-theorem"]], "Train and test datasets": [[1, "train-and-test-datasets"]], "Two-dimensional Objects": [[3, "two-dimensional-objects"]], "Type of problem": [[2, "type-of-problem"]], "Types of Machine Learning": [[26, "types-of-machine-learning"]], "Useful Python libraries": [[19, "useful-python-libraries"], [26, "useful-python-libraries"]], "Using Autograd": [[13, "using-autograd"]], "Using forward Euler to solve the ODE": [[2, "using-forward-euler-to-solve-the-ode"]], "Using gradient descent methods, limitations": [[13, "using-gradient-descent-methods-limitations"], [28, "using-gradient-descent-methods-limitations"]], "Visualization": [[1, "visualization"], [1, "id1"]], "Visualizing the Tree, Classification": [[9, "visualizing-the-tree-classification"]], "Week 34: Introduction to the course, Logistics and Practicalities": [[26, null]], "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression": [[27, null]], "Week 36: Linear Regression and Gradient descent": [[28, null]], "What Is Generative Modeling?": [[26, "what-is-generative-modeling"]], "What does it mean?": [[27, "what-does-it-mean"], [28, "what-does-it-mean"]], "What is Machine Learning?": [[0, "what-is-machine-learning"]], "What is a good model?": [[0, "what-is-a-good-model"], [26, "what-is-a-good-model"]], "What is a good model? 
25, 27, 28], "licens": [0, 1, 19, 21, 26], "lie": [0, 6, 11, 23, 26, 27], "life": [0, 1, 8, 12, 26], "lifetim": 13, "like": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 19, 20, 21, 23, 26, 27, 28], "likelihood": [0, 1, 5, 9, 26, 27], "lim_": 23, "limit": [0, 5, 6, 8, 12, 20, 21, 26, 27], "lin_clf": 8, "lin_model": [], "lin_reg": 9, "linalg": [0, 2, 5, 6, 8, 11, 13, 17, 20, 23, 26, 27, 28], "line": [0, 3, 6, 8, 11, 13, 15, 16, 26, 28], "line1": 8, "line2": 8, "line3": 8, "line_model": 15, "line_ms": 15, "line_predict": 15, "linear": [1, 3, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 23], "linear_model": [0, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 26, 27, 28], "linear_regress": 6, "linearli": [5, 27, 28], "linearloc": [6, 13, 28], "linearregress": [0, 6, 7, 9, 15, 16, 26, 27], "linearsvc": 8, "lineat": 28, "liner": [1, 3], "linerar": 10, "linewidth": [0, 2, 4, 6, 8, 9, 10], "link": [0, 4, 9, 12, 15, 19, 21, 22, 24, 26], "linlag": 5, "linpack": [20, 26], "linreg": [0, 26], "linspac": [0, 2, 3, 4, 6, 8, 9, 10, 13, 16, 17, 20, 23, 26, 27], "linu": 4, "linux": [0, 1, 19, 21, 26], "liquid": [0, 26], "list": [1, 2, 3, 4, 9, 15, 19, 21, 26], "listedcolormap": [9, 10], "literatur": [1, 7, 14, 25], "littl": [1, 3, 9, 12], "live": [8, 16], "ll": [0, 18, 23, 26, 27], "lle": [0, 27], "lloyd": [4, 14], "lmb": [0, 2, 5, 6, 27, 28], "lmbd": [0, 1, 3, 26], "lmbd_val": [0, 1, 3, 26], "lmbda": [13, 28], "ln": [1, 13, 28], "load": [1, 4, 6, 7, 9, 10], "load_boston": [], "load_breast_canc": [1, 7, 9, 10, 11], "load_data": [3, 4], "load_digit": [1, 3], "load_iri": [8, 9], "loc": [3, 6, 7, 8, 9, 10, 26], "local": [0, 1, 3, 7, 12, 13, 15, 27, 28], "locat": [2, 3, 8, 15], "log": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 15, 20, 21, 26], "log10": [0, 5, 6, 27, 28], "log_": [0, 26], "log_clf": 10, "logarithm": [0, 5, 7, 17, 20, 26], "logbook": 21, "logic": [0, 1, 9, 26], "login": 15, "logist": [0, 1, 2, 8, 9, 10, 11, 12, 13, 19, 27, 28], "logisticregress": [7, 9, 10, 11], "logit": 7, "logreg": [7, 9, 10, 11], "logspac": [0, 1, 3, 5, 6, 26, 27, 28], "long": [0, 1, 3, 4, 12, 13, 26, 28], "longer": [2, 3, 8, 10, 14, 20, 23, 26], "loocv": 6, "look": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20, 21, 23, 26, 27, 28], "loop": [1, 4, 6, 10, 12, 14, 16, 17, 18, 19, 20, 26], "lose": 1, "loss": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 13, 18, 20, 21, 26], "loss_fil": 4, "lossfil": 4, "lost": 4, "lot": [1, 4, 6, 16], "low": [0, 6, 9, 10, 11, 21, 26, 27], "lower": [0, 1, 3, 6, 9, 10, 16, 20, 27], "lowercas": [20, 26], "lowest": [9, 13, 23], "lr": [1, 3, 4, 10], "lstat": [], "lstm": 4, "lstm_2layer": 4, "lstsq": [0, 26, 27], "lt": 6, "lu": [0, 5, 26, 27, 28], "lubksb": 20, "luckili": 2, "ludcmp": 20, "lux": 20, "lvert": 1, "lw": [0, 26], "m": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 15, 18, 20, 23, 24, 25, 26, 27, 28], "m_": [9, 12], "m_1": 14, "m_h": [0, 26], "m_k": 14, "m_l": 12, "m_n": [0, 26], "m_p": [0, 26], "m_t": 13, "ma": 11, "machin": [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 20, 25, 27], "machinelearn": [0, 6, 16, 19, 21, 22, 24, 25, 26, 27, 28], "mackai": 25, "made": [0, 1, 3, 4, 5, 6, 7, 9, 11, 12, 21, 26, 27], "mae": [0, 26], "magic": 4, "magnitud": [1, 6, 7, 13, 27], "mai": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 19, 20, 21, 23, 26, 27, 28], "mail": [22, 24], "main": [0, 1, 3, 4, 5, 6, 7, 9, 20, 21, 25, 27, 28], "mainli": [0, 5, 6, 7, 9, 26, 27], "maintain": 6, "major": [1, 6, 9, 10, 13, 20, 26, 28], "make": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 18, 19, 20, 21, 23, 25, 26, 28], "make_axes_locat": 6, "make_moon": 
[8, 9, 10], "make_pipelin": [0, 6, 10, 27], "makedir": [0, 6, 7, 9, 26], "malcondit": 20, "malign": [1, 7, 9], "mammographi": 5, "manag": [0, 2, 3, 15, 19, 21, 26], "mandatori": [24, 26], "mani": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28], "manifold": 11, "manner": 3, "manual": [6, 27], "map": [0, 1, 2, 6, 7, 8, 11, 12, 14, 23, 26], "marc": 27, "margin": [0, 5, 8], "marit": [0, 26], "mark": 26, "marker": [7, 20, 26], "markov": [19, 26], "marsaglia": 23, "mass": [0, 1, 5, 13, 27, 28], "massag": [0, 26], "masses2016": [0, 26], "masses2016ol": [0, 26], "masses2016tre": 0, "masseval2016": [0, 26], "master": [22, 24], "mat": [19, 26], "mat1100": [19, 26], "mat1110": [19, 26], "mat1120": [19, 26], "match": [1, 4, 5, 13, 14, 15, 27, 28], "materi": [4, 5, 7, 13, 15, 20, 22, 24], "math": [3, 7, 12, 13, 20, 23, 25, 26], "mathbb": [0, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 20, 21, 23, 26, 27, 28], "mathbf": [0, 5, 6, 7, 8, 13, 20, 21, 26, 27, 28], "mathcal": [1, 5, 6, 7, 13, 21], "matheemat": 3, "mathemat": [0, 6, 11, 12, 13, 19, 20, 23, 25, 26], "mathemati": 26, "mathrm": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 21, 23, 26, 27, 28], "matmul": [1, 2, 5], "matnat": 25, "matplotlib": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 23, 26, 27, 28], "matric": [0, 1, 3, 4, 6, 7, 8, 11, 13, 16, 17, 19, 27, 28], "matrix": [0, 2, 3, 4, 6, 7, 8, 10, 13, 17, 18, 21, 23], "matshow": 1, "matter": [2, 3, 13, 27, 28], "max": [0, 1, 2, 3, 4, 9, 10, 12, 13, 24, 26, 28], "max_depth": [0, 9, 10], "max_diff": 2, "max_diff1": 2, "max_diff2": 2, "max_it": [0, 1, 8, 13, 26], "max_iter": 14, "max_leaf_nod": 10, "max_sampl": 10, "maxdegre": [0, 6, 10, 27], "maxdepth": 10, "maxim": [1, 4, 5, 7, 8, 11], "maximum": [0, 2, 3, 5, 7, 8, 9, 10, 13, 14, 26, 27, 28], "maxpolydegre": [5, 6, 27, 28], "maxpooling2d": 3, "mbox": [5, 6, 27, 28], "mcculloch": 12, "md": 11, "mdoel": 4, "mean": [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23, 26], "mean_absolute_error": [0, 26], "mean_divisor": 14, "mean_i": 23, "mean_matrix": 14, "mean_squared_error": [0, 4, 6, 7, 10, 15, 26, 27], "mean_squared_log_error": [0, 26], "mean_vector": 14, "mean_x": 23, "meaning": [0, 4, 7, 26], "meansquarederror": [0, 26], "meant": [3, 7, 10, 13], "measur": [0, 1, 2, 5, 6, 9, 11, 12, 14, 16, 18, 21, 23, 26, 27], "mechan": [0, 4, 23, 26], "median": [0, 26, 27], "medicin": 12, "medium": [4, 8, 13], "medv": [], "meet": [0, 24], "mehta": [0, 26, 27, 28], "memori": [3, 4, 11, 12, 13, 18, 20], "mention": [0, 12, 13, 21, 23, 26, 28], "mere": [0, 21], "meshgrid": [2, 5, 6, 8, 9, 10, 11], "mess": 15, "messag": [5, 13], "messi": 2, "met": [0, 3, 8, 27], "meteorolog": 9, "meter": [6, 27], "method": [0, 1, 2, 3, 4, 5, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 23, 25, 27], "metion": 6, "metric": [0, 1, 3, 6, 7, 9, 10, 14, 15, 26, 27], "metropoli": [19, 26], "mev": [0, 23, 26], "mgd": 13, "mglearn": [19, 26], "mgrid": 13, "mhjensen": [], "mi": 10, "mia": [24, 26], "microsoft": 25, "mid": 1, "midel": 4, "midnight": 15, "midpoint": 9, "might": [0, 1, 2, 4, 6, 9, 13, 15, 17, 18, 27, 28], "migth": 17, "mild": 9, "millimet": [6, 27], "million": [0, 26, 27], "mimic": 12, "min": [0, 2, 5, 8, 9, 28], "min_": [0, 2, 5, 14, 17, 26, 27, 28], "min_samples_leaf": 9, "mind": [0, 6, 13, 15, 18, 26, 27, 28], "mindboard": 4, "mine": [19, 26], "mini": [1, 11, 12, 13, 28], "minibatch": [1, 11, 13], "minibathc": 13, "miniforge3": [], "minim": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 
17, 27, 28], "minima": [0, 1, 7, 13, 26, 28], "minimum": [0, 1, 2, 6, 8, 9, 11, 13, 27, 28], "minmaxscal": [0, 27], "minor": 23, "minst": 1, "minu": 7, "mirjalili": 26, "mirror": 9, "misc": 6, "misclassif": [8, 9, 10], "misclassifi": [8, 10], "miser": 0, "mismatch": 1, "miss": [7, 10], "mistak": 4, "mit": 25, "mix": [1, 2, 26], "mixtur": 13, "mk": [9, 20], "mkdir": [0, 6, 7, 9, 26], "ml": [0, 1, 10, 13, 20, 21, 27, 28], "mlab": 23, "mle": [5, 7], "mlp": 1, "mlpclassifi": 1, "mlpregressor": [0, 26], "mm": 20, "mml": 27, "mn": [12, 23], "mnist": [1, 11], "mod": 23, "mode": [22, 24, 26], "model": [2, 3, 5, 7, 8, 9, 10, 11, 13, 14, 16, 18, 19, 21, 23, 25, 27, 28], "model_select": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "moder": 10, "modern": [0, 6, 7, 19, 26], "modif": [2, 12, 13], "modifi": [0, 1, 3, 5, 7, 8, 10, 12, 13, 26, 27, 28], "modul": [0, 16, 20, 26], "modular": 23, "modulo": 23, "moe": [11, 27], "moment": [5, 6, 13, 23], "mondai": [24, 26], "monitor": [13, 18], "monoton": [5, 12, 23], "mont": [0, 6, 19, 23, 25, 26], "montli": 16, "moor": [5, 6], "more": [0, 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 23], "moreov": [0, 3], "morten": [24, 26, 27, 28], "mortenhj": 26, "most": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 21, 23, 26, 27, 28], "mostli": [1, 11, 18], "motion": [0, 13], "motiv": [1, 4], "move": [0, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16, 21, 23, 27, 28], "mpl": [7, 26], "mpl_toolkit": [2, 6, 13, 28], "mplot3d": [2, 6, 13, 28], "mplregressor": 1, "mse": [0, 4, 5, 6, 9, 10, 15, 16, 17, 18, 21, 26, 27, 28], "mse_simpletre": 10, "mselassopredict": [5, 28], "mselassotrain": [5, 28], "mseownridgepredict": [6, 27, 28], "msepredict": [5, 28], "mseridgepredict": [0, 5, 6, 27, 28], "msetrain": [5, 28], "msle": [0, 26], "mt": [7, 12], "mu": [0, 6, 11, 13, 23, 26], "mu0": 23, "mu1": 23, "mu2": 23, "mu_": [6, 23, 27], "mu_i": [6, 27], "mu_n": 11, "mu_x": 23, "much": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 20, 21, 23, 26, 27, 28], "multi": [0, 1, 3, 7, 19, 26], "multiclass": [1, 7], "multidimension": [11, 12, 26], "multilay": 1, "multinomi": 7, "multipl": [2, 4, 5, 6, 7, 12, 13, 15, 23, 27, 28], "multipli": [3, 5, 6, 11, 13, 18, 20, 23, 27, 28], "multiplum": 8, "multivari": [0, 2, 10, 11, 19, 23, 26], "multivariate_norm": [11, 14], "multpli": 16, "murphi": [11, 25, 26], "must": [1, 2, 5, 6, 8, 10, 12, 13, 14, 15, 23, 27, 28], "mutat": 7, "mutual": [1, 3, 6, 13], "mx_": 23, "my": 26, "myenv": [], "myriad": [0, 19, 26], "mz1": 23, "mz2": 23, "m\u00f8svatn": 6, "n": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 27, 28], "n1": 20, "n2": 20, "n_": [1, 2, 3, 8, 12, 23], "n_0": [12, 23], "n_boostrap": [6, 10], "n_bootstrap": 6, "n_categori": [1, 3], "n_cluster": 14, "n_compon": 11, "n_epoch": 13, "n_estim": 10, "n_examples_to_gener": 4, "n_featur": [1, 18], "n_filter": 3, "n_hidden": 2, "n_hidden_neuron": [0, 1, 26], "n_i": 23, "n_input": [0, 1, 3, 27], "n_instanc": 9, "n_job": 10, "n_k": 14, "n_l": [12, 23], "n_layer": 1, "n_m": 9, "n_neuron": 1, "n_neurons_connect": 3, "n_neurons_layer1": 1, "n_neurons_layer2": 1, "n_point": 14, "n_sampl": [6, 8, 9, 10, 14, 18], "n_split": 6, "n_step": 4, "n_t": 2, "n_x": 2, "nabla": [1, 13, 28], "nabla_": [2, 13, 28], "nabla_w": 13, "nag": 13, "naimi": [0, 26], "naiv": 7, "naive_kmean": 14, "name": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 19, 20, 21, 23, 24, 26, 27, 28], "narrow": 13, "nation": [1, 5], "nativ": [19, 26], "natur": [0, 1, 4, 8, 9, 12, 13, 21, 23, 25, 26, 28], 
"navier": 12, "navig": 15, "nb": 23, "nb_": 20, "nbconvert": 26, "nd": 14, "ndarrai": 6, "ne": [9, 10, 20, 23, 27, 28], "nearest": [1, 3, 6, 11], "nearli": [13, 28], "neat": 26, "neccesari": 6, "necess": 2, "necessari": [0, 1, 3, 4, 8, 14, 18, 26], "necessarili": [0, 4, 11, 23, 26], "necesserali": 5, "neck": 7, "need": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 23, 27, 28], "neg": [0, 1, 3, 5, 6, 7, 10, 13, 20, 23, 26, 28], "neg_mean_squared_error": 6, "neglect": 23, "neglig": 23, "neighbor": [3, 6, 11], "neither": [4, 13], "neq": [13, 14, 23, 28], "nervou": 12, "nest": [9, 12], "nesterov": 13, "net": [2, 4, 12], "netlib": [20, 26], "network": [0, 9, 13, 19, 25, 27], "neural": [0, 13, 19, 25, 27], "neural_network": [0, 1, 2, 26], "neuralnetwork": 1, "neuron": [1, 2, 3, 4, 12], "neutral": [0, 26], "neutron": [0, 26], "never": [1, 4, 6, 9, 23], "new": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 20, 26, 27, 28], "new_chang": 13, "new_hobbit": 26, "newaxi": [0, 3, 6, 9], "newli": [0, 26], "newton": [1, 7, 8, 13, 23], "next": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 26, 27, 28], "next_guess": 13, "next_input": 4, "ng": 1, "ni": 14, "nice": [0, 1, 5, 11, 26, 27, 28], "nicer": 18, "niter": [13, 28], "nitric": [], "nlambda": [0, 5, 6, 27, 28], "nlp": 25, "nm": 23, "nm_n": [0, 26], "nmse": 6, "nn": [2, 5, 6, 12, 20, 26], "nn_model": 1, "nnmin": 2, "node": [1, 3, 9, 10, 12], "nois": [0, 4, 5, 6, 8, 9, 10, 13, 18, 21, 26, 27, 28], "noise_dimens": 4, "noisi": [1, 6, 21], "non": [0, 1, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 20, 23, 26, 27, 28], "none": [0, 1, 2, 4, 5, 9, 10, 13, 23, 26, 27], "nonlinear": [3, 6, 8, 9, 11, 12], "nonneg": [6, 9, 13, 28], "nonparametr": 6, "nonsens": 23, "nonsingular": 20, "nonumb": [3, 7, 8, 13, 20], "nor": [1, 4, 13], "norm": [0, 1, 5, 6, 8, 11, 13, 18, 26, 27, 28], "normal": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "normali": [20, 26], "norwai": [6, 21, 26, 28], "notat": [0, 2, 5, 6, 13, 14, 23, 26, 27, 28], "note": [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 19, 20, 23, 25, 26], "notebook": [0, 1, 3, 9, 15, 16, 19, 21, 26], "noth": [1, 2, 5, 8, 12, 14, 23, 27, 28], "notic": [4, 5, 12, 13, 20, 23, 26], "notion": 3, "novel": [3, 6, 10, 26], "novemb": [1, 24, 26], "now": [0, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19, 20, 21, 23, 26, 27], "nowadai": [0, 1, 3, 9, 19, 26], "nox": [], "np": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 20, 23, 26, 27, 28], "npr": 2, "nsampl": 6, "nt": 2, "nu": 23, "nuclear": [5, 27, 28], "nuclei": [0, 23, 26], "nucleon": [0, 26], "nucleu": [0, 26], "num": 4, "num_coordin": 2, "num_hidden_neuron": 2, "num_it": [2, 18], "num_neuron": 2, "num_neurons_hidden": 2, "num_point": 2, "num_tre": 10, "num_valu": 2, "number": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 22, 24, 26, 28], "numberid": 7, "numberparamet": 3, "numer": [0, 5, 6, 9, 10, 11, 12, 13, 19, 20, 25, 26, 27, 28], "numpi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 23, 27, 28], "nunmpi": [5, 27], "nve_frngahw": 28, "nx": 2, "ny": 23, "o": [0, 1, 4, 5, 6, 7, 8, 9, 11, 20, 24, 25, 26, 27, 28], "obei": [6, 11, 13, 27], "object": [0, 1, 4, 8, 10, 15, 20, 26], "obliqu": [5, 27, 28], "observ": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 28], "obtain": [0, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 17, 20, 21, 23, 26, 27, 28], "obviou": [5, 6, 11, 23, 27, 28], "obviouli": 26, "obvious": [0, 4, 5, 6, 20, 26], "oc": [27, 28], "occupi": [], "occur": 
[0, 6, 8, 9, 20, 23, 26], "octob": [24, 26], "od": 0, "odd": [0, 3, 7, 26, 27], "odenum": 2, "odesi": 2, "oen": 0, "off": [1, 3, 4, 5, 9, 13, 23], "offer": [6, 11, 19, 20, 22, 24, 26], "offic": [24, 26], "offici": [22, 26], "often": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 23, 26, 27, 28], "ofter": [20, 26], "ol": [0, 13, 17, 27], "old": [1, 5, 10, 13, 15, 18], "ols_paramet": 16, "ols_sk": 6, "ols_svd": 6, "olsbeta": 28, "olstheta": [0, 5], "omega": [2, 3, 6], "omega_0": 3, "omit": [0, 5, 26, 27, 28], "onc": [1, 6, 9, 11, 13], "one": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 23, 24, 26, 27], "onehot": 1, "onehot_vector": 1, "onehotencod": 9, "ones": [0, 2, 5, 6, 8, 9, 10, 11, 13, 16, 18, 20, 21, 26, 27, 28], "ones_lik": 4, "ong": 27, "onl": 3, "onli": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21, 23, 26, 27, 28], "onlin": [11, 15, 22], "onto": [5, 11, 27, 28], "open": [0, 1, 4, 6, 7, 9, 15, 19, 21, 22, 24, 26], "oper": [0, 1, 3, 5, 6, 10, 11, 12, 13, 15, 16, 19, 23, 26, 27, 28], "operation": 23, "oplu": 23, "opmiz": 13, "opportun": 0, "oppos": [6, 13], "opposit": [1, 5, 8, 27, 28], "opt": [1, 5, 21, 26, 28], "optim": [0, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 16, 17, 21], "optimis": [1, 3], "option": [0, 1, 3, 5, 6, 8, 11, 15, 18, 20, 27], "optmiz": [1, 8, 13, 27], "oral": 26, "orang": 0, "order": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 21, 23, 26, 27, 28], "ordinari": [0, 2, 3, 7, 11, 13, 17, 18, 19], "oreilli": [25, 26], "org": [0, 3, 4, 16, 19, 20, 21, 25, 26, 27, 28], "organ": [6, 7, 10, 20], "orient": [1, 5, 23, 27, 28], "origin": [0, 3, 5, 6, 8, 11, 12, 13, 15, 20, 26, 27, 28], "orthogn": [5, 27, 28], "orthogon": [0, 5, 6, 8, 11, 13, 20, 26, 27, 28], "orthonorm": [5, 27, 28], "os": [24, 26], "oscar": 1, "oscil": [3, 13], "oskar": 26, "oskarlei": 26, "osl": 18, "oslo": [0, 19, 21, 22, 24, 26, 27, 28], "osx": [0, 19, 21, 26], "other": [0, 1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 19, 22, 23, 24, 25, 27, 28], "otherwis": [0, 1, 4, 7, 13, 20, 26], "ouput": [5, 7, 12], "our": [1, 2, 3, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 19, 20, 23], "ourmodel": 0, "ourselv": [0, 5, 6, 8, 11, 13, 26, 27, 28], "out": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 19, 20, 21, 23, 26, 27], "out_fil": 9, "outcom": [0, 7, 9, 10, 12, 23, 27], "outdoor": 9, "outer": [6, 12, 13], "outfil": 4, "outlier": [0, 8, 26, 27], "outlin": [6, 10, 11], "outlook": 9, "outperform": 10, "output": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 20, 21, 23, 26, 27, 28], "output_bia": 1, "output_bias_gradi": 1, "output_shap": 4, "output_weight": 1, "output_weights_gradi": 1, "outputlayer1": 12, "outputlayer2": 12, "outsid": 4, "over": [0, 1, 3, 4, 5, 6, 9, 10, 12, 13, 15, 16, 20, 21, 26, 27, 28], "over1": 13, "overal": [1, 10], "overcast": 9, "overcom": [12, 13], "overdetermin": [0, 26], "overfit": [0, 1, 3, 6, 9, 10, 13], "overflow": 5, "overhead": 12, "overlap": [3, 7, 8, 9], "overlin": [0, 5, 6, 9, 10, 11, 14, 20, 26, 27], "overst": 0, "overtrain": 4, "overview": 3, "own": [4, 5, 6, 8, 12, 13, 16, 18, 19, 20, 28], "owner": [], "ownmsepredict": 0, "ownmsetrain": 0, "ownridgebeta": 27, "ownridgetheta": [0, 6, 27, 28], "ownypredictridg": 0, "ownytilderidg": 0, "oxid": [], "p": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 20, 23, 26, 27, 28], "p0": 2, "p1": 2, "p_": [2, 4, 8, 9], "p_hidden": 2, "p_i": [5, 23], "p_j": 23, "p_n": 23, "p_output": 2, "p_x": 23, "pack": [0, 26], "packag": [0, 1, 3, 4, 5, 8, 11, 13, 15, 19, 21, 23, 27, 28], "packtpub": 
26, "packtpublish": 26, "pad": [3, 4], "page": [0, 19, 26, 28], "pai": [0, 1, 9, 13, 15], "pair": [0, 2, 3, 9, 19, 23, 26], "paltform": 15, "panda": [0, 4, 5, 6, 7, 9, 11, 19, 21, 28], "panel": 26, "paper": 1, "paradigm": [0, 26], "parallel": [10, 13, 19, 20, 26], "param": 2, "paramat": 2, "paramet": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 16, 17, 18, 21, 23, 28], "parameter": [0, 6, 10, 26, 27], "parametr": [0, 6, 26, 27], "paramt": [3, 5], "part": [0, 1, 3, 5, 6, 10, 17, 20, 22, 23, 24, 26, 27], "partial": [0, 1, 5, 6, 7, 8, 10, 11, 12, 13, 16, 23, 26, 27, 28], "particip": [15, 19, 22, 24, 26], "particl": [0, 4, 13, 23, 26], "particular": [0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13, 16, 21, 23, 25, 26, 27, 28], "particularli": [5, 6, 8, 11, 13, 23, 27, 28], "partit": [1, 4, 9], "partli": [6, 26], "partner": 15, "pass": [2, 3, 12, 14], "password": 21, "past": [10, 23], "patch": [6, 23], "path": [0, 4, 6, 7, 9, 19, 26], "pathcollect": 17, "patient": 7, "patter": 4, "pattern": [0, 3, 4, 12, 25, 26], "pauli": [0, 26], "pc": [11, 15, 19], "pca": [0, 7, 19, 26, 27], "pd": [0, 4, 5, 6, 7, 9, 11, 26, 27, 28], "pde": 2, "pdf": [0, 3, 4, 5, 6, 9, 15, 16, 21, 25, 26], "pedagog": [0, 26, 27], "penal": [6, 18, 27], "penalti": [6, 13, 18, 21, 27], "penros": [5, 6], "pentagon": [13, 28], "peopl": [1, 9, 13, 19], "per": [0, 1, 6, 22, 24, 26], "percentag": [10, 11, 24], "perceptron": [0, 1, 7, 26], "peregrin": 26, "perfect": [0, 1, 13, 26], "perfectli": [4, 6], "perform": [0, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 14, 16, 18, 19, 20, 21, 23, 26, 27, 28], "performac": 4, "perhap": [0, 5, 13, 26, 27, 28], "perimet": 1, "period": [1, 4, 23], "permiss": 15, "permut": 11, "persist": 13, "person": [5, 6, 7, 16, 22, 24, 26, 27], "perspect": 25, "pertin": [12, 26], "petal": [8, 9], "peter": [25, 27], "phantom": 23, "phase": [6, 12], "phenomena": 23, "phi": 8, "phi_k": 8, "philosophi": 13, "phone": [24, 26], "photo": [4, 26], "php": 21, "phrase": [0, 26], "physic": [0, 1, 4, 7, 12, 13, 23, 24, 25, 26, 27, 28], "pi": [2, 3, 5, 6, 7, 9, 12, 13, 23], "pick": [1, 9, 10, 11, 13, 14], "pickl": 1, "pictur": [0, 26], "pie": [19, 26], "piec": [11, 14], "pillow": [0, 19, 21, 26], "pinv": [5, 6, 13, 21, 27, 28], "pip": [0, 1, 15, 19, 21, 26], "pip3": [0, 1, 21, 26], "pipelin": [0, 6, 8, 10, 27], "pippin": 26, "pit": 4, "pitfal": [6, 27], "pitt": 12, "pixel": [1, 3, 4, 26], "pixel_height": [1, 3], "pixel_width": [1, 3], "place": [0, 4, 6, 8, 13, 15, 20, 21, 26, 28], "plai": [0, 3, 4, 5, 6, 8, 11, 18, 19, 21, 26, 27, 28], "plain": [8, 10, 12, 13, 14, 28], "plan": [6, 9, 24, 25, 26], "plane": [8, 9], "plateau": [5, 28], "platform": [19, 26], "plausibl": 12, "pleas": [13, 21, 24, 26], "plenti": 1, "plethora": [3, 12], "plot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "plot_all_sc": [21, 27], "plot_confusion_matrix": [7, 10], "plot_count": 6, "plot_cumulative_gain": [7, 10], "plot_data": 1, "plot_dataset": 8, "plot_decision_boundari": [9, 10], "plot_import": 10, "plot_max": 4, "plot_min": 4, "plot_model": 4, "plot_numb": 4, "plot_predict": 8, "plot_regression_predict": 9, "plot_result": 4, "plot_roc": [7, 10], "plot_surfac": [2, 6, 13], "plot_train": 9, "plot_tre": [9, 10], "plt": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 20, 23, 26, 27, 28], "plu": [0, 3, 5, 7, 18, 26, 27], "pm": 8, "pmatrix": 2, "pml": 25, "pn": 3, "png": [0, 4, 6, 7, 9, 26], "point": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 21, 23, 24, 26, 27, 28], "point_1": 4, "point_2": 4, "poisson": [19, 
23, 26], "poli": [6, 8], "poly100_kernel_svm_clf": 8, "poly3": 0, "poly3_plot": 0, "poly_featur": [8, 9, 15], "poly_features10": 9, "poly_fit": 9, "poly_fit10": 9, "poly_kernel_svm_clf": 8, "poly_model": 15, "poly_ms": 15, "poly_predict": 15, "polydegre": [0, 5, 6, 10, 27], "polygon": [13, 28], "polym": 12, "polymi": 21, "polynomi": [0, 5, 6, 7, 8, 9, 10, 11, 15, 17, 21, 26, 27], "polynomial_featur": [6, 15, 16, 17], "polynomial_svm_clf": 8, "polynomialfeatur": [0, 6, 8, 9, 15, 16, 27], "polytrop": [0, 6], "pool": 3, "pool_siz": 3, "poor": [1, 13, 28], "poorli": [0, 27], "popul": [0, 5, 26, 27], "popular": [0, 1, 3, 6, 7, 8, 9, 11, 12, 15, 19, 20, 21, 23, 27], "popularli": [0, 26], "portabl": 10, "portion": [11, 13], "pose": [0, 4, 5, 6, 11, 23, 26], "posit": [0, 1, 2, 3, 5, 7, 8, 10, 11, 13, 14, 20, 23, 26, 27, 28], "possibl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 23, 24, 26, 27, 28], "possibli": [6, 8, 13, 21], "posterior": 5, "postpon": [0, 27], "postscript": 21, "postul": 5, "potenti": [0, 3, 5, 6, 12, 13, 27], "pott": 12, "power": [0, 1, 5, 6, 8, 9, 12, 13, 26, 27, 28], "pp": [5, 6], "practic": [0, 5, 6, 7, 8, 16, 18, 21, 23, 27], "practition": [0, 1, 3, 26], "pre": 26, "preced": [1, 11, 12, 23], "preceed": 4, "preceq": 8, "precis": [0, 2, 5, 11, 13, 20, 21, 23, 26, 27], "pred": 6, "predicit": 0, "predict": [0, 1, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, 21, 25, 26, 27, 28], "predict_prob": 1, "predict_proba": [7, 10], "predictor": [0, 5, 6, 7, 9, 10, 11, 26, 27], "prefer": [0, 1, 6, 8, 9, 11, 13, 15, 19, 21, 26], "prepar": [0, 6, 20, 21, 26, 27], "preprocess": [0, 4, 6, 7, 8, 9, 10, 11, 15, 16, 17, 18, 21], "prerequisit": 0, "prescript": 21, "presenc": 13, "present": [0, 5, 6, 7, 9, 12, 13, 20, 21, 23, 26, 27, 28], "preserv": [3, 11, 20], "press": [13, 15, 25, 28], "pretrain": [1, 4], "pretti": [0, 4, 8, 9, 19, 21, 26], "prev_centroid": 14, "prevent": [13, 23], "previou": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 20, 21, 23, 27, 28], "previous": [2, 3, 9, 10, 23], "price": [0, 4, 9, 13], "primal": 8, "primari": [0, 7, 26], "prime": 23, "princip": [0, 5, 7, 19, 26, 27, 28], "principl": [0, 6, 7, 8, 14, 26], "print": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 23, 26, 27, 28], "print_funct": [8, 9], "printout": [0, 26], "prior": [0, 5, 6, 26], "privat": 0, "prob": [1, 23], "probabilist": [0, 25, 26, 27], "probabl": [0, 1, 3, 4, 6, 7, 10, 13, 19, 26, 27], "problem": [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 19, 20, 21, 23], "probml": 25, "proce": [0, 5, 6, 7, 8, 9, 10, 11, 13, 20, 26, 27], "procedur": [2, 4, 5, 6, 8, 10, 11, 13, 27, 28], "proceed": 20, "process": [0, 2, 4, 6, 9, 10, 12, 13, 19, 20, 21, 23, 25, 26, 28], "prod": 25, "prod_": [1, 5, 7], "produc": [0, 3, 4, 5, 6, 9, 10, 11, 12, 13, 18, 19, 20, 23, 26, 27], "product": [0, 1, 3, 5, 6, 7, 8, 12, 13, 16, 17, 19, 20, 26, 27], "profess": [0, 26], "program": [0, 1, 4, 5, 6, 8, 12, 14, 15, 19, 20, 22, 23, 24, 26, 27], "programm": 20, "progress": [1, 4, 14], "prohibit": 6, "project": [0, 1, 2, 3, 5, 11, 13, 15, 19, 22, 27, 28], "project_root_dir": [0, 6, 7, 9, 26], "promin": 12, "promis": 8, "promot": [24, 26], "prone": [9, 15], "pronounc": [13, 19, 26], "proof": [0, 11, 12, 13, 26, 28], "propag": [2, 3, 13], "proper": [0, 2, 6, 7], "properli": [1, 6, 8, 10, 13, 18, 21], "properti": [0, 1, 3, 12, 13, 16, 20, 26], "proport": [0, 1, 5, 9, 11, 13, 23, 26, 27], "propos": [1, 4, 6, 10, 21, 26], "propto": [5, 13, 28], "proton": [0, 26], "prove": [3, 13, 28], "provid": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 
13, 19, 20, 21, 23, 26, 27, 28], "proxi": [1, 13], "prune": 9, "pseudo": [20, 23], "pseudocod": 21, "pseudoinv": 5, "pseudoinvers": [5, 6, 21], "pseudorandom": [6, 23], "psychologi": [0, 26], "pt": 13, "public": [0, 15, 19, 26], "pull": 15, "punish": [0, 1, 26], "pure": [3, 9, 23], "purest": 9, "puriti": 9, "purpos": [0, 3, 10, 12, 14, 26], "push": 15, "put": 1, "py": 5, "pycod": 26, "pydata": 19, "pydot": 9, "pyhton2": 26, "pylab": [7, 26], "pypi": 19, "pyplot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 20, 23, 26, 27, 28], "pythagora": 5, "python": [1, 2, 3, 5, 6, 8, 11, 12, 13, 14, 21, 23, 27], "python2": [0, 21], "python3": [0, 19, 21, 26], "pytorch": [0, 19, 21, 26], "q": [5, 6, 8, 11, 23], "qp": 8, "qquad": [2, 11, 13, 20], "qr": [5, 6, 20, 27, 28], "quad": [1, 13, 20], "quadrat": [0, 8, 9, 13, 26], "qualit": [4, 9, 21, 23], "qualiti": [0, 9, 19, 26, 27], "quantifi": 1, "quantil": 10, "quantit": [0, 6, 9, 21, 26], "quantiti": [0, 2, 5, 6, 7, 9, 10, 11, 12, 14, 16, 20, 23, 26, 27, 28], "quantum": [4, 12, 25, 26], "quartil": [0, 27], "quench": 5, "queri": 9, "question": [0, 5, 6, 9, 11, 12, 13, 21, 24, 26, 27], "qugan": 4, "quick": [4, 23], "quickli": [1, 3, 9, 11, 13, 28], "quit": [1, 5, 6, 9, 10, 12, 15, 27, 28], "quot": 4, "r": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 23, 27, 28], "r2": [0, 5, 6, 26, 27, 28], "r2_score": [0, 26], "r2score": [0, 26], "r_1": 9, "r_2": 9, "r_j": 9, "r_m": 9, "rad": [], "rade": [], "radial": [8, 12], "radioact": 23, "radiu": [0, 1, 27], "rain": 9, "ramp": 1, "ran0": 23, "ran1": 23, "ran2": 23, "ran3": 23, "rand": [0, 4, 5, 6, 9, 10, 13, 15, 20, 26, 27, 28], "randint": [6, 9, 13], "randn": [0, 1, 2, 5, 6, 9, 11, 13, 15, 18, 26, 27, 28], "random": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 17, 18, 19, 20, 26, 27, 28], "random_forest_model": 10, "random_index": 13, "random_indic": [1, 3], "random_st": [7, 8, 9, 10, 11], "randomforestclassifi": 10, "randomli": [1, 6, 9, 13, 14, 18, 28], "rang": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 20, 23, 26, 27, 28], "rangl": [0, 6, 11, 23, 26, 27], "rangle_x": 23, "rank": [5, 27, 28], "rankdir": 4, "raphson": [1, 8, 13], "rapidli": 0, "rare": [1, 13], "raschka": [26, 27], "rasckha": 26, "rashcka": 28, "rate": [1, 2, 3, 4, 8, 9, 10, 12, 13, 18, 28], "rather": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 23, 26, 27, 28], "ratio": [4, 7, 9, 10, 11], "rational": [0, 26], "ravel": [5, 6, 7, 8, 9, 10, 11, 13, 20], "raw": 3, "rbf": [8, 11, 12], "rbf_kernel_svm_clf": 8, "rbf_pca": 11, "rc": 23, "rcond": [0, 26, 27], "rcparam": [1, 3, 7, 8, 9, 10, 23, 26], "re": [2, 4, 13, 15, 28], "reach": [1, 4, 5, 6, 9, 10, 12, 13, 14, 28], "read": [0, 2, 3, 4, 5, 6, 7, 8, 11, 12, 16, 17, 20, 21, 23, 25, 28], "read_csv": [0, 6, 7, 9], "read_fwf": [0, 26], "reader": [0, 6, 20, 23, 26, 27], "readi": [0, 1, 5, 6, 8, 10, 11, 12, 20, 26], "readili": 1, "readm": 15, "readthedoc": 19, "real": [0, 1, 4, 7, 10, 11, 12, 16, 18, 20, 27], "real_loss": 4, "real_output": 4, "realist": [8, 26], "realiti": 23, "realiz": [1, 12], "realli": [0, 1, 26], "rearrang": 13, "reason": [0, 1, 3, 4, 10, 13, 25, 26, 28], "reassign": 1, "recal": [5, 6, 9, 10, 11, 12, 20, 23, 26, 27, 28], "recast": 3, "receiv": [1, 3, 10, 12, 23], "recent": [0, 6, 13, 25], "recept": [3, 12], "receptive_field": 3, "recip": [0, 6, 7, 20, 21, 26, 27], "reciproc": 5, "recogn": [0, 4, 5, 10, 26], "recognit": [0, 1, 3, 12, 25, 26], "recommen": 26, "recommend": [0, 2, 3, 4, 5, 6, 8, 13, 15, 19, 20, 21, 25, 28], "reconsid": 9, 
"reconstruct": 11, "record": [10, 21, 22, 24, 26], "recreat": 15, "rectangl": [9, 13, 28], "rectangular": [5, 27, 28], "rectifi": [1, 3, 12], "recur": [0, 19, 26], "recurr": [0, 1, 19, 26], "recurs": [9, 19, 20, 26], "red": [0, 3, 4, 6, 8, 9], "redefin": [0, 10, 26, 27, 28], "redefinit": 28, "reduc": [1, 3, 5, 6, 9, 10, 11, 13, 26, 28], "reduct": [0, 10, 11, 19, 23, 26, 27], "reegress": 21, "refer": [0, 1, 2, 3, 5, 6, 11, 12, 13, 14, 20, 25, 26, 27, 28], "referenc": 2, "refin": 12, "refit": 6, "reflect": [0, 1, 4, 5, 21, 23, 26], "refresh": [19, 26], "refreshprogrammingskil": 26, "reg": [10, 11], "regard": [1, 9, 13], "regardless": [12, 16], "region": [3, 4, 6, 9, 12, 21], "regist": [6, 23], "reglasso": [5, 28], "regr_1": [0, 9], "regr_2": [0, 9], "regr_3": [0, 9], "regress": [1, 8, 11, 12, 16, 19, 20], "regressor": [0, 7, 10], "regridg": [0, 5, 6, 27, 28], "regular": [0, 3, 4, 5, 6, 7, 9, 13, 17, 18, 24, 26, 27, 28], "regularli": 15, "reilli": [0, 25, 26], "reinforc": [0, 8, 19, 26], "reiter": 1, "reject": 7, "rel": [0, 4, 6, 7, 9, 12, 13, 23, 26, 27], "relat": [0, 1, 3, 4, 5, 11, 13, 14, 20, 23, 26, 27, 28], "relationship": [0, 4, 9, 18, 26], "relativeerror": [0, 26, 27], "releas": [1, 19, 26], "relev": [0, 1, 5, 7, 11, 19, 21, 23, 26, 28], "reli": [0, 6, 8], "reliabilti": 21, "reliabl": [7, 23], "relu": [3, 4, 26], "remain": [1, 2, 4, 6, 12, 20, 23, 27], "remaind": 23, "reman": 2, "remark": 1, "rememb": [0, 8, 13, 20, 21, 26], "remind": [0, 5, 11, 13, 20, 23], "remot": 15, "remov": [4, 5, 6, 18, 27, 28], "renam": 15, "render": [0, 26, 27], "reorder": [5, 7, 27, 28], "reorgan": [0, 26], "repeat": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 14, 20, 21, 23, 26, 27, 28], "repeated": 26, "repeatedli": [0, 6, 10, 13], "repet": 3, "repetit": [6, 26, 27], "rephras": [13, 28], "replac": [0, 1, 3, 4, 5, 6, 10, 12, 14, 19, 21, 26, 27, 28], "replica": 6, "repo": [15, 21], "report": 26, "repositori": [4, 21, 26], "repres": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 21, 23, 26, 27, 28], "represent": [0, 1, 3, 6, 23, 26], "representd": 3, "reproduc": [0, 5, 6, 9, 12, 15, 16, 18, 19, 23, 26, 27], "repuls": [0, 26], "request": [0, 13], "requir": [0, 1, 3, 4, 5, 6, 8, 9, 11, 12, 13, 15, 17, 18, 20, 26, 27, 28], "res1": 2, "res2": 2, "res3": 2, "res_analyt": 2, "res_analytical1": 2, "res_analytical2": 2, "res_analytical3": 2, "resaml": 6, "resampl": [0, 7, 10, 19, 26, 27], "rescal": [0, 11, 12], "rescu": 5, "reseach": 6, "research": [0, 4, 13, 19, 25, 26], "resembl": [6, 23], "reserv": [1, 5, 6, 23], "reshap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 20, 26, 27], "residenti": [], "residu": [0, 5, 13, 26], "resiz": [5, 27, 28], "resourc": 26, "respect": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 21, 23, 26, 27, 28], "respond": 12, "respons": [0, 7, 9, 12, 26, 27], "rest": [0, 5, 18, 27, 28], "restat": [0, 12, 26], "restor": 4, "restored_discrimin": 4, "restored_gener": 4, "restrict": [0, 3, 9, 12, 26], "result": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26], "retail": [], "retain": [5, 6, 27, 28], "return": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 16, 17, 20, 23, 26, 27, 28], "return_data": 14, "return_sequ": 4, "return_x_i": 9, "reus": [1, 3, 6, 21], "reveal": [0, 12, 26], "revers": [1, 20], "review": [19, 20], "revisit": 14, "revolut": 26, "reward": [0, 4, 26], "rewrit": [0, 3, 5, 6, 7, 8, 10, 11, 12, 13, 16, 20, 21, 23, 28], "rewritten": [2, 6, 8, 10, 23], "rewrot": 13, "rf": 10, "rgb": 3, "rgoj5yh7evk": 19, "rh": 6, "rho": [0, 10, 13], "rho_1": 10, "rho_2": 10, 
"rho_m": 10, "rich": [0, 26], "ride": 9, "rideclass": 9, "ridedata": 9, "ridg": [7, 11, 13, 19, 26], "ridge_paramet": 17, "ridge_sk": 6, "ridgebeta": 28, "ridgetheta": 5, "right": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 20, 21, 23, 26, 27, 28], "right_sid": 2, "rightarrow": [0, 1, 5, 6, 8, 11, 12, 13, 23, 26, 27, 28], "rigor": [0, 26, 27, 28], "ring": 6, "rise": [0, 26], "risk": [0, 13, 26, 28], "rival": 4, "river": [], "rlm": 26, "rm": 23, "rmse": [], "rmsporp": 13, "rmsprop": [1, 3, 4, 13, 21], "rnd_clf": 10, "rng": 23, "rnn": [4, 12], "rnn1": 4, "rnn2": 4, "rnn_2layer": 4, "rnn_input": 4, "rnn_output": 4, "rnn_train": 4, "rntrick1": 23, "rntrick2": 23, "rntrick3": 23, "rntrick4": 23, "ro": [0, 13, 26, 28], "robert": [21, 25], "robust": [0, 26], "robustscal": [0, 27], "roc": [7, 10], "role": [0, 2, 5, 6, 8, 18, 19, 21, 26, 27, 28], "roll": 6, "room": [0, 24, 26], "root": [0, 5, 9, 13, 15, 23, 27, 28], "rot": 26, "rotat": [1, 8, 9, 10], "rotation_matrix": 9, "roughli": [1, 3, 18], "round": [7, 9, 13], "routin": [13, 20, 26, 28], "row": [0, 1, 2, 5, 6, 9, 11, 16, 20, 26, 27, 28], "rr": [5, 27, 28], "rrr": [5, 27, 28], "rudg": [], "rug": [13, 28], "rule": [0, 1, 5, 6, 13, 21, 26, 27, 28], "run": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13, 15, 19, 21, 26, 27, 28], "runtim": [1, 6, 14, 15], "rust": [0, 19, 20, 26], "rvert": 1, "rvert_2": 1, "s_": [3, 6], "s_1": 6, "s_i": [6, 7], "s_j": 6, "s_k": 6, "s_phenomenon": 21, "saddl": [13, 28], "safeguard": 18, "sai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20, 21, 23, 26, 27, 28], "said": [6, 9, 13, 28], "sake": [0, 5, 7, 11, 26, 27, 28], "sale": [0, 26], "sam": 26, "same": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 18, 20, 21, 23, 26, 27, 28], "samm": 10, "sampl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 18, 19, 20, 21, 23, 26, 27], "sample_vari": 14, "sampleexptvari": 23, "samwis": 26, "sastri": 11, "satisfactori": [0, 26], "satisfi": [1, 2, 3, 6, 8, 13, 20, 23, 28], "satur": [1, 6], "save": [0, 4, 6, 7, 9, 13, 26], "save_fig": [0, 6, 7, 9, 10, 26], "savefig": [0, 4, 6, 7, 9, 23, 26], "savetxt": 4, "saw": [5, 27], "scalabl": 10, "scalar": [2, 5, 6, 10, 27], "scale": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 24, 26, 28], "scale_mean": 4, "scale_std": 4, "scaler": [0, 7, 8, 9, 10, 11, 17, 21, 27], "scan": [5, 7], "scari": 5, "scatter": [0, 1, 6, 7, 8, 9, 14, 15, 17, 26, 27], "scenario": [6, 13, 28], "schedul": 13, "scheme": [1, 13, 28], "schrage": 23, "sch\u00f8yen": [6, 27], "scienc": [0, 1, 10, 12, 13, 19, 22, 23, 24, 25, 28], "scientif": [0, 19, 21, 26], "scientist": [0, 26], "scikit": [3, 5, 6, 8, 9, 10, 13, 15, 16, 19, 20, 21, 25], "scikit_learn": 0, "scikitlearn": 26, "scikitplot": [7, 10], "scipi": [0, 3, 5, 6, 13, 19, 20, 21, 26, 27, 28], "scl": 6, "scm": 15, "score": [0, 1, 3, 6, 7, 9, 10, 11, 15, 16, 21, 24, 26, 27], "scores_kfold": 6, "scratch": [1, 13, 16], "sdg": 13, "sdv4f4s2sb8": 28, "seaborn": [0, 1, 3, 6, 7, 26], "seamless": [0, 19, 21, 26], "search": [0, 1, 3, 5, 9, 13, 15, 26, 28], "sebastian": 26, "sebastianraschka": 26, "sec": 6, "second": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 19, 20, 23, 24, 26, 27, 28], "secondeigvector": 11, "secondli": 12, "section": [4, 11, 16, 20, 21, 23, 27], "sector": 0, "see": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 23, 26, 27, 28], "seed": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 18, 23, 26, 27, 28], "seed_imag": 4, "seek": [1, 2, 8], "seem": [1, 3, 4], "seemingli": [0, 26], "seen": [0, 1, 3, 5, 10, 12, 23], "segment": [13, 28], "seismic": 6, 
"seldomli": [0, 26], "select": [1, 5, 6, 8, 9, 10, 11, 15, 21, 22, 23, 24, 25, 26, 27, 28], "selevet": 15, "self": [1, 5, 27], "sell": 4, "semest": [7, 22], "semi": [8, 13, 28], "semilogx": 6, "send": [5, 12, 13, 24, 26], "senior": [22, 24], "sens": [0, 4, 6, 8, 26], "sensibl": 3, "sensit": [0, 5, 6, 9, 13, 26, 27], "sent": 2, "sentenc": [4, 12], "separ": [0, 1, 2, 4, 6, 8, 9, 12, 14, 18, 19, 21, 23, 26], "septemb": [18, 21, 26], "sequenc": [3, 4, 7, 9, 10, 12, 13, 19, 20, 23, 26, 28], "sequenti": [1, 3, 4, 10, 12, 23], "seri": [0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 20, 26, 27, 28], "serif": [7, 23, 26], "serv": [0, 1, 2, 3, 5, 7, 13, 25, 26, 27, 28], "servic": 21, "session": [1, 15, 21, 22, 24, 26], "set": [1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24], "set_major_formatt": 6, "set_major_loc": 6, "set_tick": [1, 8], "set_ticklabel": 1, "set_titl": [0, 1, 2, 3, 7, 12, 14, 26], "set_xlabel": [0, 1, 2, 3, 7, 12, 26], "set_xlim": [7, 12], "set_xticklabel": 1, "set_ylabel": [0, 1, 2, 3, 7, 26], "set_ylim": [7, 12], "set_ytick": 7, "set_yticklabel": [1, 6], "set_zlim": 6, "seth": 4, "setminu": 6, "setosa": [8, 9], "setosa_or_versicolor": 8, "setp": 6, "setup": [1, 4, 6, 8, 19, 26, 27, 28], "sever": [0, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "sgd": [1, 3, 28], "sgd_clf": 8, "sgdclassifi": 8, "sgdreg": 13, "sgdregressor": 13, "sgn": [5, 27, 28], "shallow": 13, "shape": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 20, 26, 27, 28], "share": [1, 3, 15, 26], "shareabl": 15, "she": 7, "shift": [1, 6, 12, 15, 18, 23, 27], "ship": 3, "shire": 26, "short": [4, 5, 21], "shortcom": [13, 28], "shorten": 4, "shorter": 23, "shorthand": 26, "shortli": [20, 26], "should": [0, 2, 3, 5, 6, 8, 9, 11, 12, 15, 18, 20, 21, 23, 26, 27], "show": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 23, 26, 27, 28], "show_shap": 4, "shown": [0, 4, 5, 8, 12, 13, 20, 27, 28], "shrink": [3, 5, 6, 8, 11, 27, 28], "shrinkag": [5, 6, 27, 28], "shrunk": 11, "shuffl": [0, 1, 4, 6, 13, 27], "side": [0, 2, 5, 8, 12, 13, 20, 21, 26, 28], "sigh": [19, 26], "sigma": [0, 1, 5, 6, 7, 10, 11, 12, 13, 20, 21, 23, 26, 27, 28], "sigma0": 23, "sigma1": 23, "sigma2": 23, "sigma_": [5, 20, 26, 27, 28], "sigma_0": [5, 27, 28], "sigma_1": [5, 27, 28], "sigma_2": [5, 27, 28], "sigma_fn": [7, 12], "sigma_i": [0, 5, 26, 27, 28], "sigma_j": [5, 27, 28], "sigma_m": [6, 23], "sigma_n": [11, 23], "sigma_t": 13, "sigma_x": 23, "sigmoid": [1, 2, 4, 7, 8, 10, 12], "sigmundson": [6, 27], "sign": [1, 2, 7, 8, 10, 23, 24], "signal": [1, 3, 10, 12], "signifi": 4, "signific": 1, "significantli": [1, 13, 18, 23, 28], "sim": [4, 5, 6, 13, 23], "similar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 19, 20, 21, 26, 28], "similarli": [0, 1, 3, 5, 8, 10, 13, 23, 26, 27, 28], "simpl": [1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 14, 16, 17, 19, 20, 23], "simplepredict": 10, "simpler": [0, 1, 5, 6, 7, 13, 16, 19, 21, 26, 28], "simplernn": 4, "simplest": [0, 1, 3, 4, 9, 10, 12, 14, 21, 26], "simpletre": 10, "simpli": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 19, 20, 21, 23, 26, 27, 28], "simplic": [2, 5, 6, 7, 8, 9, 10, 11, 12, 14, 27, 28], "simplicti": [5, 27, 28], "simplifi": [0, 6, 9, 18, 19, 21, 26, 27], "simplist": [3, 6, 23], "simul": [6, 18], "simultan": 6, "sin": [0, 1, 2, 3, 4, 9, 12, 13, 20, 26], "sinc": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 16, 18, 20, 21, 23, 25, 26, 27, 28], "sine": [3, 12], "singl": [0, 1, 2, 3, 5, 6, 7, 8, 9, 12, 13, 18, 20, 23, 26, 27, 28], "singular": [0, 6, 13, 20, 26], 
"sinusoid": 3, "site": [0, 21, 22, 27], "situat": [0, 4, 5, 7, 13, 23, 26, 27, 28], "six": [3, 23], "size": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 18, 20, 21, 23, 26], "sketch": 10, "ski": 9, "skill": 0, "skip": 11, "skl": [0, 6, 26, 27], "sklearn": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 26, 27, 28], "skplt": [7, 10], "sl": [6, 27], "slack": 8, "slice": [2, 20, 26], "slide": [0, 3, 16, 21, 23, 26, 27, 28], "slight": [6, 13], "slightli": [1, 2, 3, 5, 6, 7, 10, 23, 27, 28], "slope": [8, 11, 12], "slow": [0, 2, 8, 13, 18, 27, 28], "slower": [5, 20, 26, 27, 28], "slowest": 20, "slowli": 12, "slp": 1, "small": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 19, 20, 23, 26, 27, 28], "smaller": [0, 1, 2, 5, 6, 8, 9, 11, 13, 23, 26, 27, 28], "smallest": [0, 4, 14, 26], "smallest_row_index": 14, "smooth": [0, 3, 6, 13, 21, 26, 28], "sn": [0, 1, 3, 6, 7, 26], "sne": 11, "so": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 26, 27, 28], "soar": 6, "social": 0, "soft": [1, 7, 10, 12], "soften": 8, "softmax": [3, 7], "softwar": [0, 8, 19, 20], "sol": 8, "sole": [0, 6, 26], "solid": [0, 7], "solut": [0, 1, 2, 3, 5, 6, 8, 10, 11, 13, 18, 20, 21, 23, 26, 27, 28], "soluton": 2, "solv": [0, 1, 3, 5, 6, 8, 10, 11, 12, 13, 16, 20, 21, 26, 27], "solve_expdec": 2, "solve_ode_deep_neural_network": 2, "solve_ode_neural_network": 2, "solve_pde_deep_neural_network": 2, "solveod": 2, "solveode_popul": 2, "solver": [2, 7, 8, 9, 10, 20, 26], "some": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 21, 23, 26], "some_model": [6, 27], "somehow": 4, "someon": 16, "someth": [0, 1, 3, 4, 7, 9, 11, 15, 21, 23, 26, 27], "sometim": [0, 1, 11, 12, 13, 14, 27], "soon": [20, 24, 27], "sophist": [0, 26], "sopt": 13, "sort": [5, 6, 9, 11, 23], "sound": [3, 5], "sourc": [0, 1, 3, 6, 19, 20, 21, 23, 26], "space": [0, 1, 4, 5, 8, 9, 11, 12, 13, 14, 23, 27, 28], "span": [0, 3, 5, 9, 11, 20, 26, 27, 28], "spare": 1, "spars": [3, 6, 18, 20, 26], "sparse_mtx": [20, 26], "sparsecategoricalcrossentropi": 3, "sparsiti": [10, 18], "spatial": [1, 2, 3, 12], "speak": 23, "special": [6, 7, 10, 12, 13, 20, 23, 26, 27, 28], "specif": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 16, 19, 20, 21, 23, 25, 26, 27, 28], "specifi": [0, 3, 5, 6, 7, 9, 11, 13, 14, 23, 26, 28], "specifici": [0, 10, 26], "spectacular": 3, "spectral": 1, "speech": [0, 1, 3, 4, 12], "speed": [1, 2, 4, 13], "spend": [16, 23], "spent": 21, "sphere": [0, 27], "spin": 6, "spite": 0, "spline": 8, "split": [1, 3, 4, 5, 6, 8, 9, 10, 11, 14, 16, 17, 21, 23, 26, 28], "splite": 0, "splitter": [1, 10], "spontan": 23, "spot": 3, "spread": [0, 11, 23, 26, 27], "springer": [21, 25, 26], "spuriou": 13, "sqquar": 28, "sqrsignal": 3, "sqrt": [3, 4, 5, 6, 8, 10, 11, 13, 23, 27, 28], "squar": [1, 2, 3, 4, 7, 8, 9, 11, 13, 14, 15, 17, 18, 19, 20, 23], "squarederror": 10, "squaredeuclidean": 14, "squash": 12, "srtm": 6, "srtm_data_norway_1": 6, "stabil": [5, 21], "stabl": [0, 4, 5, 6, 9, 16, 19, 21, 26, 27, 28], "stack": [3, 4], "stage": [5, 13, 15, 21], "stai": [0, 2, 4, 5, 11, 26, 27], "stand": [0, 5, 9, 12, 26, 27, 28], "standard": [0, 1, 4, 5, 6, 7, 8, 10, 12, 17, 18, 20, 21, 23, 26, 28], "standardscal": [0, 6, 7, 8, 9, 10, 11, 17, 27], "stanford": [13, 28], "start": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 23, 24, 26, 27, 28], "start_tim": 14, "stat": 6, "state": [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 19, 23, 26, 27, 28], "statement": [0, 7, 20, 26], "stationari": 28, "statist": [0, 1, 3, 4, 7, 9, 10, 11, 12, 13, 14, 20, 
21, 25, 27, 28], "statu": [0, 7, 15, 26], "stavang": 6, "std": [0, 4, 6, 18, 26, 27], "steep": [13, 28], "step": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 20, 21, 26, 28], "step_fn": [7, 12], "step_length": 13, "steps_list": 9, "stereo": 3, "still": [0, 2, 3, 5, 6, 11, 13, 23, 27, 28], "stimuli": 12, "stk": [25, 26], "stk2100": [25, 26], "stk3155": [15, 21, 22, 24], "stk4021": [25, 26], "stk4051": [25, 26], "stk4155": [22, 24], "stk5000": 25, "stochast": [0, 1, 5, 6, 8, 11, 12, 28], "stock": 4, "stoke": 12, "stone": [0, 7], "stop": [1, 4, 9, 13, 14, 18, 28], "storag": [5, 27, 28], "store": [0, 1, 2, 3, 6, 11, 13, 18, 23, 26], "storehaug": [24, 26], "str": [1, 3, 4], "straight": [0, 6, 8, 13, 26, 28], "straightforward": [0, 2, 3, 5, 6, 8, 9, 10, 13, 20, 26, 27, 28], "strategi": [0, 1, 9, 26], "stratifi": 6, "strength": [0, 5, 14, 27, 28], "stretch": 11, "strict": [8, 13, 28], "strictli": [8, 13, 28], "stride": [4, 20], "strike": 6, "string": 1, "stroke": 7, "strong": [3, 6, 9, 10, 12, 20, 23], "strongli": [0, 8, 15, 19, 20], "stronli": [], "structur": [0, 1, 2, 3, 6, 9, 10, 12, 19, 26], "stuck": [1, 13, 28], "student": [0, 15, 21, 22, 24, 25, 26], "studi": [0, 3, 4, 5, 6, 7, 8, 11, 12, 13, 19, 21, 25, 26, 27, 28], "studier": 25, "style": [7, 9, 20, 26], "st\u00f8land": 24, "sub": [9, 12], "subdivid": [0, 20, 26], "subfield": 0, "subject": [6, 8, 23], "submit": 26, "subplot": [0, 1, 3, 4, 6, 7, 8, 9, 10, 14, 26], "subplots_adjust": [8, 23], "subprogram": [20, 26], "subract": [0, 27], "subroutin": [0, 26], "subscript": 1, "subsequ": [1, 4, 5, 6, 12, 20, 23, 27, 28], "subset": [1, 6, 9, 12, 13, 19, 26, 28], "subspac": [0, 8, 11, 27], "substanti": [9, 10], "substep": 11, "substitut": [3, 6, 12, 16, 20], "subsubset": 9, "subtask": 6, "subtl": 1, "subtract": [0, 4, 5, 6, 11, 13, 18, 20, 21, 23, 27], "subtre": 9, "succeed": [0, 4, 26], "success": [3, 7, 9, 13, 23], "successfulli": [4, 9], "sudo": [0, 19, 21, 26], "suffer": [0, 1, 2, 5, 10, 26, 27, 28], "suffici": [1, 6, 8, 11, 13, 28], "suggest": [1, 13, 21, 25, 28], "suit": [8, 12], "suitabl": [0, 15, 23, 27], "sum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "sum_": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 20, 21, 23, 26, 27, 28], "sum_i": [0, 2, 5, 6, 8, 13, 21, 27, 28], "sum_j": [6, 18], "sum_ja_": 0, "sum_k": [6, 8, 12, 20], "sum_logist": 13, "sum_m": 3, "sum_n": 3, "sum_nx_": 3, "summar": [5, 6, 9], "summari": [1, 3, 4, 10, 22, 28], "summat": [0, 3, 16, 27, 28], "sunni": 9, "super": [5, 27, 28], "superfici": 3, "superscript": [1, 12], "supervis": [0, 5, 6, 7, 9, 12, 19, 26, 27, 28], "supplement": [7, 21], "support": [0, 1, 9, 10, 11, 13, 19, 26, 27], "suppos": [0, 5, 6, 7, 8, 10, 11, 12, 13, 20, 26, 27, 28], "suppress": [5, 13, 28], "sure": [0, 1, 4, 6, 16, 21], "surf": 6, "surfac": [0, 6, 26], "surpass": 6, "surpris": [0, 26], "surround": [3, 19], "survei": [0, 5, 6, 26, 27], "svc": [8, 9, 10], "svd": [0, 6, 11, 26], "svdinv": 5, "svm": [8, 9, 10, 11], "svm_clf": [8, 10], "swath": [5, 27, 28], "switch": 0, "sy": [13, 28], "symbol": [1, 5, 11, 13, 19, 23, 26, 27, 28], "symmeteri": 1, "symmetr": [0, 5, 8, 11, 12, 13, 20, 26, 27], "symmetri": 6, "sympi": [0, 19, 21, 26], "synonim": 23, "syntax": 13, "system": [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 19, 20, 21, 26, 28], "systemat": [4, 6], "t": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 26, 28], "t0": [3, 6, 13], "t1": [2, 13], "t2": 2, "t3": 2, "t_": 2, "t_0": [2, 9, 13], "t_1": 13, "t_b": 10, "t_i": [1, 
2, 5, 12, 27, 28], "t_j": 12, "t_k": 9, "tabl": [9, 21, 23, 24, 26], "tabul": [0, 26], "tabular": 26, "tackl": 4, "tag": [2, 3, 4, 5, 6, 7, 12, 13, 14, 20, 23, 27, 28], "taht": [0, 26], "tail": 23, "tailor": [2, 8, 11, 26], "taiwan": [0, 26], "take": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 23, 26, 27, 28], "taken": [0, 1, 3, 6, 10, 13, 20], "tan": 3, "tangent": [1, 4, 12, 13, 28], "tanh": [1, 4, 7, 8, 12], "target": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 26, 27, 28], "target_nam": 9, "task": [0, 1, 3, 6, 9, 11, 12, 14, 21, 26], "tau": [3, 5, 23], "taught": 26, "tax": [], "taylor": [2, 13, 28], "taylornr": [13, 28], "tc": 8, "teach": [15, 22, 26], "team": 1, "teaser": 0, "technic": [0, 5, 6, 13, 21, 28], "techniqu": [0, 1, 8, 10, 13, 19, 23, 25, 26, 27], "technologi": [0, 1], "tell": [0, 4, 6, 10, 11, 13, 16, 23], "temp": 1, "temp1": 1, "temp2": 1, "temperatur": [0, 9, 26], "templat": 18, "temporarili": 1, "ten": [3, 26], "tend": [3, 5, 6, 8, 9, 10, 12, 13, 14, 27], "tendenc": [0, 26], "tension": 6, "tensor": 3, "tensorflow": [0, 2, 4, 8, 14, 19, 20, 21, 25, 26, 27], "term": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 23, 26, 27, 28], "term1": [5, 6, 11], "term2": [5, 6, 11], "term3": [5, 6, 11], "term4": [5, 6, 11], "termin": [0, 4, 5, 9, 10, 13, 15, 27, 28], "terminarl": 15, "terrain": 6, "terrain1": 6, "test": [3, 4, 5, 6, 7, 8, 9, 10, 13, 16, 20, 21, 23, 26, 28], "test_acc": 3, "test_accuraci": [1, 3], "test_error": 6, "test_imag": [3, 4], "test_ind": 6, "test_input": 4, "test_label": [3, 4], "test_loss": 3, "test_pr": 1, "test_predict": 1, "test_rnn": 4, "test_scor": [7, 10], "test_siz": [0, 1, 3, 5, 6, 10, 15, 17, 27, 28], "test_split": 9, "testerror": [0, 6, 27], "testi": 4, "testpredict": 4, "testx": 4, "text": [0, 1, 2, 4, 5, 8, 9, 11, 13, 15, 18, 20, 23, 25, 27, 28], "textbook": [16, 21, 27, 28], "textual": 9, "textur": 1, "tf": [1, 3, 4, 13, 14, 28], "th": [0, 1, 2, 5, 6, 7, 9, 12, 13, 14, 20, 21, 23, 26, 27], "than": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 17, 19, 23, 26, 27], "thank": [4, 6, 27], "theano": [1, 19, 26], "thei": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 20, 21, 23, 26, 27, 28], "them": [0, 1, 3, 4, 6, 8, 9, 10, 11, 12, 13, 18, 20, 21, 26, 27], "theme": [0, 15, 26], "themselv": [0, 21, 23, 26], "thenc": 6, "theorem": [2, 6, 7, 27, 28], "theoret": [0, 4, 10], "theori": [0, 1, 3, 8, 9, 12, 13, 19, 21, 25, 26], "thereaft": [0, 5, 6, 11, 12, 20, 21, 26], "therebi": [0, 5, 7, 11, 21, 26, 27, 28], "therefor": [0, 1, 2, 3, 4, 6, 7, 8, 11, 13, 23, 26, 27, 28], "therein": 11, "thereof": [0, 6, 13, 26], "theta": [0, 1, 4, 5, 6, 7, 13, 16, 21, 23, 26, 27, 28], "theta_": [0, 1, 6, 7, 13, 26, 27, 28], "theta_0": [0, 5, 6, 7, 16, 26, 27, 28], "theta_0x_": [0, 26, 27], "theta_1": [0, 5, 6, 7, 26, 27, 28], "theta_1x_": [0, 26, 27], "theta_1x_0": [0, 26], "theta_1x_1": [0, 7, 26], "theta_1x_2": [0, 26], "theta_1x_i": [7, 27, 28], "theta_2": [0, 26, 27], "theta_2x_": [0, 26, 27], "theta_2x_0": [0, 26], "theta_2x_1": [0, 26], "theta_2x_2": [0, 7, 26], "theta_2x_i": 27, "theta_3x_i": 27, "theta_4x_i": 27, "theta_closed_form": 18, "theta_closed_formol": 18, "theta_closed_formridg": 18, "theta_gdol": 18, "theta_gdridg": 18, "theta_i": [0, 1, 5, 26, 27, 28], "theta_j": [0, 5, 6, 18, 26, 27], "theta_k": 28, "theta_linreg": [13, 28], "theta_ol": 18, "theta_p": 7, "theta_px_p": 7, "theta_ridg": 18, "theta_t": 13, "theta_tru": 18, "thetavalu": 5, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 25, 27, 28], "thing": [0, 1, 2, 4, 5, 7, 9, 15, 16, 18, 23, 26], "think": [0, 1, 3, 4, 6, 9, 12, 13, 14, 23, 26, 27, 28], "third": [0, 3, 6, 13, 24, 26, 28], "thirti": 7, "thorughout": 26, "those": [0, 3, 5, 6, 8, 9, 10, 11, 20, 21, 26, 27, 28], "though": [1, 2, 3, 4, 13, 16, 17, 20, 23], "thought": [6, 14, 21, 23], "thousand": [0, 1, 21, 27], "three": [0, 1, 3, 5, 6, 8, 9, 12, 20, 21, 22, 23, 24, 26, 27, 28], "threshold": [1, 3, 9, 10, 11, 12, 13], "through": [0, 1, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 19, 20, 21, 23, 26, 27, 28], "throughout": [0, 4, 5, 14, 15, 19, 20, 23, 26], "throw": [3, 6, 23], "thu": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 24, 26, 27, 28], "thumb": [0, 6, 21, 27], "thursdai": [], "tibshirani": [6, 21, 25, 26], "tick_param": 6, "ticker": [6, 13, 23, 28], "tif": 6, "tight_layout": [1, 7], "tightli": 11, "tild": [0, 5, 6, 7, 11, 21, 23, 26, 27, 28], "till": [0, 4, 7, 8, 9, 10, 12, 20, 26, 27], "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "timeit": 4, "timer": 4, "tini": 1, "tip": 3, "titl": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 23, 26, 28], "tmp": 13, "tn": [2, 3, 7], "to_categor": [1, 3, 4], "to_categorical_numpi": 1, "to_numer": [0, 6, 26], "todai": 3, "togeth": [0, 3, 6, 8, 11, 13, 19, 26], "toi": 14, "told": 13, "toler": [2, 14], "tolist": 4, "tomographi": 12, "too": [0, 2, 4, 5, 6, 9, 11, 13, 17, 18, 23, 25, 27, 28], "took": [8, 26], "tool": [0, 1, 3, 6, 13, 15, 19, 27], "toolbox": 8, "top": [0, 3, 5, 6, 9, 10, 19, 26], "topic": [0, 5, 6, 7, 8, 19, 21, 27, 28], "topolog": [3, 12], "topologi": [1, 12], "torkjellsdatt": [24, 26], "toss": [10, 23], "total": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 20, 23, 24, 26, 27, 28], "total_loss": 4, "totalclustervari": 14, "totalscatt": 14, "toward": [1, 2, 7, 12, 13, 15, 28], "town": [], "tp": [4, 7], "tpng": 9, "tpu": [13, 19, 26], "tqdm": 6, "tr": [], "track": [3, 13, 14, 15, 20, 27, 28], "tract": [], "tractabl": [0, 26, 27], "trade": [5, 9], "tradeoff": [0, 5, 21, 26, 27, 28], "tradit": [0, 1, 4, 6, 26], "train": [2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 21, 28], "train_accuraci": [0, 1, 3, 26], "train_dataset": 4, "train_end": [0, 1, 27], "train_error": 6, "train_imag": [3, 4], "train_ind": 6, "train_label": [3, 4], "train_pr": 1, "train_siz": [0, 1, 3, 27], "train_step": 4, "train_test_split": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "train_test_split_numpi": [0, 1, 27], "trainable_vari": 4, "trained_model": [6, 27], "trainerror": [0, 27], "traini": 4, "training_checkpoint": 4, "training_dataset": 4, "training_gradi": 13, "trainingerror": 6, "trainpredict": 4, "trainscor": 4, "trainx": 4, "trait": [0, 26], "trajectori": 4, "transfer": [9, 26], "transform": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 26, 27, 28], "transit": [6, 12], "translat": [1, 4, 6, 10, 26, 27], "transpos": [1, 5, 11, 20, 27, 28], "travers": [0, 5], "treat": [0, 1, 3, 6, 12, 13, 18, 23, 26, 27, 28], "tree": [0, 1, 19, 26], "tree_clf": [9, 10], "tree_clf_": 9, "tree_clf_sr": 9, "tree_reg": 9, "tree_reg1": 9, "tree_reg2": 9, "trend": 23, "treue": 7, "trevor": [21, 25], "tri": [2, 3, 4, 9, 13, 16], "triain": 0, "trial": [0, 2, 4, 6, 13, 23, 26, 28], "triangl": [13, 28], "triangular": 20, "trick": [3, 4, 8, 11, 13, 23], "trickier": 23, "tridiagon": 20, "trillion": 19, "trivial": [0, 1, 5, 11, 23, 26, 28], "troubl": [0, 8, 12, 15, 27], "truck": 3, "true": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 20, 21, 23, 26, 27, 28], "true_beta": 27, 
"true_fun": 6, "true_theta": 6, "truli": 26, "try": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 18, 19, 20, 21, 23, 26, 27, 28], "tucker": 8, "tuesdai": [24, 26], "tumor": [7, 9], "tumour": 7, "tunabl": 1, "tune": [4, 9, 13, 20, 26], "turn": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 21, 23, 26, 27, 28], "tutori": [1, 4], "tv": 2, "tveito": 2, "tweak": [1, 4, 10, 23], "twice": [13, 28], "twist": 11, "two": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 17, 20, 21, 22, 23, 25, 26, 27, 28], "tx": [13, 28], "tx_1": [13, 28], "txt": [4, 15], "ty": [13, 28], "type": [0, 1, 3, 6, 8, 10, 13, 20, 23, 27, 28], "typic": [0, 1, 2, 3, 4, 5, 7, 9, 10, 12, 13, 15, 16, 23, 26, 27, 28], "typo": 21, "u": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 23, 25, 26, 27, 28], "u_": 20, "u_i": 12, "u_m": 10, "ua": [0, 26], "ubuntu": [0, 19, 21, 26], "uci": 21, "uio": [15, 21, 24, 25], "un": 14, "unabl": 15, "unari": [20, 26], "unbalanc": [6, 9], "unbias": [0, 5, 6, 26], "uncent": [6, 27], "uncertainti": [0, 5, 26], "uncertitud": 23, "unchang": [1, 3], "uncorrel": [10, 23], "undefin": [5, 27, 28], "under": [0, 1, 5, 6, 10, 13, 19, 21, 26, 27, 28], "underdetermin": [0, 26], "underfit": [1, 6], "underflowproblem": 5, "undergo": 5, "undergradu": [22, 24], "underli": [0, 1, 9, 13, 18, 23, 26], "underset": [4, 14], "understand": [0, 1, 3, 5, 6, 10, 13, 14, 15, 19, 26, 27, 28], "understood": [8, 13], "undesir": 8, "undetermin": [5, 8], "undo": 4, "unexpect": 6, "unexpected": 23, "unexplain": 18, "unfair": [6, 27], "unfortun": [1, 8, 9, 10], "unicode_liter": [8, 9], "uniform": [0, 1, 5, 6, 11, 13, 21, 23, 26, 28], "uniformli": [13, 23, 28], "unifrompdf": 23, "unimport": [13, 28], "union": [5, 6], "uniqu": [0, 2, 6, 13, 14, 20, 26], "unique_cluster_label": 14, "unit": [0, 1, 3, 4, 5, 10, 12, 18, 23, 26, 27, 28], "unitari": [5, 6, 20, 27, 28], "unitarili": [20, 26], "uniti": 23, "univari": 23, "univers": [0, 1, 2, 13, 19, 21, 22, 24, 26, 27, 28], "unix": 1, "unknow": [0, 20, 26], "unknown": [0, 1, 3, 4, 5, 6, 8, 10, 13, 20, 21, 26, 27, 28], "unknowwn": 12, "unlabel": 1, "unless": [0, 3, 6, 11, 13, 21, 26, 28], "unlik": [1, 3, 8, 13, 28], "unnecessarili": 9, "unord": 3, "unravel": 1, "unrol": [3, 11], "unseen": [0, 7, 9, 15], "unstabl": 1, "unsupervis": [0, 1, 4, 12, 19, 26], "unsymmetr": [20, 26], "until": [1, 2, 4, 9, 12, 13, 14, 28], "untouch": 0, "unusu": 12, "up": [1, 3, 4, 5, 6, 8, 10, 11, 13, 14, 16, 18, 19, 20, 21, 23, 24], "updat": [1, 2, 10, 12, 13, 14, 15, 18], "uploa": 26, "upload": [15, 19, 21, 25], "upon": [0, 1, 6, 7, 11, 20], "upper": [0, 8, 9, 16, 20, 27], "uppercas": [20, 26], "upsampl": 4, "upscal": 4, "url": [26, 27], "us": [4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 17, 20, 23, 25], "usag": [0, 8, 19, 26, 27], "usd": [], "usd10000": [], "use_bia": 4, "usecol": [0, 26], "useless": 1, "user": [0, 1, 2, 4, 6, 7, 15, 19, 20, 21, 26, 27], "usernam": [15, 21], "usetex": 23, "usg": 6, "usr": 23, "usual": [0, 3, 4, 7, 12, 13, 14, 26], "ut": 5, "util": [1, 3, 4, 6, 7, 10, 14, 26], "ux": 20, "v": [2, 4, 5, 6, 11, 13, 15, 19, 27, 28], "v0": 23, "v1": 23, "v2": 23, "v_0": 11, "va": 1, "vahid": 26, "val": 13, "val_accuraci": 3, "val_loss": 4, "vale": 2, "valid": [0, 1, 4, 7, 9, 10, 13, 19, 23, 26, 27], "validation_data": 3, "validation_split": 4, "valu": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 20, 21, 26], "valuat": 9, "valy": 4, "van": [0, 21, 26, 27, 28], "vandenbergh": [8, 13, 28], "vandermond": [0, 26], "vanilla": [0, 6, 11, 14, 27], "vanish": [1, 4, 13, 23, 28], "var": [5, 6, 
10, 11, 21, 23, 27], "var_x": 23, "varabl": 8, "varepsilon": [5, 6], "varepsilon_": [5, 6], "varepsilon_i": [5, 6], "vari": [0, 1, 3, 5, 6, 10, 26], "variabl": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14, 20, 26, 27], "varianc": [0, 1, 5, 7, 9, 10, 11, 13, 14, 18, 19, 20, 23, 26, 27, 28], "variance_i": [5, 11, 27], "variance_x": [5, 11, 27], "variant": [0, 1, 6, 8, 12, 13, 26, 27, 28], "variat": [3, 4, 11, 26], "varieti": [0, 3, 12, 19, 21, 26], "variou": [1, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "varydimens": 4, "vastli": 3, "vaue": 1, "vault": 0, "vdot": [2, 13, 28], "ve": 21, "vec": 6, "vector": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 19, 28], "vector_mean": 14, "ventur": [0, 8, 19, 26], "venv": 15, "verbos": [1, 3, 4], "veri": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 23, 25, 26, 27, 28], "verifi": [3, 11, 20, 26], "versatil": [8, 26], "versicolor": [8, 9], "version": [0, 3, 10, 13, 14, 15, 19, 20, 21, 23, 26], "versu": 1, "vert": [0, 1, 5, 6, 7, 8, 9, 11, 13, 16, 17, 26, 27, 28], "vert_1": [5, 6, 27, 28], "vert_2": [5, 6, 11, 17, 27, 28], "via": [0, 5, 6, 7, 8, 9, 10, 11, 12, 19, 20, 21, 22, 23, 24, 26, 27, 28], "vidal": 11, "video": [0, 1, 12, 19, 22, 24, 26, 27, 28], "view": [1, 3, 5, 6, 12, 13, 23, 25, 26, 28], "violat": 8, "virginica": 9, "viridi": [0, 1, 2, 3, 26], "virtual": 1, "viscos": 13, "viscou": 13, "visibl": 15, "vision": [0, 3], "visual": [0, 3, 11, 12, 18, 19, 26, 27], "visualis": 1, "visualstudio": [15, 16], "viz": [6, 8, 23], "vmap": 13, "vmax": [1, 6], "vmin": [1, 6], "voic": 3, "volum": [0, 3, 26], "vote": [10, 26], "voting_clf": 10, "votingclassifi": 10, "votingsimpl": 10, "vstack": [5, 11, 20, 23, 26, 27], "vt": [5, 27, 28], "w": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "w1": 8, "w2": [8, 11], "w3": 8, "w_": [1, 12], "w_1": [8, 20], "w_1x_": 8, "w_1x_1": 8, "w_2": [8, 20], "w_2x_": 8, "w_2x_2": 8, "w_3": 20, "w_4": 20, "w_hidden": 2, "w_i": [1, 2, 10], "w_ix_i": 12, "w_j": 20, "w_m": 20, "w_output": 2, "w_px_": 8, "w_px_p": 8, "wa": [1, 3, 4, 5, 6, 7, 10, 11, 12, 14, 17, 20, 26, 27], "wai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 18, 20, 23, 26, 27, 28], "walk": 9, "walker": 23, "wang": [0, 26], "want": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 23, 26, 27, 28], "warn": 4, "warrant": 6, "wast": 3, "watch": [19, 28], "wave": 3, "wavelet": 8, "we": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28], "weak": [9, 10, 14], "weather": [1, 12], "web": [19, 22, 24, 26], "webpag": 26, "websit": [6, 20, 21, 22, 26], "wedg": [8, 23], "wednesdai": [24, 26], "wee": 11, "week": [0, 5, 6, 7, 21, 22, 24], "weekli": [15, 16, 19, 21, 22, 24, 25, 26], "weight": [1, 2, 3, 6, 7, 9, 10, 12, 13, 18, 23], "weigth": 2, "welcom": [8, 15, 19], "well": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 19, 20, 21, 23, 25, 26, 27, 28], "went": 8, "were": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 23, 26], "wessel": [0, 21, 26, 27, 28], "what": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 23], "whatev": 3, "when": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 27, 28], "whenev": [13, 15, 23], "where": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 26, 27, 28], "wherea": [6, 23], "wherefrom": 21, "wherein": [1, 12], "whether": [0, 3, 5, 7, 9, 21, 23, 26], "which": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
21, 22, 23, 24, 26, 27, 28], "whichev": [1, 3], "while": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 23, 26, 27, 28], "white": 9, "whiteboard": [27, 28], "who": [0, 15], "whole": [1, 3, 4, 5, 9, 11, 13], "whose": [0, 6, 10, 23, 27], "whow": [11, 27], "why": [0, 1, 3, 6, 13, 15, 16, 17, 21, 27, 28], "wide": [0, 1, 3, 6, 7, 12, 19, 20, 21, 26], "widehat": 6, "width": [0, 3, 8, 9, 26], "wieringen": [0, 21, 26, 27, 28], "wiki": 21, "wikipedia": 21, "win": 10, "wind": 9, "wing": [24, 26], "winther": 2, "wiothout": 6, "wiscons": 7, "wisconsin": 10, "wisdom": [6, 27], "wise": [1, 5, 12, 13, 27, 28], "wish": [0, 2, 5, 7, 8, 11, 13, 14, 18, 20, 21, 26, 27, 28], "with_std": [0, 27], "wither": 6, "within": [0, 2, 3, 4, 7, 9, 12, 13, 14, 23, 25, 26, 28], "withinclust": 14, "without": [0, 1, 5, 6, 8, 9, 11, 12, 13, 15, 21, 26, 27, 28], "won": [0, 15, 26], "wonder": 8, "word": [0, 1, 3, 4, 5, 6, 7, 14, 23, 26, 27, 28], "work": [0, 1, 4, 6, 7, 8, 9, 13, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27], "workshop": 26, "world": [0, 8, 16, 27], "worldwid": [0, 26], "worri": 15, "wors": [0, 1, 3, 4, 6, 26], "worth": 9, "worthi": 21, "would": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 18, 20, 21, 23, 26, 27, 28], "wrap": [6, 20, 26], "write": [0, 1, 2, 3, 5, 6, 7, 8, 12, 13, 15, 16, 20, 26, 27], "written": [0, 2, 3, 5, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "wrong": [1, 8, 15], "wrongli": 10, "wrote": [5, 11, 27], "wrt": [10, 13], "wth": [10, 13], "www": [19, 20, 21, 25, 26, 28], "wx_1": 8, "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 28], "x0": 8, "x1": [4, 8, 9, 10, 13], "x1_exampl": 8, "x1d": 8, "x2": [8, 9, 10, 13], "x2d": [8, 11], "x2d_train": 11, "x2dsl": 11, "x3": 8, "x_": [0, 2, 3, 5, 6, 8, 10, 11, 13, 14, 20, 23, 26, 27, 28], "x_0": [0, 5, 11, 18, 20, 26, 27], "x_1": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 18, 20, 23, 26, 27, 28], "x_2": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 20, 23, 26, 27, 28], "x_3": [8, 20, 23], "x_4": 20, "x_6": 18, "x_center": 11, "x_data": 1, "x_data_ful": 1, "x_hidden": 2, "x_i": [0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "x_input": 2, "x_ix_": [0, 26], "x_iy_i": 8, "x_j": [0, 2, 8, 9, 12, 16, 23, 27], "x_jy_j": 8, "x_k": [12, 14, 20, 23, 27], "x_l": 23, "x_m": [6, 12, 20, 23], "x_mean": 18, "x_n": [0, 2, 3, 6, 8, 11, 12, 13, 20, 23, 26, 28], "x_new": [9, 10], "x_norm": 18, "x_offset": [6, 27], "x_output": 2, "x_p": [3, 7, 9], "x_poli": 9, "x_poly10": 9, "x_pred": 4, "x_prev": 2, "x_reduc": 11, "x_scale": 8, "x_small": 13, "x_std": 18, "x_test": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 27, 28], "x_test_": 17, "x_test_own": 6, "x_test_scal": [0, 6, 7, 9, 10, 11, 27], "x_tot": 4, "x_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "x_train_": 17, "x_train_mean": [6, 27], "x_train_own": 6, "x_train_scal": [0, 6, 7, 9, 10, 11, 27], "x_val": 1, "xarrai": [19, 26], "xavier": 1, "xbnew": [13, 28], "xcode": [0, 19, 21, 26], "xdclassiffierconfus": 10, "xdclassiffierroc": 10, "xg_clf": 10, "xgb": 10, "xgbclassifi": 10, "xgboost": 9, "xgboot": 10, "xgbregressor": 10, "xgparam": 10, "xgtree": 10, "xi": [8, 13], "xi_": 8, "xi_1": 8, "xi_i": 8, "xk": 8, "xla": [13, 19, 26], "xlabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 23, 26, 27, 28], "xlim": [6, 10], "xm": 9, "xmesh": 13, "xnew": [0, 13, 26, 28], "xp": 23, "xpanda": [0, 27], "xpd": [5, 11, 27], "xplot": 0, "xscale": [0, 27], "xsr": 9, "xt_x": [13, 28], "xtest": 6, "xtick": [3, 6, 8, 9], "xtrain": 6, "xu": [0, 26], "xx": [0, 20, 26], "xy": [0, 6, 8, 20, 26], 
"xytext": 8, "xz": [20, 26], "y": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 27, 28], "y1": 4, "y2": 4, "y3": 4, "y_": [0, 1, 5, 6, 10, 11, 20, 26, 27], "y_0": [0, 5, 11, 20, 26, 27], "y_1": [0, 5, 8, 9, 11, 13, 20, 26, 27, 28], "y_1y_1": 8, "y_1y_1k": 8, "y_1y_2": 8, "y_1y_2k": 8, "y_1y_n": 8, "y_1y_nk": 8, "y_2": [0, 5, 8, 9, 11, 20, 26, 27], "y_2y_1": 8, "y_2y_1k": 8, "y_2y_2": 8, "y_2y_2k": 8, "y_3": [0, 9, 20], "y_4": 20, "y_center": 18, "y_data": [0, 1, 5, 6, 26, 27, 28], "y_data_ful": 1, "y_decis": 8, "y_fit": [0, 27], "y_i": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 21, 26, 27, 28], "y_if_": 10, "y_ix_": [0, 26], "y_ix_i": [7, 8, 13, 27, 28], "y_iy_jk": 8, "y_j": [6, 8, 12, 21], "y_k": 12, "y_m": 20, "y_mean": 18, "y_model": [0, 4, 5, 6, 26, 27, 28], "y_n": [8, 13, 28], "y_ny_1": 8, "y_ny_1k": 8, "y_ny_2": 8, "y_ny_2k": 8, "y_ny_n": 8, "y_ny_nk": 8, "y_offset": [6, 17, 27], "y_plot": 9, "y_pred": [0, 1, 4, 6, 7, 8, 9, 10, 27], "y_pred1": 9, "y_pred2": 9, "y_pred_rf": 10, "y_pred_tre": 10, "y_proba": [7, 10], "y_scaler": [6, 27], "y_test": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 27, 28], "y_test_onehot": 1, "y_test_predict": [], "y_tot": 4, "y_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "y_train_mean": [6, 27], "y_train_onehot": 1, "y_train_predict": [], "y_train_scal": [6, 27], "y_val": 1, "ye": [3, 6, 7], "year": [0, 19, 26], "yet": [0, 1, 6, 8, 11, 13, 26], "yi": 13, "yield": [0, 2, 5, 6, 8, 10, 12, 13, 14, 20, 23, 26, 28], "yk": 8, "ylabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 23, 26, 27, 28], "ylim": [3, 6], "ym": 9, "ymesh": 13, "yn": 0, "yo": [8, 9, 10], "yoshua": [1, 25], "you": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28], "young": 0, "your": [1, 2, 4, 5, 6, 8, 11, 13, 15, 17, 19, 20, 26, 28], "your_model_object": 16, "yourself": [11, 13, 26, 28], "youtu": [27, 28], "youtub": [19, 28], "ypred": 6, "ypredict": [0, 13, 26, 27, 28], "ypredict2": [13, 28], "ypredictlasso": [5, 28], "ypredictol": [0, 5, 28], "ypredictown": [6, 27], "ypredictownridg": [6, 27, 28], "ypredictridg": [0, 5, 6, 27, 28], "ypredictskl": [6, 27], "ytest": 6, "ytick": [3, 6, 8, 9], "ytild": [0, 6, 26, 27], "ytildelasso": [5, 28], "ytildenp": [0, 26, 27], "ytildeol": [0, 5, 28], "ytildeownridg": [6, 27, 28], "ytilderidg": [5, 6, 27, 28], "ytrain": 6, "yuxi": 26, "yx": [20, 26], "yy": [20, 26], "yz": [20, 26], "z": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 20, 23, 26, 27], "z_": [1, 2, 12, 20, 26], "z_0": [20, 26], "z_1": [20, 26], "z_2": [20, 26], "z_c": 1, "z_h": 1, "z_hidden": 2, "z_i": [1, 12], "z_j": [1, 12], "z_k": [12, 27], "z_m": 1, "z_mod": 9, "z_o": 1, "z_output": 2, "zaman": 23, "zaxi": 6, "zero": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 23, 26, 27, 28], "zeros_lik": 4, "zeroth": 27, "zfill": 4, "zip": [4, 6], "zm_h": [0, 26], "zn": [], "zone": [], "zoom": 26, "zx": [20, 26], "zy": [20, 26], "zz": [20, 26], "\u00f8yvind": [6, 27]}, "titles": ["3. Linear Regression", "14. Building a Feed Forward Neural Network", "15. Solving Differential Equations with Deep Learning", "16. Convolutional Neural Networks", "17. Recurrent neural networks: Overarching view", "4. Ridge and Lasso Regression", "5. Resampling Methods", "6. Logistic Regression", "8. Support Vector Machines, overarching aims", "9. Decision trees, overarching aims", "10. Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods", "11. 
Basic ideas of the Principal Component Analysis (PCA)", "13. Neural networks", "7. Optimization, the central part of any Machine Learning algortithm", "12. Clustering and Unsupervised Learning", "Exercises week 34", "Exercises week 35", "Exercises week 36", "Exercises week 37", "Applied Data Analysis and Machine Learning", "2. Linear Algebra, Handling of Arrays and more Python Features", "Project 1 on Machine Learning, deadline October 6 (midnight), 2025", "Course setting", "1. Elements of Probability Theory and Statistical Data Analysis", "Teachers and Grading", "Textbooks", "Week 34: Introduction to the course, Logistics and Practicalities", "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression", "Week 36: Linear Regression and Gradient descent"], "titleterms": {"": [8, 10, 28], "1": [0, 15, 16, 17, 18, 21, 27], "1a": 18, "2": [0, 15, 16, 17, 18, 26, 27, 28], "2023": 24, "2025": 21, "2a": [], "2b": [], "3": [0, 15, 16, 17, 18, 27], "34": [15, 26], "35": [16, 27], "36": [17, 28], "37": 18, "3a": 18, "3b": 18, "4": [0, 15, 16, 17, 18, 27], "4a": 18, "4b": 18, "5": [0, 16, 18], "6": 21, "A": [0, 1, 4, 8, 9, 26], "And": [26, 27], "In": 24, "Ising": 6, "The": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 15, 19, 26, 27, 28], "To": 26, "With": [4, 28], "about": [26, 27, 28], "abov": 28, "activ": [1, 12], "ad": [0, 6, 21, 26, 27], "adaboost": 10, "adagrad": 13, "adam": 13, "adapt": 10, "adjust": 1, "advanc": 21, "adversari": 4, "again": [3, 9], "ai": [21, 26], "aim": [8, 9, 26], "aka": 26, "algebra": [20, 26], "algorithm": [9, 10, 11, 12, 26, 27, 28], "algortithm": [13, 28], "all": 8, "an": [0, 4, 10, 15, 26], "analys": [5, 27, 28], "analysi": [0, 5, 6, 11, 19, 21, 23, 26, 27, 28], "analyt": [0, 16, 18], "ani": [13, 28], "anoth": [9, 28], "appli": 19, "approach": [0, 8, 14, 26], "approxim": 12, "architectur": 1, "arrai": [20, 26], "assist": 24, "autocorrel": 23, "autograd": [2, 13], "automat": 13, "b": 21, "back": [1, 11, 12, 27, 28], "background": [19, 21], "bag": 10, "base": 13, "basic": [0, 5, 7, 9, 10, 11, 20, 27, 28], "batch": 1, "bay": 5, "befor": 11, "better": 8, "bia": [6, 21], "binari": 1, "bind": 26, "bird": 10, "boldsymbol": [18, 27], "boost": 10, "bootstrap": [6, 10], "boston": [], "breast": 1, "brief": 26, "bring": 12, "build": [1, 3, 9], "c": [21, 26], "calcul": [18, 27, 28], "can": 26, "cancer": [1, 7, 9, 11], "cart": 9, "case": [8, 10, 23, 27, 28], "central": [13, 19, 23, 28], "chain": 12, "chang": 10, "channel": 26, "chi": [0, 26], "choic": 17, "choos": 1, "cifar01": 3, "classic": 11, "classif": [1, 9, 10], "classifi": 8, "clip": 1, "cluster": 14, "cnn": 3, "code": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 21, 26, 27, 28], "collect": [1, 3], "commun": 26, "compar": [2, 10, 16], "comparison": 28, "complet": 27, "complex": [0, 6, 21, 27], "complic": 6, "compon": 11, "comput": 9, "computerlab": 26, "con": 9, "concept": 23, "condit": 28, "conjug": 13, "contn": 26, "convex": [8, 13, 28], "convolut": [3, 12], "correl": [11, 27], "correspond": [], "cost": [1, 10, 27, 28], "cours": [19, 22, 25, 26], "covari": [5, 11, 23, 27], "cover": 26, "creat": 16, "cross": [6, 21], "cython": 26, "d": 21, "data": [0, 1, 3, 6, 7, 9, 11, 15, 17, 18, 19, 23, 26, 27], "dataset": [1, 3, 18], "david": 26, "deadlin": [21, 26], "deadllin": 24, "decai": 2, "decis": [9, 10], "decomposit": [5, 11, 20, 27, 28], "deeep": [], "deep": [1, 2, 26], "defin": [1, 26], "degre": [0, 17, 27], "deliver": [15, 16], "deliveri": 21, "dens": 0, "deriv": [5, 12, 16, 17, 27, 28], "descent": [2, 10, 13, 18, 21, 28], "design": 
27, "detail": [3, 26], "develop": 1, "diagon": 11, "differ": 8, "differenti": [2, 13], "diffus": 2, "dimension": [2, 3, 8, 18], "disadvantag": 9, "discret": 23, "discrimin": 26, "distribut": [5, 23], "do": 1, "doe": [27, 28], "domain": 23, "down": 1, "dropout": 1, "e": 21, "economi": [27, 28], "electron": 21, "element": [0, 23, 26], "elimin": 20, "energi": 26, "ensembl": 10, "entropi": 9, "environ": [0, 15], "equat": [0, 2, 12, 27, 28], "error": [0, 10, 26, 27, 28], "essenti": 26, "etc": 26, "euler": 2, "evalu": 1, "exampl": [1, 2, 3, 4, 6, 7, 8, 9, 10, 26, 27, 28], "exercis": [0, 6, 15, 16, 17, 18, 27], "expect": 23, "experi": 23, "explor": 0, "exponenti": 2, "express": [16, 17, 27], "extend": 28, "extrapol": 4, "extrem": [10, 26], "ey": 10, "f": 21, "fall": 24, "famili": [1, 26], "famou": 20, "fantast": [27, 28], "featur": [9, 16, 20, 27], "feed": [1, 12], "final": [12, 27], "find": [16, 18], "fine": 1, "first": [4, 12, 26, 28], "fit": [0, 10, 15, 16, 26, 28], "fix": [27, 28], "forc": 3, "forest": 10, "form": 18, "format": [21, 26], "formula": 18, "forward": [1, 2, 12], "foster": 26, "fourier": 3, "frank": 6, "freedom": [0, 17, 27], "frequent": 27, "frequentist": [0, 26], "from": [5, 10, 12, 26, 27, 28], "full": 2, "function": [0, 1, 6, 7, 8, 10, 11, 12, 13, 21, 23, 26, 27, 28], "further": [3, 5, 27, 28], "g": 21, "gan": 4, "gaussian": 20, "gd": 13, "gener": [4, 9, 26], "geometr": [11, 28], "gini": 9, "github": 15, "goal": [15, 16, 17, 18], "good": [0, 26], "grade": [24, 26], "gradient": [1, 2, 10, 13, 18, 21, 28], "growth": 2, "h": 21, "ha": 19, "handl": [20, 26], "hessian": [27, 28], "hidden": 2, "hous": [], "how": 16, "hyperparamet": [1, 17], "hyperplan": 8, "i": [0, 1, 26], "id3": 9, "idea": 11, "ideal": 28, "ii": 26, "illustr": 28, "implement": [1, 16, 17, 18], "implic": [5, 27, 28], "import": [5, 20, 26, 27, 28], "improv": 1, "includ": [13, 21], "increment": 11, "index": 9, "inform": 24, "input": 2, "instal": [19, 21, 26], "instructor": 24, "interpret": [5, 11, 26, 27, 28], "introduc": [11, 13, 27], "introduct": [0, 6, 19, 20, 21, 26], "invers": [5, 20], "invert": [27, 28], "iter": 10, "its": 27, "jacobian": 27, "jax": 13, "julia": 26, "jungl": 10, "kera": [1, 3], "kernel": [8, 11], "lab": 28, "lagrangian": 8, "lasso": [5, 6, 21, 27, 28], "last": 27, "later": [5, 27, 28], "layer": [1, 2, 3, 12], "learn": [0, 1, 2, 11, 13, 14, 15, 16, 17, 18, 19, 21, 26, 27, 28], "least": [5, 6, 16, 21, 26, 27, 28], "lectur": [26, 28], "level": 10, "librari": [19, 26], "likelihood": 7, "limit": [1, 13, 23, 28], "linear": [0, 8, 13, 15, 20, 26, 27, 28], "link": [5, 11, 25, 27], "literatur": 21, "logist": [7, 26], "loss": [27, 28], "lu": 20, "machin": [0, 8, 13, 19, 21, 26, 28], "main": [23, 26], "make": [0, 9, 10, 27], "mani": [10, 12], "mass": 26, "materi": [21, 26, 27, 28], "math": [5, 27, 28], "mathemat": [3, 5, 8, 27, 28], "matric": [5, 20, 26], "matrix": [1, 5, 11, 12, 16, 20, 26, 27, 28], "matter": 0, "max": 27, "mean": [0, 27, 28], "meet": [5, 10, 23, 26, 27], "mercer": 8, "method": [6, 9, 10, 13, 21, 26, 28], "midnight": 21, "min": 27, "minim": 26, "ml": 26, "mlp": 12, "mnist": [3, 4], "model": [0, 1, 4, 6, 12, 15, 17, 26], "momentum": [13, 21], "mondai": 28, "moon": [8, 9], "more": [3, 6, 20, 21, 26, 27, 28], "multilay": 12, "multipl": [1, 3, 17], "multipli": 8, "need": [21, 26], "network": [1, 2, 3, 4, 7, 12, 26], "neural": [1, 2, 3, 4, 7, 12, 26], "new": [4, 18], "newton": 28, "non": 8, "normal": [0, 1], "notat": 12, "note": [21, 27, 28], "now": [1, 9, 13, 28], "nuclear": [0, 26], "numba": 
26, "number": [0, 2, 23, 27], "numer": [2, 21, 23], "numpi": [20, 26], "object": 3, "obtain": 11, "octob": 21, "od": 2, "off": [6, 21], "ol": [5, 6, 15, 16, 18, 21, 28], "one": [2, 12, 18, 28], "oper": 20, "optim": [1, 8, 13, 18, 19, 26, 27, 28], "order": [13, 18], "ordinari": [5, 6, 16, 21, 26, 27, 28], "organ": [0, 26], "oslo": 25, "other": [4, 9, 11, 12, 20, 21, 26], "our": [0, 4, 5, 11, 13, 21, 26, 27, 28], "outcom": [19, 26], "output": 2, "overarch": [0, 4, 8, 9, 26, 27], "overview": [10, 26], "own": [0, 10, 11, 21, 26, 27], "packag": [20, 26], "panda": [26, 27], "paramet": [26, 27], "paramt": 18, "part": [13, 19, 21, 28], "partial": 2, "pass": 1, "pca": 11, "pdf": 23, "perceptron": 12, "perform": [1, 9], "period": 3, "perspect": 1, "plan": [27, 28], "plethora": 26, "point": 4, "poisson": 2, "polynomi": [3, 16, 18, 28], "popul": 2, "popular": 26, "practic": [13, 24, 26], "pre": [1, 3], "preambl": 21, "predict": 4, "preprocess": 27, "prerequisit": [3, 19, 26], "princip": 11, "principl": 3, "pro": 9, "probabl": [5, 23], "problem": [1, 2, 13, 26, 27, 28], "procedur": [9, 26], "process": [1, 3], "program": [2, 13, 21, 28], "project": [6, 21, 24, 26], "prop": 13, "propag": [1, 12], "properti": [5, 23, 27, 28], "python": [0, 9, 15, 19, 20, 26], "quick": 8, "r": 26, "random": [10, 11, 23], "raphson": 28, "rate": 21, "read": [9, 26, 27], "real": [6, 26], "recommend": [26, 27], "recurr": [4, 12], "reduc": [0, 27], "reduct": 3, "refer": 21, "reformul": 2, "regress": [0, 5, 6, 7, 9, 10, 13, 15, 17, 18, 21, 26, 27, 28], "regular": 1, "relat": [], "relev": [25, 27], "relu": 1, "remark": 3, "remind": [6, 8, 26, 27, 28], "replac": 13, "report": 21, "repositori": 15, "requir": [2, 19], "resampl": [6, 21], "rescal": [6, 27], "residu": [27, 28], "resourc": 2, "result": [27, 28], "revisit": [13, 28], "rewrit": [26, 27], "ridg": [0, 5, 6, 17, 18, 21, 27, 28], "rm": 13, "rule": 12, "rung": 21, "same": 13, "sampl": 11, "scale": [17, 18, 27], "schedul": 26, "schemat": 9, "scheme": 2, "scienc": 26, "scikit": [0, 1, 11, 26, 27, 28], "second": [13, 18], "semest": 24, "sensit": 28, "septemb": 28, "session": 28, "set": [0, 2, 3, 9, 12, 15, 22, 26, 27, 28], "setup": 15, "sgd": 13, "should": 1, "similar": 13, "simpl": [0, 4, 9, 13, 18, 26, 27, 28], "simplest": 18, "singl": 10, "singular": [5, 11, 27, 28], "size": [27, 28], "sklearn": 16, "soft": 8, "softmax": 1, "softwar": [21, 26], "solv": [2, 28], "solver": 13, "some": [13, 20, 27, 28], "specifi": 2, "split": [0, 15, 27], "squar": [0, 5, 6, 10, 16, 21, 26, 27, 28], "standard": [13, 27], "state": 0, "statist": [5, 6, 19, 23, 26], "steepest": [10, 13, 28], "stochast": [13, 21, 23], "strongli": 26, "suggest": 26, "summari": [24, 26], "superposit": 3, "supervis": 1, "support": 8, "svd": [5, 27, 28], "synthet": 18, "systemat": 3, "t": 27, "take": 16, "taken": 26, "teach": 24, "teacher": [24, 26], "technic": 27, "techniqu": [6, 11, 21], "technologi": 19, "tensorflow": [1, 3], "tent": [24, 26], "test": [0, 1, 15, 17, 27], "text": 26, "textbook": [25, 26], "than": 28, "theorem": [5, 8, 11, 12, 23], "theori": 23, "theta": 18, "thi": 26, "tip": 13, "togeth": 12, "tool": [21, 26], "top": 1, "topic": 26, "toward": 11, "trade": [6, 21], "tradeoff": 6, "train": [0, 1, 4, 15, 26, 27], "transform": 3, "tree": [9, 10], "tuesdai": 28, "tune": 1, "two": [3, 8, 19], "type": [2, 4, 12, 26], "uio": 26, "univers": [12, 25], "unsupervis": 14, "up": [0, 2, 9, 12, 15, 26, 27, 28], "updat": 21, "us": [0, 1, 2, 3, 7, 13, 16, 18, 19, 21, 26, 27, 28], "v": 3, "valid": [6, 21], "valu": [5, 
11, 23, 27, 28], "variabl": [23, 28], "varianc": [6, 21], "variou": 0, "vector": [8, 12, 16, 20, 26, 27], "versu": 26, "view": [0, 4, 10, 27], "virtual": 15, "visual": [1, 9], "wai": [9, 21], "wave": 2, "we": 26, "wednesdai": 28, "week": [15, 16, 17, 18, 26, 27, 28], "weekli": [], "what": [0, 26, 27, 28], "which": 1, "why": 26, "wisconsin": 7, "write": [4, 11, 21, 28], "x": 27, "xgboost": 10, "yet": 28, "your": [0, 10, 16, 18, 21, 27]}}) \ No newline at end of file +Search.setIndex({"alltitles": {"1a)": [[18, "a"]], "3D volumes of neurons": [[43, "d-volumes-of-neurons"], [44, "d-volumes-of-neurons"]], "3a)": [[18, "id1"]], "3b)": [[18, "b"]], "4a)": [[18, "id2"]], "4b)": [[18, "id3"]], "A Classification Tree": [[9, "a-classification-tree"]], "A Frequentist approach to data analysis": [[0, "a-frequentist-approach-to-data-analysis"], [33, "a-frequentist-approach-to-data-analysis"]], "A better approach": [[8, "a-better-approach"]], "A deep CNN model (From Raschka et al)": [[43, "a-deep-cnn-model-from-raschka-et-al"], [44, "a-deep-cnn-model-from-raschka-et-al"]], "A first summary": [[33, "a-first-summary"]], "A more compact expression": [[38, "a-more-compact-expression"], [39, "a-more-compact-expression"]], "A more efficient way of coding the above Convolution": [[43, "a-more-efficient-way-of-coding-the-above-convolution"], [44, "a-more-efficient-way-of-coding-the-above-convolution"]], "A new Cost Function": [[37, "a-new-cost-function"]], "A possible implementation of a neural network": [[42, "a-possible-implementation-of-a-neural-network"], [43, "a-possible-implementation-of-a-neural-network"]], "A quick Reminder on Lagrangian Multipliers": [[8, "a-quick-reminder-on-lagrangian-multipliers"]], "A simple example": [[4, "a-simple-example"]], "A soft classifier": [[8, "a-soft-classifier"]], "A top-down perspective on Neural networks": [[1, "a-top-down-perspective-on-neural-networks"], [41, "a-top-down-perspective-on-neural-networks"]], "A way to Read the Bias-Variance Tradeoff": [[37, "a-way-to-read-the-bias-variance-tradeoff"], [38, "a-way-to-read-the-bias-variance-tradeoff"]], "ADAM algorithm, taken from Goodfellow et al": [[36, "adam-algorithm-taken-from-goodfellow-et-al"]], "ADAM optimizer": [[13, "adam-optimizer"], [36, "id2"]], "Accuracy": [[36, "accuracy"]], "Activation functions": [[12, "activation-functions"], [39, "activation-functions"], [41, "activation-functions"], [41, "id3"], [42, "activation-functions"], [42, "id1"]], "Activation functions, Logistic and Hyperbolic ones": [[39, "activation-functions-logistic-and-hyperbolic-ones"], [41, "activation-functions-logistic-and-hyperbolic-ones"]], "Activation functions, examples": [[42, "activation-functions-examples"]], "AdaGrad Properties": [[36, "adagrad-properties"]], "AdaGrad Update Rule Derivation": [[36, "adagrad-update-rule-derivation"]], "AdaGrad algorithm, taken from Goodfellow et al": [[36, "adagrad-algorithm-taken-from-goodfellow-et-al"]], "Adam Optimizer": [[36, "adam-optimizer"]], "Adam vs. 
AdaGrad and RMSProp": [[36, "adam-vs-adagrad-and-rmsprop"]], "Adam: Bias Correction": [[36, "adam-bias-correction"]], "Adam: Exponential Moving Averages (Moments)": [[36, "adam-exponential-moving-averages-moments"]], "Adam: Update Rule Derivation": [[36, "adam-update-rule-derivation"]], "Adaptive boosting: AdaBoost, Basic Algorithm": [[10, "adaptive-boosting-adaboost-basic-algorithm"]], "Adaptivity Across Dimensions": [[36, "adaptivity-across-dimensions"]], "Add Dense layers on top": [[44, "add-dense-layers-on-top"]], "Adding Neural Networks": [[39, "adding-neural-networks"]], "Adding a hidden layer": [[40, "adding-a-hidden-layer"], [41, "adding-a-hidden-layer"]], "Adding error analysis and training set up": [[33, "adding-error-analysis-and-training-set-up"], [34, "adding-error-analysis-and-training-set-up"]], "Adjust hyperparameters": [[1, "adjust-hyperparameters"], [41, "adjust-hyperparameters"]], "Algorithms and codes for Adagrad, RMSprop and Adam": [[36, "algorithms-and-codes-for-adagrad-rmsprop-and-adam"]], "Algorithms for Setting up Decision Trees": [[9, "algorithms-for-setting-up-decision-trees"]], "An Overview of Ensemble Methods": [[10, "an-overview-of-ensemble-methods"]], "An extrapolation example": [[4, "an-extrapolation-example"]], "An optimization/minimization problem": [[33, "an-optimization-minimization-problem"]], "Analyzing the last results": [[40, "analyzing-the-last-results"], [41, "analyzing-the-last-results"]], "And a similar example using Tensorflow with Keras": [[42, "and-a-similar-example-using-tensorflow-with-keras"]], "And finally \\boldsymbol{X}\\boldsymbol{X}^T": [[34, "and-finally-boldsymbol-x-boldsymbol-x-t"]], "And finally ADAM": [[36, "and-finally-adam"]], "And what about using neural networks?": [[33, "and-what-about-using-neural-networks"]], "Another Example from Scikit-Learn\u2019s Repository": [[37, "another-example-from-scikit-learn-s-repository"], [38, "another-example-from-scikit-learn-s-repository"]], "Another Example, now with a polynomial fit": [[35, "another-example-now-with-a-polynomial-fit"]], "Another example, the moons again": [[9, "another-example-the-moons-again"]], "Applied Data Analysis and Machine Learning": [[25, null]], "Artificial neurons": [[39, "artificial-neurons"], [40, "artificial-neurons"]], "Assumptions made": [[37, "assumptions-made"]], "Autocorrelation function": [[30, "autocorrelation-function"]], "Automatic differentiation": [[13, "automatic-differentiation"], [40, "automatic-differentiation"]], "Automatic differentiation through examples": [[40, "automatic-differentiation-through-examples"]], "Back propagation": [[42, "back-propagation"], [43, "back-propagation"]], "Back propagation and automatic differentiation": [[42, "back-propagation-and-automatic-differentiation"]], "Back to Ridge and LASSO Regression": [[34, "back-to-ridge-and-lasso-regression"], [35, "back-to-ridge-and-lasso-regression"]], "Back to the Cancer Data": [[11, "back-to-the-cancer-data"]], "Background literature": [[27, "background-literature"], [28, "background-literature"]], "Bagging": [[10, "bagging"]], "Bagging Examples": [[10, "bagging-examples"]], "Basic Matrix Features": [[26, "basic-matrix-features"]], "Basic ideas of the Principal Component Analysis (PCA)": [[11, null]], "Basic math of the SVD": [[5, "basic-math-of-the-svd"], [34, "basic-math-of-the-svd"], [35, "basic-math-of-the-svd"]], "Basics": [[7, "basics"], [38, "basics"], [39, "basics"]], "Basics of a tree": [[9, "basics-of-a-tree"]], "Basics of an NN": [[40, "basics-of-an-nn"]], "Batch 
Normalization": [[1, "batch-normalization"], [41, "batch-normalization"]], "Batches and mini-batches": [[36, "batches-and-mini-batches"]], "Bayes\u2019 Theorem and Ridge and Lasso Regression": [[5, "bayes-theorem-and-ridge-and-lasso-regression"]], "Boosting, a Bird\u2019s Eye View": [[10, "boosting-a-bird-s-eye-view"]], "Bootstrap": [[6, "bootstrap"]], "Bringing it together": [[40, "bringing-it-together"], [41, "bringing-it-together"]], "Bringing it together, first back propagation equation": [[12, "bringing-it-together-first-back-propagation-equation"]], "Building a Feed Forward Neural Network": [[1, null]], "Building a neural network code": [[41, "building-a-neural-network-code"]], "Building a tree, regression": [[9, "building-a-tree-regression"]], "Building code using Pytorch": [[44, "building-code-using-pytorch"]], "Building convolutional neural networks in Tensorflow/Keras and PyTorch": [[43, "building-convolutional-neural-networks-in-tensorflow-keras-and-pytorch"]], "Building convolutional neural networks using Tensorflow and Keras": [[44, "building-convolutional-neural-networks-using-tensorflow-and-keras"]], "Building neural networks in Tensorflow and Keras": [[1, "building-neural-networks-in-tensorflow-and-keras"], [41, "building-neural-networks-in-tensorflow-and-keras"], [42, "building-neural-networks-in-tensorflow-and-keras"]], "Building our own neural network code": [[42, "building-our-own-neural-network-code"]], "But none of these can compete with Newton\u2019s method": [[36, "but-none-of-these-can-compete-with-newton-s-method"]], "CNNs in brief": [[43, "cnns-in-brief"], [44, "cnns-in-brief"]], "CNNs in more detail, building convolutional neural networks in Tensorflow and Keras": [[3, "cnns-in-more-detail-building-convolutional-neural-networks-in-tensorflow-and-keras"]], "CNNs in more detail, simple example": [[43, "cnns-in-more-detail-simple-example"], [44, "cnns-in-more-detail-simple-example"]], "Cancer Data again now with Decision Trees and other Methods": [[9, "cancer-data-again-now-with-decision-trees-and-other-methods"]], "Chain rule": [[40, "chain-rule"]], "Chain rule, forward and reverse modes": [[40, "chain-rule-forward-and-reverse-modes"]], "Challenge: Choosing a Fixed Learning Rate": [[36, "challenge-choosing-a-fixed-learning-rate"]], "Choose cost function and optimizer": [[1, "choose-cost-function-and-optimizer"], [41, "choose-cost-function-and-optimizer"]], "Class of functions we can approximate": [[40, "class-of-functions-we-can-approximate"]], "Classical PCA Theorem": [[11, "classical-pca-theorem"]], "Classification and Regression, writing our own neural network code": [[28, "classification-and-regression-writing-our-own-neural-network-code"]], "Classification problems": [[38, "classification-problems"], [39, "classification-problems"]], "Clustering and Unsupervised Learning": [[14, null]], "Code Example for Cross-validation and k-fold Cross-validation": [[37, "code-example-for-cross-validation-and-k-fold-cross-validation"], [38, "code-example-for-cross-validation-and-k-fold-cross-validation"]], "Code example": [[40, "code-example"], [41, "code-example"]], "Code example for the Bootstrap method": [[37, "code-example-for-the-bootstrap-method"]], "Code for SVD and Inversion of Matrices": [[5, "code-for-svd-and-inversion-of-matrices"]], "Code with a Number of Minibatches which varies": [[36, "code-with-a-number-of-minibatches-which-varies"]], "Codes and Approaches": [[14, "codes-and-approaches"]], "Codes for the SVD": [[5, "codes-for-the-svd"], [34, 
"codes-for-the-svd"], [35, "codes-for-the-svd"]], "Coding Setup and Linear Regression": [[15, "coding-setup-and-linear-regression"]], "Collect and pre-process data": [[1, "collect-and-pre-process-data"], [41, "collect-and-pre-process-data"], [41, "id2"], [42, "collect-and-pre-process-data"]], "Communication channels": [[33, "communication-channels"]], "Commutative process": [[43, "commutative-process"], [44, "commutative-process"]], "Compact expressions": [[40, "compact-expressions"], [41, "compact-expressions"]], "Compare Bagging on Trees with Random Forests": [[10, "compare-bagging-on-trees-with-random-forests"]], "Comparing with a numerical scheme": [[2, "comparing-with-a-numerical-scheme"], [42, "comparing-with-a-numerical-scheme"], [43, "comparing-with-a-numerical-scheme"]], "Comparison with OLS": [[35, "comparison-with-ols"]], "Compile and train the model": [[44, "compile-and-train-the-model"]], "Completing the list": [[40, "completing-the-list"], [41, "completing-the-list"]], "Computation of gradients": [[36, "computation-of-gradients"]], "Computing the Gini index": [[9, "computing-the-gini-index"]], "Conditions on convex functions": [[35, "conditions-on-convex-functions"]], "Confidence Intervals": [[37, "confidence-intervals"]], "Confusion Matrix": [[23, "confusion-matrix"]], "Conjugate gradient method": [[13, "conjugate-gradient-method"]], "Convergence rates": [[36, "convergence-rates"]], "Convex function": [[35, "convex-function"]], "Convex functions": [[13, "convex-functions"], [35, "convex-functions"]], "Convolution Examples: Polynomial multiplication": [[3, "convolution-examples-polynomial-multiplication"], [43, "convolution-examples-polynomial-multiplication"], [44, "convolution-examples-polynomial-multiplication"]], "Convolution Examples: Principle of Superposition and Periodic Forces (Fourier Transforms)": [[3, "convolution-examples-principle-of-superposition-and-periodic-forces-fourier-transforms"]], "Convolutional Neural Network": [[12, "convolutional-neural-network"], [39, "convolutional-neural-network"], [40, "convolutional-neural-network"]], "Convolutional Neural Networks": [[3, null]], "Convolutional Neural Networks (recognizing images)": [[43, "convolutional-neural-networks-recognizing-images"]], "Convolutional Neural Networks (recognizing images), reminder from last week": [[44, "convolutional-neural-networks-recognizing-images-reminder-from-last-week"]], "Correlation Function and Design/Feature Matrix": [[34, "correlation-function-and-design-feature-matrix"]], "Correlation Matrix": [[11, "correlation-matrix"], [34, "correlation-matrix"]], "Correlation Matrix with Pandas": [[34, "correlation-matrix-with-pandas"]], "Cost functions": [[41, "cost-functions"], [42, "cost-functions"]], "Counting the number of floating point operations": [[40, "counting-the-number-of-floating-point-operations"]], "Course Format": [[33, "course-format"]], "Course setting": [[29, null]], "Covariance Matrix Examples": [[34, "covariance-matrix-examples"]], "Covariance and Correlation Matrix": [[34, "covariance-and-correlation-matrix"]], "Cross correlation": [[43, "cross-correlation"], [44, "cross-correlation"]], "Cross-validation": [[6, "cross-validation"]], "Cross-validation in brief": [[37, "cross-validation-in-brief"], [38, "cross-validation-in-brief"]], "Cumulative Gain": [[23, "cumulative-gain"]], "Deadlines for projects (tentative)": [[33, "deadlines-for-projects-tentative"]], "Decision trees, overarching aims": [[9, null]], "Deep Neural Networks": [[36, "deep-neural-networks"]], "Deep 
learning methods": [[33, "deep-learning-methods"]], "Define model and architecture": [[1, "define-model-and-architecture"], [41, "define-model-and-architecture"]], "Defining intermediate operations": [[40, "defining-intermediate-operations"]], "Defining the cost function": [[1, "defining-the-cost-function"], [41, "defining-the-cost-function"]], "Defining the problem": [[42, "defining-the-problem"], [43, "defining-the-problem"]], "Definitions": [[19, "definitions"], [40, "definitions"], [41, "definitions"]], "Deliverables": [[15, "deliverables"], [16, "deliverables"], [19, "deliverables"], [20, "deliverables"], [24, "deliverables"], [27, "deliverables"], [28, "deliverables"]], "Derivation of the AdaGrad Algorithm": [[36, "derivation-of-the-adagrad-algorithm"]], "Derivative of the cost function": [[40, "derivative-of-the-cost-function"], [41, "derivative-of-the-cost-function"]], "Derivatives and the chain rule": [[12, "derivatives-and-the-chain-rule"], [40, "derivatives-and-the-chain-rule"], [41, "derivatives-and-the-chain-rule"]], "Derivatives in terms of z_j^L": [[40, "derivatives-in-terms-of-z-j-l"], [41, "derivatives-in-terms-of-z-j-l"]], "Derivatives of the hidden layer": [[40, "derivatives-of-the-hidden-layer"], [41, "derivatives-of-the-hidden-layer"]], "Derivatives, example 1": [[34, "derivatives-example-1"]], "Deriving OLS from a probability distribution": [[5, "deriving-ols-from-a-probability-distribution"], [37, "deriving-ols-from-a-probability-distribution"]], "Deriving and Implementing Ordinary Least Squares": [[16, "deriving-and-implementing-ordinary-least-squares"]], "Deriving and Implementing Ridge Regression": [[17, "deriving-and-implementing-ridge-regression"]], "Deriving the Lasso Regression Equations": [[34, "deriving-the-lasso-regression-equations"], [35, "deriving-the-lasso-regression-equations"], [35, "id6"]], "Deriving the Ridge Regression Equations": [[34, "deriving-the-ridge-regression-equations"], [35, "deriving-the-ridge-regression-equations"], [35, "id3"]], "Deriving the back propagation code for a multilayer perceptron model": [[12, "deriving-the-back-propagation-code-for-a-multilayer-perceptron-model"]], "Developing a code for doing neural networks with back propagation": [[1, "developing-a-code-for-doing-neural-networks-with-back-propagation"], [41, "developing-a-code-for-doing-neural-networks-with-back-propagation"]], "Diagonalize the sample covariance matrix to obtain the principal components": [[11, "diagonalize-the-sample-covariance-matrix-to-obtain-the-principal-components"]], "Different kernels and Mercer\u2019s theorem": [[8, "different-kernels-and-mercer-s-theorem"]], "Disadvantages": [[9, "disadvantages"]], "Discriminative Modeling": [[33, "discriminative-modeling"]], "Discussing the correlation data": [[39, "discussing-the-correlation-data"]], "Does Logistic Regression do a better Job?": [[39, "does-logistic-regression-do-a-better-job"]], "Domains and probabilities": [[30, "domains-and-probabilities"]], "Dropout": [[1, "dropout"], [41, "dropout"]], "ELU function": [[41, "elu-function"], [42, "elu-function"]], "Economy-size SVD": [[34, "economy-size-svd"], [35, "economy-size-svd"]], "Efficient Polynomial Multiplication": [[43, "efficient-polynomial-multiplication"], [44, "efficient-polynomial-multiplication"]], "Elements of Probability Theory and Statistical Data Analysis": [[30, null]], "Empirical Evidence: Convergence Time and Memory in Practice": [[36, "empirical-evidence-convergence-time-and-memory-in-practice"]], "Ensemble Methods: From a Single 
Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods": [[10, null]], "Entropy and the ID3 algorithm": [[9, "entropy-and-the-id3-algorithm"]], "Essential elements of ML": [[33, "essential-elements-of-ml"]], "Evaluate model performance on test data": [[1, "evaluate-model-performance-on-test-data"], [41, "evaluate-model-performance-on-test-data"]], "Example 2": [[34, "example-2"]], "Example 3": [[34, "example-3"]], "Example 4": [[34, "example-4"]], "Example Matrix": [[34, "example-matrix"], [35, "example-matrix"]], "Example code for Bias-Variance tradeoff": [[37, "example-code-for-bias-variance-tradeoff"]], "Example code for Logistic Regression": [[38, "example-code-for-logistic-regression"], [39, "example-code-for-logistic-regression"]], "Example of discriminative modeling, taken from Generative Deep Learning by David Foster": [[33, "example-of-discriminative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of generative modeling, taken from Generative Deep Learning by David Foster": [[33, "example-of-generative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of own Standard scaling": [[34, "example-of-own-standard-scaling"]], "Example relevant for the exercises": [[34, "example-relevant-for-the-exercises"]], "Example: Exponential decay": [[2, "example-exponential-decay"], [42, "example-exponential-decay"], [43, "example-exponential-decay"]], "Example: Population growth": [[2, "example-population-growth"], [42, "example-population-growth"], [43, "example-population-growth"]], "Example: Solving the one dimensional Poisson equation": [[42, "example-solving-the-one-dimensional-poisson-equation"], [43, "example-solving-the-one-dimensional-poisson-equation"]], "Example: Solving the wave equation with Neural Networks": [[42, "example-solving-the-wave-equation-with-neural-networks"]], "Example: The diffusion equation": [[2, "example-the-diffusion-equation"], [42, "example-the-diffusion-equation"], [43, "example-the-diffusion-equation"]], "Example: binary classification problem": [[1, "example-binary-classification-problem"], [41, "example-binary-classification-problem"]], "Examples": [[33, "examples"]], "Examples of CNN setups": [[43, "examples-of-cnn-setups"], [44, "examples-of-cnn-setups"]], "Examples of XOR, OR and AND gates": [[39, "examples-of-xor-or-and-and-gates"]], "Examples of likelihood functions used in logistic regression and neural networks": [[7, "examples-of-likelihood-functions-used-in-logistic-regression-and-neural-networks"]], "Examples of likelihood functions used in logistic regression and nueral networks": [[38, "examples-of-likelihood-functions-used-in-logistic-regression-and-nueral-networks"]], "Exercise 1": [[21, "exercise-1"]], "Exercise 1 - Choice of model and degrees of freedom": [[17, "exercise-1-choice-of-model-and-degrees-of-freedom"]], "Exercise 1 - Finding the derivative of Matrix-Vector expressions": [[16, "exercise-1-finding-the-derivative-of-matrix-vector-expressions"]], "Exercise 1 - Github Setup": [[15, "exercise-1-github-setup"]], "Exercise 1 - Understand the feed forward pass": [[22, "exercise-1-understand-the-feed-forward-pass"]], "Exercise 1, scale your data": [[18, "exercise-1-scale-your-data"]], "Exercise 1:": [[24, "exercise-1"]], "Exercise 1: Creating the report document": [[20, "exercise-1-creating-the-report-document"]], "Exercise 1: Expectation values for ordinary least squares expressions": [[19, "exercise-1-expectation-values-for-ordinary-least-squares-expressions"]], "Exercise 1: 
Including more data": [[40, "exercise-1-including-more-data"]], "Exercise 1: Setting up various Python environments": [[0, "exercise-1-setting-up-various-python-environments"]], "Exercise 2": [[21, "exercise-2"]], "Exercise 2 - Deriving the expression for OLS": [[16, "exercise-2-deriving-the-expression-for-ols"]], "Exercise 2 - Deriving the expression for Ridge Regression": [[17, "exercise-2-deriving-the-expression-for-ridge-regression"]], "Exercise 2 - Gradient with one layer using autograd": [[22, "exercise-2-gradient-with-one-layer-using-autograd"]], "Exercise 2 - Setting up a Github repository": [[15, "exercise-2-setting-up-a-github-repository"]], "Exercise 2, calculate the gradients": [[18, "exercise-2-calculate-the-gradients"]], "Exercise 2:": [[24, "exercise-2"]], "Exercise 2: Adding good figures": [[20, "exercise-2-adding-good-figures"]], "Exercise 2: Expectation values for Ridge regression": [[19, "exercise-2-expectation-values-for-ridge-regression"]], "Exercise 2: Extended program": [[40, "exercise-2-extended-program"]], "Exercise 2: making your own data and exploring scikit-learn": [[0, "exercise-2-making-your-own-data-and-exploring-scikit-learn"]], "Exercise 3": [[21, "exercise-3"]], "Exercise 3 - Creating feature matrix and implementing OLS using the analytical expression": [[16, "exercise-3-creating-feature-matrix-and-implementing-ols-using-the-analytical-expression"]], "Exercise 3 - Fitting an OLS model to data": [[15, "exercise-3-fitting-an-ols-model-to-data"]], "Exercise 3 - Gradient with one layer writing backpropagation by hand": [[22, "exercise-3-gradient-with-one-layer-writing-backpropagation-by-hand"]], "Exercise 3 - Scaling data": [[17, "exercise-3-scaling-data"]], "Exercise 3 - Setting up a Python virtual environment": [[15, "exercise-3-setting-up-a-python-virtual-environment"]], "Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters \\boldsymbol{\\theta}": [[18, "exercise-3-using-the-analytical-formulae-for-ols-and-ridge-regression-to-find-the-optimal-paramters-boldsymbol-theta"]], "Exercise 3: Deriving the expression for the Bias-Variance Trade-off": [[19, "exercise-3-deriving-the-expression-for-the-bias-variance-trade-off"]], "Exercise 3: Normalizing our data": [[0, "exercise-3-normalizing-our-data"]], "Exercise 3: Writing an abstract and introduction": [[20, "exercise-3-writing-an-abstract-and-introduction"]], "Exercise 4 - Custom activation for each layer": [[21, "exercise-4-custom-activation-for-each-layer"]], "Exercise 4 - Fitting a polynomial": [[16, "exercise-4-fitting-a-polynomial"]], "Exercise 4 - Gradient with two layers writing backpropagation by hand": [[22, "exercise-4-gradient-with-two-layers-writing-backpropagation-by-hand"]], "Exercise 4 - Implementing Ridge Regression": [[17, "exercise-4-implementing-ridge-regression"]], "Exercise 4 - Testing multiple hyperparameters": [[17, "exercise-4-testing-multiple-hyperparameters"]], "Exercise 4 - The train-test split": [[15, "exercise-4-the-train-test-split"]], "Exercise 4, Implementing the simplest form for gradient descent": [[18, "exercise-4-implementing-the-simplest-form-for-gradient-descent"]], "Exercise 4: Adding Ridge Regression": [[0, "exercise-4-adding-ridge-regression"]], "Exercise 4: Computing the Bias and Variance": [[19, "exercise-4-computing-the-bias-and-variance"]], "Exercise 4: Making the code available and presentable": [[20, "exercise-4-making-the-code-available-and-presentable"]], "Exercise 5 - Comparing your code with sklearn": [[16, 
"exercise-5-comparing-your-code-with-sklearn"]], "Exercise 5 - Gradient with any number of layers writing backpropagation by hand": [[22, "exercise-5-gradient-with-any-number-of-layers-writing-backpropagation-by-hand"]], "Exercise 5 - Processing multiple inputs at once": [[21, "exercise-5-processing-multiple-inputs-at-once"]], "Exercise 5, Ridge regression and a new Synthetic Dataset": [[18, "exercise-5-ridge-regression-and-a-new-synthetic-dataset"]], "Exercise 5: Analytical exercises": [[0, "exercise-5-analytical-exercises"]], "Exercise 5: Interpretation of scaling and metrics": [[19, "exercise-5-interpretation-of-scaling-and-metrics"]], "Exercise 5: Referencing": [[20, "exercise-5-referencing"]], "Exercise 6 - Batched inputs": [[22, "exercise-6-batched-inputs"]], "Exercise 6 - Predicting on real data": [[21, "exercise-6-predicting-on-real-data"]], "Exercise 7 - Training": [[22, "exercise-7-training"]], "Exercise 7 - Training on real data (Optional)": [[21, "exercise-7-training-on-real-data-optional"]], "Exercise 8 (Optional) - Object orientation": [[22, "exercise-8-optional-object-orientation"]], "Exercise a)": [[23, "exercise-a"]], "Exercise b)": [[23, "exercise-b"]], "Exercise c) week 43": [[23, "exercise-c-week-43"]], "Exercise: Cross-validation as resampling techniques, adding more complexity": [[6, "exercise-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Exercise: Analysis of real data": [[6, "exercise-analysis-of-real-data"]], "Exercise: Bias-variance trade-off and resampling techniques": [[6, "exercise-bias-variance-trade-off-and-resampling-techniques"]], "Exercise: Lasso Regression on the Franke function with resampling": [[6, "exercise-lasso-regression-on-the-franke-function-with-resampling"]], "Exercise: Ordinary Least Square (OLS) on the Franke function": [[6, "exercise-ordinary-least-square-ols-on-the-franke-function"]], "Exercise: Ridge Regression on the Franke function with resampling": [[6, "exercise-ridge-regression-on-the-franke-function-with-resampling"]], "Exercises": [[0, "exercises"], [23, "exercises"]], "Exercises and Projects": [[6, "exercises-and-projects"]], "Exercises and lab session week 43": [[42, "exercises-and-lab-session-week-43"]], "Exercises week 34": [[15, null]], "Exercises week 35": [[16, null]], "Exercises week 36": [[17, null]], "Exercises week 37": [[18, null]], "Exercises week 38": [[19, null]], "Exercises week 39": [[20, null]], "Exercises week 41": [[21, null]], "Exercises week 42": [[22, null]], "Exercises week 43": [[23, null]], "Exercises week 44": [[24, null]], "Expectation value and variance": [[37, "expectation-value-and-variance"]], "Expectation value and variance for \\boldsymbol{\\theta}": [[37, "expectation-value-and-variance-for-boldsymbol-theta"]], "Expectation values": [[30, "expectation-values"]], "Explicit derivatives": [[40, "explicit-derivatives"], [41, "explicit-derivatives"]], "Exploding gradients": [[41, "exploding-gradients"]], "Extending to more predictors": [[38, "extending-to-more-predictors"], [39, "extending-to-more-predictors"]], "Extending to more than one variable": [[35, "extending-to-more-than-one-variable"]], "Extremely useful tools, strongly recommended": [[33, "extremely-useful-tools-strongly-recommended"]], "Feed-forward neural networks": [[12, "feed-forward-neural-networks"], [39, "feed-forward-neural-networks"], [40, "feed-forward-neural-networks"]], "Feed-forward pass": [[1, "feed-forward-pass"], [41, "feed-forward-pass"]], "Final back propagating equation": [[12, 
"final-back-propagating-equation"], [40, "final-back-propagating-equation"], [41, "final-back-propagating-equation"]], "Final derivatives": [[40, "final-derivatives"]], "Final expression": [[40, "final-expression"], [41, "final-expression"]], "Final expressions for the biases of the hidden layer": [[40, "final-expressions-for-the-biases-of-the-hidden-layer"], [41, "final-expressions-for-the-biases-of-the-hidden-layer"]], "Final part": [[44, "final-part"]], "Final technicalities I": [[42, "final-technicalities-i"], [43, "final-technicalities-i"]], "Final technicalities II": [[42, "final-technicalities-ii"], [43, "final-technicalities-ii"]], "Final technicalities III": [[42, "final-technicalities-iii"], [43, "final-technicalities-iii"]], "Final technicalities IV": [[42, "final-technicalities-iv"], [43, "final-technicalities-iv"]], "Final visualization": [[44, "final-visualization"]], "Finally, evaluate the model": [[44, "finally-evaluate-the-model"]], "Finding the Limit": [[37, "finding-the-limit"]], "Finding the number of parameters": [[43, "finding-the-number-of-parameters"], [44, "finding-the-number-of-parameters"]], "Fine-tuning neural network hyperparameters": [[1, "fine-tuning-neural-network-hyperparameters"], [41, "fine-tuning-neural-network-hyperparameters"]], "First network example, simple percepetron with one input": [[40, "first-network-example-simple-percepetron-with-one-input"]], "Fitting an Equation of State for Dense Nuclear Matter": [[0, "fitting-an-equation-of-state-for-dense-nuclear-matter"]], "Fixing the singularity": [[34, "fixing-the-singularity"], [35, "fixing-the-singularity"]], "Format for electronic delivery of report and programs": [[27, "format-for-electronic-delivery-of-report-and-programs"], [28, "format-for-electronic-delivery-of-report-and-programs"]], "Forward and reverse modes": [[40, "forward-and-reverse-modes"]], "Fourier series and Toeplitz matrices": [[43, "fourier-series-and-toeplitz-matrices"], [44, "fourier-series-and-toeplitz-matrices"]], "Frequently used scaling functions": [[34, "frequently-used-scaling-functions"], [36, "frequently-used-scaling-functions"]], "From OLS to Ridge and Lasso": [[35, "from-ols-to-ridge-and-lasso"]], "From one to many layers, the universal approximation theorem": [[12, "from-one-to-many-layers-the-universal-approximation-theorem"]], "Full object-oriented implementation": [[41, "full-object-oriented-implementation"]], "Functionality in Scikit-Learn": [[34, "functionality-in-scikit-learn"], [36, "functionality-in-scikit-learn"]], "Further Dimensionality Remarks": [[3, "further-dimensionality-remarks"]], "Further properties (important for our analyses later)": [[5, "further-properties-important-for-our-analyses-later"], [34, "further-properties-important-for-our-analyses-later"], [35, "further-properties-important-for-our-analyses-later"]], "Further remarks": [[43, "further-remarks"], [44, "further-remarks"]], "Further simplification": [[43, "further-simplification"], [44, "further-simplification"]], "Gaussian Elimination": [[26, "gaussian-elimination"]], "General Features": [[9, "general-features"]], "General linear models and linear algebra": [[33, "general-linear-models-and-linear-algebra"]], "Generalizing the above one-dimensional case": [[43, "generalizing-the-above-one-dimensional-case"], [44, "generalizing-the-above-one-dimensional-case"]], "Generalizing the fitting procedure as a linear algebra problem": [[33, "generalizing-the-fitting-procedure-as-a-linear-algebra-problem"], [33, "id1"]], "Generative Adversarial 
Networks": [[4, "generative-adversarial-networks"]], "Generative Models": [[4, "generative-models"]], "Generative Versus Discriminative Modeling": [[33, "generative-versus-discriminative-modeling"]], "Geometric Interpretation and link with Singular Value Decomposition": [[11, "geometric-interpretation-and-link-with-singular-value-decomposition"]], "Getting serious, the back propagation equations for a neural network": [[40, "getting-serious-the-back-propagation-equations-for-a-neural-network"]], "Getting started with project 1": [[20, "getting-started-with-project-1"]], "Gradient Boosting, Classification Example": [[10, "gradient-boosting-classification-example"]], "Gradient Boosting, Examples of Regression": [[10, "gradient-boosting-examples-of-regression"]], "Gradient Clipping": [[1, "gradient-clipping"], [41, "gradient-clipping"]], "Gradient Descent Example": [[35, "id1"], [36, "id1"]], "Gradient boosting: Basics with Steepest Descent/Functional Gradient Descent": [[10, "gradient-boosting-basics-with-steepest-descent-functional-gradient-descent"]], "Gradient descent": [[2, "gradient-descent"], [42, "gradient-descent"], [43, "gradient-descent"]], "Gradient descent and Ridge": [[35, "gradient-descent-and-ridge"], [36, "gradient-descent-and-ridge"]], "Gradient descent and revisiting Ordinary Least Squares from last week": [[36, "gradient-descent-and-revisiting-ordinary-least-squares-from-last-week"]], "Gradient descent example": [[35, "gradient-descent-example"], [36, "gradient-descent-example"]], "Gradient expressions": [[40, "gradient-expressions"], [41, "gradient-expressions"]], "Grading": [[31, "grading"], [31, "id2"], [33, "grading"]], "Hidden layers": [[41, "hidden-layers"]], "Homogeneous data": [[41, "homogeneous-data"]], "How to do image compression before the era of deep learning": [[43, "how-to-do-image-compression-before-the-era-of-deep-learning"]], "How to take derivatives of Matrix-Vector expressions": [[16, "how-to-take-derivatives-of-matrix-vector-expressions"]], "Hyperplanes and all that": [[8, "hyperplanes-and-all-that"]], "Identifying Terms": [[37, "identifying-terms"]], "Illustration of a single perceptron model and a multi-perceptron model": [[39, "illustration-of-a-single-perceptron-model-and-a-multi-perceptron-model"], [40, "illustration-of-a-single-perceptron-model-and-a-multi-perceptron-model"]], "Important Matrix and vector handling packages": [[26, "important-matrix-and-vector-handling-packages"]], "Important observations": [[40, "important-observations"], [41, "important-observations"]], "Important technicalities: More on Rescaling data": [[34, "important-technicalities-more-on-rescaling-data"]], "Importing Keras and Tensorflow": [[44, "importing-keras-and-tensorflow"]], "Improving gradient descent with momentum": [[36, "improving-gradient-descent-with-momentum"]], "Improving performance": [[1, "improving-performance"], [41, "improving-performance"]], "In general not this simple": [[40, "in-general-not-this-simple"]], "In summary": [[31, "in-summary"]], "Including Stochastic Gradient Descent with Autograd": [[13, "including-stochastic-gradient-descent-with-autograd"], [36, "including-stochastic-gradient-descent-with-autograd"]], "Including more classes": [[38, "including-more-classes"], [39, "including-more-classes"]], "Incremental PCA": [[11, "incremental-pca"]], "Independent and Identically Distributed (iid)": [[37, "independent-and-identically-distributed-iid"]], "Inputs to the activation function": [[40, "inputs-to-the-activation-function"], [41, 
"inputs-to-the-activation-function"]], "Insights from the paper by Glorot and Bengio": [[41, "insights-from-the-paper-by-glorot-and-bengio"]], "Installing R, C++, cython or Julia": [[33, "installing-r-c-cython-or-julia"]], "Installing R, C++, cython, Numba etc": [[33, "installing-r-c-cython-numba-etc"]], "Instructor information": [[31, "instructor-information"]], "Interpretations and optimizing our parameters": [[33, "interpretations-and-optimizing-our-parameters"], [33, "id2"], [33, "id3"], [34, "interpretations-and-optimizing-our-parameters"], [34, "id1"], [34, "id2"]], "Interpreting the Ridge results": [[34, "interpreting-the-ridge-results"], [35, "interpreting-the-ridge-results"], [35, "id4"]], "Introducing JAX": [[13, "introducing-jax"]], "Introducing the Covariance and Correlation functions": [[11, "introducing-the-covariance-and-correlation-functions"], [34, "introducing-the-covariance-and-correlation-functions"]], "Introduction": [[0, "introduction"], [6, "introduction"], [25, "introduction"], [26, "introduction"]], "Introduction to Neural networks": [[39, "introduction-to-neural-networks"], [40, "introduction-to-neural-networks"]], "Introduction to numerical projects": [[27, "introduction-to-numerical-projects"], [28, "introduction-to-numerical-projects"]], "Is the Logistic activation function (Sigmoid) our choice?": [[41, "is-the-logistic-activation-function-sigmoid-our-choice"]], "Iterative Fitting, Classification and AdaBoost": [[10, "iterative-fitting-classification-and-adaboost"]], "Iterative Fitting, Regression and Squared-error Cost Function": [[10, "iterative-fitting-regression-and-squared-error-cost-function"]], "Kernel PCA": [[11, "kernel-pca"]], "Kernels and non-linearity": [[8, "kernels-and-non-linearity"]], "Key Idea": [[43, "key-idea"], [44, "key-idea"]], "LU Decomposition, the inverse of a matrix": [[26, "lu-decomposition-the-inverse-of-a-matrix"]], "Lab sessions on Tuesday and Wednesday": [[43, "lab-sessions-on-tuesday-and-wednesday"]], "Lab sessions Tuesday and Wednesday": [[39, "lab-sessions-tuesday-and-wednesday"]], "Lab sessions on Tuesday and Wednesday": [[40, "lab-sessions-on-tuesday-and-wednesday"]], "Lab sessions week 39": [[38, "lab-sessions-week-39"]], "Lasso Regression": [[35, "lasso-regression"]], "Lasso case": [[35, "lasso-case"]], "Layers": [[1, "layers"], [41, "layers"]], "Layers of a CNN": [[44, "layers-of-a-cnn"]], "Layers used to build CNNs": [[3, "layers-used-to-build-cnns"], [43, "layers-used-to-build-cnns"], [44, "layers-used-to-build-cnns"]], "Layout of a neural network with three hidden layers": [[40, "layout-of-a-neural-network-with-three-hidden-layers"]], "Layout of a neural network with three hidden layers (last layer = l=L=4, first layer l=0)": [[41, "layout-of-a-neural-network-with-three-hidden-layers-last-layer-l-l-4-first-layer-l-0"]], "Layout of a simple neural network with no hidden layer": [[40, "layout-of-a-simple-neural-network-with-no-hidden-layer"], [41, "layout-of-a-simple-neural-network-with-no-hidden-layer"]], "Layout of a simple neural network with one hidden layer": [[40, "layout-of-a-simple-neural-network-with-one-hidden-layer"], [41, "layout-of-a-simple-neural-network-with-one-hidden-layer"]], "Layout of a simple neural network with two input nodes, one hidden layer and one output node": [[40, "layout-of-a-simple-neural-network-with-two-input-nodes-one-hidden-layer-and-one-output-node"]], "Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node": [[41, 
"layout-of-a-simple-neural-network-with-two-input-nodes-one-hidden-layer-with-two-hidden-noeds-and-one-output-node"]], "Layout of input to first hidden layer l=1 from input layer l=0": [[41, "layout-of-input-to-first-hidden-layer-l-1-from-input-layer-l-0"]], "Learning goals": [[15, "learning-goals"], [16, "learning-goals"], [17, "learning-goals"], [18, "learning-goals"], [19, "learning-goals"], [20, "learning-goals"]], "Learning outcomes": [[25, "learning-outcomes"], [33, "learning-outcomes"]], "Learning rate methods": [[41, "learning-rate-methods"], [42, "learning-rate-methods"]], "Lecture Monday October 20": [[42, "lecture-monday-october-20"]], "Lecture Monday October 6": [[40, "lecture-monday-october-6"]], "Lecture Monday September 29, 2025": [[39, "lecture-monday-september-29-2025"]], "Lecture October 13, 2025": [[41, "lecture-october-13-2025"]], "Lecture material": [[38, "lecture-material"]], "Lecture material: Writing a code which implements a feed-forward neural network": [[41, "lecture-material-writing-a-code-which-implements-a-feed-forward-neural-network"]], "Lectures and ComputerLab": [[33, "lectures-and-computerlab"]], "Limitations of NNs": [[41, "limitations-of-nns"]], "Limitations of supervised learning with deep networks": [[1, "limitations-of-supervised-learning-with-deep-networks"], [41, "limitations-of-supervised-learning-with-deep-networks"]], "Linear Algebra, Handling of Arrays and more Python Features": [[26, null]], "Linear Regression": [[0, null]], "Linear Regression Problems": [[34, "linear-regression-problems"], [35, "linear-regression-problems"]], "Linear Regression and the SVD": [[35, "linear-regression-and-the-svd"]], "Linear Regression, basic elements": [[0, "linear-regression-basic-elements"]], "Linear classifier": [[38, "linear-classifier"]], "Linking Bayes\u2019 Theorem with Ridge and Lasso Regression": [[5, "linking-bayes-theorem-with-ridge-and-lasso-regression"]], "Linking the regression analysis with a statistical interpretation": [[5, "linking-the-regression-analysis-with-a-statistical-interpretation"], [37, "linking-the-regression-analysis-with-a-statistical-interpretation"]], "Linking with the SVD": [[5, "linking-with-the-svd"], [34, "linking-with-the-svd"]], "Links to relevant courses at the University of Oslo": [[32, "links-to-relevant-courses-at-the-university-of-oslo"]], "Logistic Regression": [[7, null], [7, "id1"], [38, "logistic-regression"]], "Logistic Regression, from last week": [[39, "logistic-regression-from-last-week"]], "Logistic function as the root of problems": [[41, "logistic-function-as-the-root-of-problems"]], "MNIST and GANs": [[4, "mnist-and-gans"]], "Machine Learning": [[33, "machine-learning"]], "Machine learning": [[25, "machine-learning"]], "Main textbooks": [[33, "main-textbooks"]], "Making a tree": [[9, "making-a-tree"]], "Making your own Bootstrap: Changing the Level of the Decision Tree": [[10, "making-your-own-bootstrap-changing-the-level-of-the-decision-tree"]], "Making your own test-train splitting": [[34, "making-your-own-test-train-splitting"]], "Material for Lecture Monday November 3": [[44, "material-for-lecture-monday-november-3"]], "Material for Lecture Monday October 27": [[43, "material-for-lecture-monday-october-27"]], "Material for exercises week 35": [[34, "material-for-exercises-week-35"]], "Material for lab sessions sessions Tuesday and Wednesday": [[35, "material-for-lab-sessions-sessions-tuesday-and-wednesday"]], "Material for lecture Monday September 2": [[35, "material-for-lecture-monday-september-2"]], 
"Material for lecture Monday September 8": [[36, "material-for-lecture-monday-september-8"]], "Material for the lab sessions": [[36, "material-for-the-lab-sessions"], [37, "material-for-the-lab-sessions"], [44, "material-for-the-lab-sessions"]], "Material for the lab sessions on Tuesday and Wednesday": [[41, "material-for-the-lab-sessions-on-tuesday-and-wednesday"]], "Material for the lecture on Monday October 6, 2025": [[40, "material-for-the-lecture-on-monday-october-6-2025"]], "Mathematical Interpretation of Ordinary Least Squares": [[5, "mathematical-interpretation-of-ordinary-least-squares"], [34, "mathematical-interpretation-of-ordinary-least-squares"], [35, "mathematical-interpretation-of-ordinary-least-squares"]], "Mathematical model": [[39, "mathematical-model"], [39, "id1"], [39, "id2"], [39, "id3"], [39, "id4"]], "Mathematical optimization of convex functions": [[8, "mathematical-optimization-of-convex-functions"]], "Mathematics of CNNs": [[3, "mathematics-of-cnns"], [43, "mathematics-of-cnns"], [44, "mathematics-of-cnns"]], "Mathematics of deep learning": [[40, "mathematics-of-deep-learning"], [41, "mathematics-of-deep-learning"]], "Mathematics of deep learning and neural networks": [[40, "mathematics-of-deep-learning-and-neural-networks"]], "Mathematics of the SVD and implications": [[5, "mathematics-of-the-svd-and-implications"], [34, "mathematics-of-the-svd-and-implications"], [35, "mathematics-of-the-svd-and-implications"]], "Matrices in Python": [[33, "matrices-in-python"]], "Matrix multiplication": [[1, "matrix-multiplication"], [41, "matrix-multiplication"]], "Matrix multiplications": [[41, "matrix-multiplications"]], "Matrix-vector notation": [[39, "matrix-vector-notation"]], "Matrix-vector notation and activation": [[12, "matrix-vector-notation-and-activation"], [39, "matrix-vector-notation-and-activation"]], "Maximum Likelihood Estimation (MLE)": [[37, "maximum-likelihood-estimation-mle"]], "Maximum likelihood": [[38, "maximum-likelihood"], [39, "maximum-likelihood"]], "Meet the covariance!": [[30, "meet-the-covariance"]], "Meet the Covariance Matrix": [[5, "meet-the-covariance-matrix"], [34, "meet-the-covariance-matrix"]], "Meet the Hessian Matrix": [[34, "meet-the-hessian-matrix"]], "Meet the Pandas": [[33, "meet-the-pandas"]], "Memory Usage and Scalability": [[36, "memory-usage-and-scalability"]], "Memory considerations": [[43, "memory-considerations"], [44, "memory-considerations"]], "Memory constraints": [[36, "memory-constraints"]], "Min-Max Scaling": [[34, "min-max-scaling"]], "Minimization process": [[42, "minimization-process"], [43, "minimization-process"]], "Minimizing the cost function using gradient descent and automatic differentiation": [[42, "minimizing-the-cost-function-using-gradient-descent-and-automatic-differentiation"], [43, "minimizing-the-cost-function-using-gradient-descent-and-automatic-differentiation"]], "Minimizing the cross entropy": [[38, "minimizing-the-cross-entropy"], [39, "minimizing-the-cross-entropy"]], "Momentum based GD": [[13, "momentum-based-gd"], [36, "momentum-based-gd"]], "More classes": [[38, "more-classes"], [39, "more-classes"]], "More complicated Example: The Ising model": [[6, "more-complicated-example-the-ising-model"]], "More complicated function": [[40, "more-complicated-function"]], "More considerations": [[40, "more-considerations"], [41, "more-considerations"]], "More details": [[42, "more-details"], [42, "id4"], [43, "more-details"], [43, "id3"]], "More examples on bootstrap and cross-validation and errors": 
[[37, "more-examples-on-bootstrap-and-cross-validation-and-errors"], [38, "more-examples-on-bootstrap-and-cross-validation-and-errors"]], "More interpretations": [[34, "more-interpretations"], [35, "more-interpretations"], [35, "id5"]], "More limitations": [[41, "more-limitations"]], "More on Dimensionalities": [[3, "more-on-dimensionalities"], [43, "more-on-dimensionalities"], [44, "more-on-dimensionalities"]], "More on Rescaling data": [[6, "more-on-rescaling-data"]], "More on Steepest descent": [[35, "more-on-steepest-descent"]], "More on activation functions, output layers": [[41, "more-on-activation-functions-output-layers"], [42, "more-on-activation-functions-output-layers"]], "More on convex functions": [[35, "more-on-convex-functions"]], "More on the general approximation theorem": [[40, "more-on-the-general-approximation-theorem"]], "More preprocessing": [[34, "more-preprocessing"], [36, "more-preprocessing"]], "More technicalities": [[42, "more-technicalities"], [43, "more-technicalities"]], "More top-down perspectives": [[41, "more-top-down-perspectives"]], "Motivation for Adaptive Step Sizes": [[36, "motivation-for-adaptive-step-sizes"]], "Multiclass classification": [[41, "multiclass-classification"], [42, "multiclass-classification"]], "Multilayer perceptrons": [[12, "multilayer-perceptrons"], [39, "multilayer-perceptrons"], [40, "multilayer-perceptrons"]], "Multivariable functions": [[40, "multivariable-functions"]], "Network requirements": [[2, "network-requirements"], [42, "network-requirements"], [43, "network-requirements"]], "Neural Networks vs CNNs": [[3, "neural-networks-vs-cnns"], [43, "neural-networks-vs-cnns"], [44, "neural-networks-vs-cnns"]], "Neural network types": [[39, "neural-network-types"], [40, "neural-network-types"]], "Neural networks": [[12, null]], "New expression for the derivative": [[40, "new-expression-for-the-derivative"]], "New image (or volume)": [[43, "new-image-or-volume"], [44, "new-image-or-volume"]], "New vector": [[43, "new-vector"], [44, "new-vector"]], "Non-Convex Problems": [[36, "non-convex-problems"]], "Note about SVD Calculations": [[34, "note-about-svd-calculations"], [35, "note-about-svd-calculations"]], "Note on Scikit-Learn": [[35, "note-on-scikit-learn"]], "Numerical experiments and the covariance, central limit theorem": [[30, "numerical-experiments-and-the-covariance-central-limit-theorem"]], "Numpy and arrays": [[26, "numpy-and-arrays"], [33, "numpy-and-arrays"]], "Numpy examples and Important Matrix and vector handling packages": [[33, "numpy-examples-and-important-matrix-and-vector-handling-packages"]], "Optimization and Deep learning": [[38, "optimization-and-deep-learning"], [39, "optimization-and-deep-learning"]], "Optimization and gradient descent, the central part of any Machine Learning algortithm": [[35, "optimization-and-gradient-descent-the-central-part-of-any-machine-learning-algortithm"]], "Optimization, the central part of any Machine Learning algortithm": [[13, null], [38, "optimization-the-central-part-of-any-machine-learning-algortithm"], [39, "optimization-the-central-part-of-any-machine-learning-algortithm"]], "Optimizing our parameters": [[33, "optimizing-our-parameters"]], "Optimizing our parameters, more details": [[33, "optimizing-our-parameters-more-details"]], "Optimizing the cost function": [[1, "optimizing-the-cost-function"], [41, "optimizing-the-cost-function"]], "Optimizing the parameters": [[40, "optimizing-the-parameters"], [41, "optimizing-the-parameters"]], "Optional (Note that you should 
include at least two of these in the report):": [[28, "optional-note-that-you-should-include-at-least-two-of-these-in-the-report"]], "Ordinary Differential Equations first": [[42, "ordinary-differential-equations-first"], [43, "ordinary-differential-equations-first"]], "Organizing our data": [[0, "organizing-our-data"], [33, "organizing-our-data"]], "Other Matrix and Vector Operations": [[26, "other-matrix-and-vector-operations"]], "Other Types of Recurrent Neural Networks": [[4, "other-types-of-recurrent-neural-networks"]], "Other courses on Data science and Machine Learning at UiO": [[33, "other-courses-on-data-science-and-machine-learning-at-uio"]], "Other courses on Data science and Machine Learning at UiO, contn": [[33, "other-courses-on-data-science-and-machine-learning-at-uio-contn"]], "Other ingredients of a neural network": [[40, "other-ingredients-of-a-neural-network"]], "Other measures in classification studies": [[39, "other-measures-in-classification-studies"]], "Other measures: Precision, Recall, and the F_1 Measure": [[23, "other-measures-precision-recall-and-the-f-1-measure"]], "Other parameters": [[40, "other-parameters"]], "Other popular texts": [[33, "other-popular-texts"]], "Other techniques": [[11, "other-techniques"]], "Other types of networks": [[12, "other-types-of-networks"], [39, "other-types-of-networks"], [40, "other-types-of-networks"]], "Other ways of visualizing the trees": [[9, "other-ways-of-visualizing-the-trees"]], "Our model for the nuclear binding energies": [[33, "our-model-for-the-nuclear-binding-energies"]], "Output layer": [[40, "output-layer"], [41, "output-layer"]], "Overarching aims of the exercises for week 43": [[23, "overarching-aims-of-the-exercises-for-week-43"]], "Overarching aims of the exercises this week": [[21, "overarching-aims-of-the-exercises-this-week"], [22, "overarching-aims-of-the-exercises-this-week"], [24, "overarching-aims-of-the-exercises-this-week"]], "Overarching view of a neural network": [[40, "overarching-view-of-a-neural-network"]], "Overview of first week": [[33, "overview-of-first-week"]], "Overview video on Stochastic Gradient Descent (SGD)": [[36, "overview-video-on-stochastic-gradient-descent-sgd"]], "Own code for Ordinary Least Squares": [[33, "own-code-for-ordinary-least-squares"], [34, "own-code-for-ordinary-least-squares"]], "PCA and scikit-learn": [[11, "pca-and-scikit-learn"]], "Padding": [[43, "padding"], [44, "padding"]], "Pandas AI": [[33, "pandas-ai"]], "Parameters of neural networks": [[40, "parameters-of-neural-networks"]], "Parameters to train, common settings": [[43, "parameters-to-train-common-settings"], [44, "parameters-to-train-common-settings"]], "Part a : Ordinary Least Square (OLS) for the Runge function": [[27, "part-a-ordinary-least-square-ols-for-the-runge-function"]], "Part a): Analytical warm-up": [[28, "part-a-analytical-warm-up"]], "Part b): Writing your own Neural Network code": [[28, "part-b-writing-your-own-neural-network-code"]], "Part b: Adding Ridge regression for the Runge function": [[27, "part-b-adding-ridge-regression-for-the-runge-function"]], "Part c): Testing against other software libraries": [[28, "part-c-testing-against-other-software-libraries"]], "Part c: Writing your own gradient descent code": [[27, "part-c-writing-your-own-gradient-descent-code"]], "Part d): Testing different activation functions and depths of the neural network": [[28, "part-d-testing-different-activation-functions-and-depths-of-the-neural-network"]], "Part d: Including momentum and more advanced 
ways to update the learning the rate": [[27, "part-d-including-momentum-and-more-advanced-ways-to-update-the-learning-the-rate"]], "Part e): Testing different norms": [[28, "part-e-testing-different-norms"]], "Part e: Writing our own code for Lasso regression": [[27, "part-e-writing-our-own-code-for-lasso-regression"]], "Part f): Classification analysis using neural networks": [[28, "part-f-classification-analysis-using-neural-networks"]], "Part f: Stochastic gradient descent": [[27, "part-f-stochastic-gradient-descent"]], "Part g) Critical evaluation of the various algorithms": [[28, "part-g-critical-evaluation-of-the-various-algorithms"]], "Part g: Bias-variance trade-off and resampling techniques": [[27, "part-g-bias-variance-trade-off-and-resampling-techniques"]], "Part h): Cross-validation as resampling techniques, adding more complexity": [[27, "part-h-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Partial Differential Equations": [[2, "partial-differential-equations"], [42, "partial-differential-equations"], [43, "partial-differential-equations"]], "Plan for week 39, September 22-26, 2025": [[38, "plan-for-week-39-september-22-26-2025"]], "Plan for week 41, October 6-10": [[40, "plan-for-week-41-october-6-10"]], "Plan for week 44": [[43, "plan-for-week-44"]], "Plans for week 35": [[34, "plans-for-week-35"]], "Plans for week 36": [[35, "plans-for-week-36"]], "Plans for week 37, lecture Monday": [[36, "plans-for-week-37-lecture-monday"]], "Plans for week 38, lecture Monday September 15": [[37, "plans-for-week-38-lecture-monday-september-15"]], "Plans for week 43": [[42, "plans-for-week-43"]], "Plans for week 45": [[44, "plans-for-week-45"]], "Plotting the Histogram": [[37, "plotting-the-histogram"]], "Plotting the mean value for each group": [[38, "plotting-the-mean-value-for-each-group"]], "Pooling": [[43, "pooling"], [44, "pooling"]], "Pooling arithmetic": [[43, "pooling-arithmetic"], [44, "pooling-arithmetic"]], "Pooling types (From Raschka et al)": [[43, "pooling-types-from-raschka-et-al"], [44, "pooling-types-from-raschka-et-al"]], "Practical tips": [[13, "practical-tips"], [36, "practical-tips"]], "Practicalities": [[31, "practicalities"], [31, "id1"]], "Preamble: Note on writing reports, using reference material, AI and other tools": [[27, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"], [28, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"]], "Predicting New Points With A Trained Recurrent Neural Network": [[4, "predicting-new-points-with-a-trained-recurrent-neural-network"]], "Preprocessing our data": [[34, "preprocessing-our-data"]], "Prerequisites": [[33, "prerequisites"]], "Prerequisites and background": [[25, "prerequisites-and-background"]], "Prerequisites: Collect and pre-process data": [[3, "prerequisites-collect-and-pre-process-data"], [44, "prerequisites-collect-and-pre-process-data"]], "Probability Distribution Functions": [[30, "probability-distribution-functions"]], "Program example for gradient descent with Ridge Regression": [[35, "program-example-for-gradient-descent-with-ridge-regression"], [36, "program-example-for-gradient-descent-with-ridge-regression"]], "Program for stochastic gradient": [[13, "program-for-stochastic-gradient"]], "Project 1 on Machine Learning, deadline October 6 (midnight), 2025": [[27, null]], "Project 2 on Machine Learning, deadline November 10 (Midnight)": [[28, null]], "Properties of PDFs": [[30, "properties-of-pdfs"]], "Pros and cons": [[36, 
"pros-and-cons"]], "Pros and cons of trees, pros": [[9, "pros-and-cons-of-trees-pros"]], "Python installers": [[25, "python-installers"], [33, "python-installers"]], "RMS prop": [[13, "rms-prop"]], "RMSProp algorithm, taken from Goodfellow et al": [[36, "rmsprop-algorithm-taken-from-goodfellow-et-al"]], "RMSProp: Adaptive Learning Rates": [[36, "rmsprop-adaptive-learning-rates"]], "RMSprop for adaptive learning rate with Stochastic Gradient Descent": [[36, "rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent"]], "ROC Curve": [[23, "roc-curve"]], "Random Numbers": [[30, "random-numbers"]], "Random forests": [[10, "random-forests"]], "Randomized PCA": [[11, "randomized-pca"]], "Reading material": [[33, "reading-material"]], "Reading recommendations": [[41, "reading-recommendations"]], "Reading recommendations:": [[34, "reading-recommendations"]], "Reading suggestions week 34": [[33, "reading-suggestions-week-34"]], "Readings and Videos": [[37, "readings-and-videos"]], "Readings and Videos, logistic regression": [[38, "readings-and-videos-logistic-regression"]], "Readings and Videos, resampling methods": [[38, "readings-and-videos-resampling-methods"]], "Readings and Videos:": [[36, "readings-and-videos"], [40, "readings-and-videos"]], "Readings and videos": [[41, "readings-and-videos"]], "Recurrent neural networks": [[12, "recurrent-neural-networks"], [39, "recurrent-neural-networks"], [40, "recurrent-neural-networks"]], "Recurrent neural networks: Overarching view": [[4, null]], "Reducing the number of degrees of freedom, overarching view": [[0, "reducing-the-number-of-degrees-of-freedom-overarching-view"], [34, "reducing-the-number-of-degrees-of-freedom-overarching-view"]], "Reducing the number of operations": [[40, "reducing-the-number-of-operations"]], "Reformulating the problem": [[2, "reformulating-the-problem"], [42, "reformulating-the-problem"], [43, "reformulating-the-problem"]], "Regression Case": [[10, "regression-case"]], "Regression analysis and resampling methods": [[27, "regression-analysis-and-resampling-methods"]], "Regression analysis, overarching aims": [[33, "regression-analysis-overarching-aims"]], "Regression analysis, overarching aims II": [[33, "regression-analysis-overarching-aims-ii"]], "Regular NNs don\u2019t scale well to full images": [[43, "regular-nns-dont-scale-well-to-full-images"], [44, "regular-nns-dont-scale-well-to-full-images"]], "Regularization": [[1, "regularization"], [41, "regularization"]], "Relevance": [[39, "relevance"], [41, "relevance"]], "Reminder about the gradient machinery from project 1": [[28, "reminder-about-the-gradient-machinery-from-project-1"]], "Reminder from last week": [[34, "reminder-from-last-week"]], "Reminder from last week: First network example, simple percepetron with one input": [[41, "reminder-from-last-week-first-network-example-simple-percepetron-with-one-input"]], "Reminder on Newton-Raphson\u2019s method": [[35, "reminder-on-newton-raphson-s-method"]], "Reminder on Statistics": [[6, "reminder-on-statistics"]], "Reminder on books with hands-on material and codes": [[40, "reminder-on-books-with-hands-on-material-and-codes"], [41, "reminder-on-books-with-hands-on-material-and-codes"]], "Reminder on different scaling methods": [[36, "reminder-on-different-scaling-methods"]], "Reminder on the chain rule and gradients": [[40, "reminder-on-the-chain-rule-and-gradients"]], "Replace or not": [[13, "replace-or-not"], [36, "replace-or-not"]], "Required Analysis:": [[28, "required-analysis"]], "Required 
Technologies": [[25, "required-technologies"]], "Resampling Methods": [[6, null]], "Resampling and the Bias-Variance Trade-off": [[19, "resampling-and-the-bias-variance-trade-off"]], "Resampling approaches can be computationally expensive": [[37, "resampling-approaches-can-be-computationally-expensive"], [38, "resampling-approaches-can-be-computationally-expensive"]], "Resampling methods": [[6, "id1"], [37, "resampling-methods"], [37, "id2"], [38, "resampling-methods"], [38, "id1"]], "Resampling methods: Bootstrap": [[37, "resampling-methods-bootstrap"], [38, "resampling-methods-bootstrap"]], "Resampling methods: Bootstrap approach": [[37, "resampling-methods-bootstrap-approach"]], "Resampling methods: Bootstrap background": [[37, "resampling-methods-bootstrap-background"]], "Resampling methods: Bootstrap steps": [[37, "resampling-methods-bootstrap-steps"]], "Resampling methods: More Bootstrap background": [[37, "resampling-methods-more-bootstrap-background"]], "Residual Error": [[34, "residual-error"], [35, "residual-error"]], "Resources on differential equations and deep learning": [[2, "resources-on-differential-equations-and-deep-learning"], [42, "resources-on-differential-equations-and-deep-learning"], [43, "resources-on-differential-equations-and-deep-learning"]], "Revisiting Ordinary Least Squares": [[35, "revisiting-ordinary-least-squares"]], "Revisiting our Linear Regression Solvers": [[13, "revisiting-our-linear-regression-solvers"]], "Revisiting our Logistic Regression case": [[38, "revisiting-our-logistic-regression-case"], [39, "revisiting-our-logistic-regression-case"]], "Rewriting as dot products": [[43, "rewriting-as-dot-products"], [44, "rewriting-as-dot-products"]], "Rewriting the Covariance and/or Correlation Matrix": [[34, "rewriting-the-covariance-and-or-correlation-matrix"]], "Rewriting the \\delta-function": [[37, "rewriting-the-delta-function"]], "Rewriting the fitting procedure as a linear algebra problem": [[33, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem"]], "Rewriting the fitting procedure as a linear algebra problem, more details": [[33, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem-more-details"]], "Ridge Regression": [[35, "ridge-regression"]], "Ridge and LASSO Regression": [[34, "ridge-and-lasso-regression"], [35, "ridge-and-lasso-regression"], [35, "id2"]], "Ridge and Lasso Regression": [[5, null], [5, "id1"]], "Running with Keras": [[44, "running-with-keras"]], "SGD example": [[36, "sgd-example"]], "SGD vs Full-Batch GD: Convergence Speed and Memory Comparison": [[36, "sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison"]], "SVD analysis": [[35, "svd-analysis"]], "Same code but now with momentum gradient descent": [[13, "same-code-but-now-with-momentum-gradient-descent"], [36, "same-code-but-now-with-momentum-gradient-descent"], [36, "id3"], [36, "id4"]], "Schedule first week": [[33, "schedule-first-week"]], "Schematic Regression Procedure": [[9, "schematic-regression-procedure"]], "Second moment of the gradient": [[36, "second-moment-of-the-gradient"]], "September 15-19": [[19, "september-15-19"]], "Set up the model": [[44, "set-up-the-model"]], "Setting it up": [[44, "setting-it-up"]], "Setting up a Multi-layer perceptron model for classification": [[41, "setting-up-a-multi-layer-perceptron-model-for-classification"]], "Setting up the Back propagation algorithm": [[12, "setting-up-the-back-propagation-algorithm"]], "Setting up the Back propagation algorithm, part 3": [[40, 
"setting-up-the-back-propagation-algorithm-part-3"], [41, "setting-up-the-back-propagation-algorithm-part-3"], [42, "setting-up-the-back-propagation-algorithm-part-3"]], "Setting up the Matrix to be inverted": [[34, "setting-up-the-matrix-to-be-inverted"], [35, "setting-up-the-matrix-to-be-inverted"]], "Setting up the back propagation algorithm": [[40, "setting-up-the-back-propagation-algorithm"]], "Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations": [[41, "setting-up-the-back-propagation-algorithm-and-algorithm-for-a-feed-forward-nn-initalizations"], [42, "setting-up-the-back-propagation-algorithm-and-algorithm-for-a-feed-forward-nn-initalizations"]], "Setting up the back propagation algorithm, part 1": [[41, "setting-up-the-back-propagation-algorithm-part-1"], [42, "setting-up-the-back-propagation-algorithm-part-1"]], "Setting up the back propagation algorithm, part 2": [[40, "setting-up-the-back-propagation-algorithm-part-2"], [41, "setting-up-the-back-propagation-algorithm-part-2"], [42, "setting-up-the-back-propagation-algorithm-part-2"]], "Setting up the code": [[42, "setting-up-the-code"], [43, "setting-up-the-code"]], "Setting up the equations for a neural network": [[40, "setting-up-the-equations-for-a-neural-network"], [41, "setting-up-the-equations-for-a-neural-network"]], "Setting up the network using Autograd": [[42, "setting-up-the-network-using-autograd"], [43, "setting-up-the-network-using-autograd"]], "Setting up the network using Autograd; The full program": [[2, "setting-up-the-network-using-autograd-the-full-program"], [42, "setting-up-the-network-using-autograd-the-full-program"], [43, "setting-up-the-network-using-autograd-the-full-program"]], "Setting up the network using Autograd; The trial solution": [[42, "setting-up-the-network-using-autograd-the-trial-solution"], [43, "setting-up-the-network-using-autograd-the-trial-solution"]], "Setting up the problem": [[42, "setting-up-the-problem"], [43, "setting-up-the-problem"]], "Setup of Network": [[42, "setup-of-network"], [43, "setup-of-network"]], "Similar (second order function now) problem but now with AdaGrad": [[13, "similar-second-order-function-now-problem-but-now-with-adagrad"], [36, "similar-second-order-function-now-problem-but-now-with-adagrad"]], "Simple Python Code to read in Data and perform Classification": [[9, "simple-python-code-to-read-in-data-and-perform-classification"]], "Simple case": [[34, "simple-case"], [35, "simple-case"]], "Simple code for solving the above problem": [[35, "simple-code-for-solving-the-above-problem"]], "Simple example": [[38, "simple-example"], [40, "simple-example"]], "Simple example code": [[36, "simple-example-code"]], "Simple example to illustrate Ordinary Least Squares, Ridge and Lasso Regression": [[35, "simple-example-to-illustrate-ordinary-least-squares-ridge-and-lasso-regression"]], "Simple geometric interpretation": [[35, "simple-geometric-interpretation"]], "Simple linear regression model using scikit-learn": [[0, "simple-linear-regression-model-using-scikit-learn"], [33, "simple-linear-regression-model-using-scikit-learn"]], "Simple neural network and the back propagation equations": [[40, "simple-neural-network-and-the-back-propagation-equations"], [41, "simple-neural-network-and-the-back-propagation-equations"]], "Simple one-dimensional second-order polynomial": [[18, "simple-one-dimensional-second-order-polynomial"]], "Simple program": [[35, "simple-program"], [36, "simple-program"]], "Simpler examples first, and 
automatic differentiation": [[40, "simpler-examples-first-and-automatic-differentiation"]], "Slightly different approach": [[36, "slightly-different-approach"]], "Smarter way of evaluating the above function": [[40, "smarter-way-of-evaluating-the-above-function"]], "Sneaking in automatic differentiation using Autograd": [[36, "sneaking-in-automatic-differentiation-using-autograd"]], "Software and needed installations": [[27, "software-and-needed-installations"], [33, "software-and-needed-installations"]], "Solving Differential Equations with Deep Learning": [[2, null]], "Solving differential equations with Deep Learning": [[42, "solving-differential-equations-with-deep-learning"], [43, "solving-differential-equations-with-deep-learning"]], "Solving the equation using Autograd": [[42, "solving-the-equation-using-autograd"], [43, "solving-the-equation-using-autograd"]], "Solving the one dimensional Poisson equation": [[2, "solving-the-one-dimensional-poisson-equation"]], "Solving the wave equation - the full program using Autograd": [[42, "solving-the-wave-equation-the-full-program-using-autograd"]], "Solving the wave equation with Neural Networks": [[2, "solving-the-wave-equation-with-neural-networks"]], "Solving using Newton-Raphson\u2019s method": [[38, "solving-using-newton-raphson-s-method"], [39, "solving-using-newton-raphson-s-method"]], "Some famous Matrices": [[26, "some-famous-matrices"]], "Some parallels from real analysis": [[40, "some-parallels-from-real-analysis"]], "Some selected properties": [[38, "some-selected-properties"]], "Some simple problems": [[13, "some-simple-problems"], [35, "some-simple-problems"]], "Some useful matrix and vector expressions": [[34, "some-useful-matrix-and-vector-expressions"]], "Splitting our Data in Training and Test data": [[0, "splitting-our-data-in-training-and-test-data"], [34, "splitting-our-data-in-training-and-test-data"]], "Standard Approach based on the Normal Distribution": [[37, "standard-approach-based-on-the-normal-distribution"]], "Standard steepest descent": [[13, "standard-steepest-descent"]], "Statistical analysis": [[37, "statistical-analysis"], [38, "statistical-analysis"]], "Statistical analysis and optimization of data": [[25, "statistical-analysis-and-optimization-of-data"], [33, "statistical-analysis-and-optimization-of-data"]], "Steepest descent": [[13, "steepest-descent"], [35, "steepest-descent"]], "Stochastic Gradient Descent": [[36, "stochastic-gradient-descent"]], "Stochastic Gradient Descent (SGD)": [[13, "stochastic-gradient-descent-sgd"], [36, "stochastic-gradient-descent-sgd"]], "Stochastic variables and the main concepts, the discrete case": [[30, "stochastic-variables-and-the-main-concepts-the-discrete-case"]], "Strong correlations": [[44, "strong-correlations"]], "Strongly Convex Case": [[36, "strongly-convex-case"]], "Suggested readings and videos": [[39, "suggested-readings-and-videos"]], "Summarizing: Performing a general discrete convolution (From Raschka et al)": [[43, "summarizing-performing-a-general-discrete-convolution-from-raschka-et-al"], [44, "summarizing-performing-a-general-discrete-convolution-from-raschka-et-al"]], "Summary of methods to implement and analyze": [[28, "summary-of-methods-to-implement-and-analyze"]], "Summing up": [[37, "summing-up"], [38, "summing-up"]], "Support Vector Machines, overarching aims": [[8, null]], "Synthetic data generation": [[38, "synthetic-data-generation"], [39, "synthetic-data-generation"]], "Systematic reduction": [[3, "systematic-reduction"], [44, 
"systematic-reduction"]], "Teachers": [[33, "teachers"]], "Teachers and Grading": [[31, null]], "Teaching Assistants Fall semester 2023": [[31, "teaching-assistants-fall-semester-2023"]], "Technicalities": [[42, "technicalities"], [43, "technicalities"]], "Tensorflow": [[41, "tensorflow"], [42, "tensorflow"]], "Tentative deadllines for projects": [[31, "tentative-deadllines-for-projects"]], "Testing the Means Squared Error as function of Complexity": [[0, "testing-the-means-squared-error-as-function-of-complexity"], [34, "testing-the-means-squared-error-as-function-of-complexity"]], "Testing the XOR gate and other gates": [[41, "testing-the-xor-gate-and-other-gates"], [42, "testing-the-xor-gate-and-other-gates"]], "Textbooks": [[32, null]], "The back propagation equations for a neural network": [[41, "the-back-propagation-equations-for-a-neural-network"]], "The Algorithm before theorem": [[11, "the-algorithm-before-theorem"]], "The Breast Cancer Data, now with Keras": [[1, "the-breast-cancer-data-now-with-keras"]], "The CART algorithm for Classification": [[9, "the-cart-algorithm-for-classification"]], "The CART algorithm for Regression": [[9, "the-cart-algorithm-for-regression"]], "The CIFAR01 data set": [[3, "the-cifar01-data-set"], [44, "the-cifar01-data-set"]], "The Central Limit Theorem": [[37, "the-central-limit-theorem"]], "The Hessian matrix": [[35, "the-hessian-matrix"], [36, "the-hessian-matrix"]], "The Hessian matrix for Ridge Regression": [[35, "the-hessian-matrix-for-ridge-regression"], [36, "the-hessian-matrix-for-ridge-regression"]], "The Jacobian": [[34, "the-jacobian"]], "The MNIST dataset again": [[3, "the-mnist-dataset-again"], [44, "the-mnist-dataset-again"]], "The Neural Network": [[41, "the-neural-network"], [42, "the-neural-network"]], "The OLS case": [[35, "the-ols-case"]], "The RELU function family": [[1, "the-relu-function-family"], [41, "the-relu-function-family"], [42, "the-relu-function-family"]], "The Ridge case": [[35, "the-ridge-case"]], "The SVD example": [[43, "the-svd-example"]], "The SVD, a Fantastic Algorithm": [[34, "the-svd-a-fantastic-algorithm"], [35, "the-svd-a-fantastic-algorithm"]], "The Softmax function": [[1, "the-softmax-function"], [41, "the-softmax-function"]], "The \\chi^2 function": [[0, "the-chi-2-function"], [33, "the-chi-2-function"], [33, "id4"], [33, "id5"], [33, "id6"], [33, "id7"], [33, "id8"]], "The analytical solution": [[42, "the-analytical-solution"]], "The approximation theorem in words": [[40, "the-approximation-theorem-in-words"]], "The bias-variance tradeoff": [[6, "the-bias-variance-tradeoff"], [37, "the-bias-variance-tradeoff"], [38, "the-bias-variance-tradeoff"]], "The code for solving the ODE": [[2, "the-code-for-solving-the-ode"], [42, "the-code-for-solving-the-ode"], [43, "the-code-for-solving-the-ode"]], "The complete code with a simple data set": [[34, "the-complete-code-with-a-simple-data-set"]], "The convolution stage": [[43, "the-convolution-stage"], [44, "the-convolution-stage"]], "The cost function rewritten": [[38, "the-cost-function-rewritten"], [39, "the-cost-function-rewritten"]], "The cost/loss function": [[34, "the-cost-loss-function"]], "The course has two central parts": [[25, "the-course-has-two-central-parts"]], "The derivative of the Logistic funtion": [[41, "the-derivative-of-the-logistic-funtion"]], "The derivative of the cost/loss function": [[35, "the-derivative-of-the-cost-loss-function"], [36, "the-derivative-of-the-cost-loss-function"]], "The derivatives": [[40, "the-derivatives"], [41, 
"the-derivatives"]], "The equations": [[35, "the-equations"]], "The equations for ordinary least squares": [[34, "the-equations-for-ordinary-least-squares"]], "The equations to solve": [[38, "the-equations-to-solve"], [39, "the-equations-to-solve"]], "The first Case": [[35, "the-first-case"]], "The function to solve for": [[42, "the-function-to-solve-for"], [43, "the-function-to-solve-for"]], "The gradient step": [[36, "the-gradient-step"]], "The ideal": [[35, "the-ideal"]], "The logistic function": [[7, "the-logistic-function"], [38, "the-logistic-function"]], "The mean squared error and its derivative": [[34, "the-mean-squared-error-and-its-derivative"]], "The moons example": [[8, "the-moons-example"]], "The multilayer perceptron (MLP)": [[12, "the-multilayer-perceptron-mlp"]], "The network with one input layer, specified number of hidden layers, and one output layer": [[2, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"], [42, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"], [43, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"]], "The optimization problem": [[40, "the-optimization-problem"]], "The ouput layer": [[40, "the-ouput-layer"], [41, "the-ouput-layer"]], "The plethora of machine learning algorithms/methods": [[33, "the-plethora-of-machine-learning-algorithms-methods"]], "The problem to solve for": [[42, "the-problem-to-solve-for"]], "The program using Autograd": [[42, "the-program-using-autograd"], [43, "the-program-using-autograd"]], "The same example but now with cross-validation": [[37, "the-same-example-but-now-with-cross-validation"], [38, "the-same-example-but-now-with-cross-validation"]], "The sensitiveness of the gradient descent": [[35, "the-sensitiveness-of-the-gradient-descent"]], "The singular value decomposition": [[5, "the-singular-value-decomposition"], [34, "the-singular-value-decomposition"], [35, "the-singular-value-decomposition"]], "The specific equation to solve for": [[42, "the-specific-equation-to-solve-for"], [43, "the-specific-equation-to-solve-for"]], "The training": [[40, "the-training"], [41, "the-training"]], "The trial solution": [[42, "the-trial-solution"], [42, "id2"], [42, "id3"], [42, "id5"], [43, "the-trial-solution"], [43, "id1"], [43, "id2"]], "The two-dimensional case": [[8, "the-two-dimensional-case"]], "Theoretical Convergence Speed and convex optimization": [[36, "theoretical-convergence-speed-and-convex-optimization"]], "Time decay rate": [[36, "time-decay-rate"]], "To our real data: nuclear binding energies. 
Brief reminder on masses and binding energies": [[33, "to-our-real-data-nuclear-binding-energies-brief-reminder-on-masses-and-binding-energies"]], "Toeplitz matrices": [[43, "toeplitz-matrices"], [44, "toeplitz-matrices"]], "Topics covered in this course: Statistical analysis and optimization of data": [[33, "topics-covered-in-this-course-statistical-analysis-and-optimization-of-data"]], "Towards the PCA theorem": [[11, "towards-the-pca-theorem"]], "Train and test datasets": [[1, "train-and-test-datasets"], [41, "train-and-test-datasets"]], "Transforming images": [[43, "transforming-images"], [44, "transforming-images"]], "Two parameters": [[38, "two-parameters"], [39, "two-parameters"]], "Two-dimensional Objects": [[3, "two-dimensional-objects"]], "Two-dimensional objects": [[43, "two-dimensional-objects"], [44, "two-dimensional-objects"]], "Type of problem": [[2, "type-of-problem"], [42, "type-of-problem"], [43, "type-of-problem"]], "Types of Machine Learning": [[33, "types-of-machine-learning"]], "Understanding what happens": [[37, "understanding-what-happens"], [38, "understanding-what-happens"]], "Universal approximation theorem": [[40, "universal-approximation-theorem"]], "Updating the gradients": [[40, "updating-the-gradients"], [41, "updating-the-gradients"], [42, "updating-the-gradients"]], "Usage of the above learning rate schedulers": [[41, "usage-of-the-above-learning-rate-schedulers"], [42, "usage-of-the-above-learning-rate-schedulers"]], "Use the books!": [[19, "use-the-books"]], "Useful Python libraries": [[25, "useful-python-libraries"], [33, "useful-python-libraries"]], "Using Autograd": [[13, "using-autograd"]], "Using Automatic differentiation": [[42, "using-automatic-differentiation"]], "Using Keras": [[41, "using-keras"], [42, "using-keras"]], "Using Pytorch with the full MNIST data set": [[42, "using-pytorch-with-the-full-mnist-data-set"]], "Using Scikit-learn": [[39, "using-scikit-learn"]], "Using forward Euler to solve the ODE": [[2, "using-forward-euler-to-solve-the-ode"], [42, "using-forward-euler-to-solve-the-ode"], [43, "using-forward-euler-to-solve-the-ode"]], "Using gradient descent methods, limitations": [[13, "using-gradient-descent-methods-limitations"], [35, "using-gradient-descent-methods-limitations"], [36, "using-gradient-descent-methods-limitations"]], "Using the chain rule and summing over all k entries": [[40, "using-the-chain-rule-and-summing-over-all-k-entries"], [41, "using-the-chain-rule-and-summing-over-all-k-entries"]], "Using the correlation matrix": [[39, "using-the-correlation-matrix"]], "Vanishing gradients": [[41, "vanishing-gradients"]], "Various steps in cross-validation": [[37, "various-steps-in-cross-validation"], [38, "various-steps-in-cross-validation"]], "Verifying the data set": [[44, "verifying-the-data-set"]], "Visualization": [[1, "visualization"], [1, "id1"], [41, "visualization"], [41, "id1"]], "Visualizing the Tree, Classification": [[9, "visualizing-the-tree-classification"]], "Week 34: Introduction to the course, Logistics and Practicalities": [[33, null]], "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression": [[34, null]], "Week 36: Linear Regression and Gradient descent": [[35, null]], "Week 37: Gradient descent methods": [[36, null]], "Week 38: Statistical analysis, bias-variance tradeoff and resampling methods": [[37, null]], "Week 39: Resampling methods and logistic regression": [[38, null]], "Week 40: Gradient descent methods (continued) and start Neural networks": [[39, null]], "Week 41 Neural 
networks and constructing a neural network code": [[40, null]], "Week 42 Constructing a Neural Network code with examples": [[41, null]], "Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations": [[42, null]], "Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)": [[43, null]], "Week 45, Convolutional Neural Networks (CCNs)": [[44, null]], "Weights and biases": [[41, "weights-and-biases"]], "What Is Generative Modeling?": [[33, "what-is-generative-modeling"]], "What does it mean?": [[34, "what-does-it-mean"], [35, "what-does-it-mean"]], "What is Machine Learning?": [[0, "what-is-machine-learning"]], "What is a good model?": [[0, "what-is-a-good-model"], [33, "what-is-a-good-model"]], "What is a good model? Can we define it?": [[33, "what-is-a-good-model-can-we-define-it"]], "What is the Difference": [[43, "what-is-the-difference"], [44, "what-is-the-difference"]], "When do we stop?": [[36, "when-do-we-stop"]], "Which activation function should I use?": [[1, "which-activation-function-should-i-use"]], "Which activation function should we use?": [[41, "which-activation-function-should-we-use"], [42, "which-activation-function-should-we-use"]], "Why CNNS for images, sound files, medical images from CT scans etc?": [[43, "why-cnns-for-images-sound-files-medical-images-from-ct-scans-etc"], [44, "why-cnns-for-images-sound-files-medical-images-from-ct-scans-etc"]], "Why Combine Momentum and RMSProp?": [[36, "why-combine-momentum-and-rmsprop"]], "Why Linear Regression (aka Ordinary Least Squares and family)": [[33, "why-linear-regression-aka-ordinary-least-squares-and-family"]], "Why multilayer perceptrons?": [[39, "why-multilayer-perceptrons"], [40, "why-multilayer-perceptrons"]], "Why resampling methods": [[37, "why-resampling-methods"]], "Why resampling methods ?": [[37, "id1"], [38, "why-resampling-methods"]], "Why the Jacobian?": [[43, "why-the-jacobian"]], "Why the jacobian?": [[42, "why-the-jacobian"]], "Wisconsin Cancer Data": [[7, "wisconsin-cancer-data"]], "With Lasso Regression": [[35, "with-lasso-regression"]], "Wrapping it up": [[37, "wrapping-it-up"]], "Writing Our First Generative Adversarial Network": [[4, "writing-our-first-generative-adversarial-network"]], "Writing our own PCA code": [[11, "writing-our-own-pca-code"]], "Writing the Cost Function": [[35, "writing-the-cost-function"]], "XGBoost: Extreme Gradient Boosting": [[10, "xgboost-extreme-gradient-boosting"]], "Yet another Example": [[35, "yet-another-example"]], "a) Expression for Ridge regression": [[17, "a-expression-for-ridge-regression"]], "scikit-learn implementation": [[1, "scikit-learn-implementation"], [41, "scikit-learn-implementation"]]}, "docnames": ["chapter1", "chapter10", "chapter11", "chapter12", "chapter13", "chapter2", "chapter3", "chapter4", "chapter5", "chapter6", "chapter7", "chapter8", "chapter9", "chapteroptimization", "clustering", "exercisesweek34", "exercisesweek35", "exercisesweek36", "exercisesweek37", "exercisesweek38", "exercisesweek39", "exercisesweek41", "exercisesweek42", "exercisesweek43", "exercisesweek44", "intro", "linalg", "project1", "project2", "schedule", "statistics", "teachers", "textbooks", "week34", "week35", "week36", "week37", "week38", "week39", "week40", "week41", "week42", "week43", "week44", "week45"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, 
"sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["chapter1.ipynb", "chapter10.ipynb", "chapter11.ipynb", "chapter12.ipynb", "chapter13.ipynb", "chapter2.ipynb", "chapter3.ipynb", "chapter4.ipynb", "chapter5.ipynb", "chapter6.ipynb", "chapter7.ipynb", "chapter8.ipynb", "chapter9.ipynb", "chapteroptimization.ipynb", "clustering.ipynb", "exercisesweek34.ipynb", "exercisesweek35.ipynb", "exercisesweek36.ipynb", "exercisesweek37.ipynb", "exercisesweek38.ipynb", "exercisesweek39.ipynb", "exercisesweek41.ipynb", "exercisesweek42.ipynb", "exercisesweek43.ipynb", "exercisesweek44.ipynb", "intro.md", "linalg.ipynb", "project1.ipynb", "project2.ipynb", "schedule.md", "statistics.ipynb", "teachers.md", "textbooks.md", "week34.ipynb", "week35.ipynb", "week36.ipynb", "week37.ipynb", "week38.ipynb", "week39.ipynb", "week40.ipynb", "week41.ipynb", "week42.ipynb", "week43.ipynb", "week44.ipynb", "week45.ipynb"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 15, 16, 17, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 40, 41, 42, 43, 44], "0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43, 44], "00": [0, 1, 5, 11, 33, 34, 40, 41, 43, 44], "000": [1, 3, 41, 43, 44], "000000": [], "00000000e": [], "001": [2, 8, 13, 21, 35, 36, 42, 43, 44], "004": 5, "004113634617443131": 34, "004113634617443139": 34, "00411363461744314": 34, "004113634617443147": 34, "005b82": [], "00622f": [], "00727646693": [0, 33], "0072b2": [], "00749c": [], "0076268": 21, "008561": [], "0086649156": [0, 33], "00e0e0": [], "01": [0, 1, 2, 5, 9, 11, 13, 17, 32, 33, 34, 36, 38, 39, 40, 41, 42, 43, 44], "010726": [], "0110": 30, "01719003e": [], "02": [0, 4, 7, 12, 33, 38, 39, 41, 43, 44], "02334824": [], "023b95": [], "024c1a": [], "025": 28, "02857": 4, "02f": 6, "03077640549": 4, "03097597e": [], "031": 5, "04": 11, "0458": 9, "05": [4, 6], "0550ae": [], "05767": 40, "062292565": 4, "062435": [], "06730814": [], "07": [], "0713": [0, 33], "07285": 3, "08": 30, "08078025e": [], "080808": [], "08336233266": 4, "08376632": 34, "083766322923899": 34, "0837663229239043": 34, "0917": 9, "0969da4a": [], "0d1117": [], "0n": [0, 33], "0x113e21950": 17, "1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 43, 44], "10": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "100": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 21, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "1000": [0, 1, 2, 4, 5, 8, 11, 13, 14, 18, 19, 21, 23, 25, 28, 30, 33, 35, 36, 38, 39, 41, 42, 43], "10000": [2, 5, 6, 10, 11, 13, 30, 37, 42, 43], "100000": 8, "10001": 10, "1001": 30, "1002": 30, "1003": 30, "1005": 30, "1007": [37, 38], "1009": 30, "101": 16, "1011": 30, "1013": 30, "1013904243": 30, "1015": 30, "102": 16, "1023": 30, "1024": [3, 44], "1026": 30, "1027": 30, "103": [1, 41], "1030": 30, "1037": 30, "1038": 30, "1040": 30, "1047": 30, "107": 16, "108": [], "10e": [41, 42], "10th": 9, "10x": [0, 28, 33], "10y": 28, "11": [0, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 23, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "110": [], "1100": 30, "1101": 30, "111": [1, 7, 12, 
38, 39, 40, 41], "112": 16, "11340253": [], "114": 43, "11590451": [], "116": 16, "116329": [], "116633": [], "117": 16, "118": 16, "12": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 18, 21, 23, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "120": [3, 43, 44], "1200": 43, "121": [8, 9, 10, 16], "1215pm": [31, 33], "122": [8, 9, 10], "124": [0, 33], "125": 16, "127": [4, 16], "128": [3, 4, 13, 36, 43, 44], "129": 16, "1298": 9, "12pm": [31, 33], "13": [0, 2, 9, 12, 22, 23, 26, 28, 30, 33, 39, 42, 43], "1307": 44, "131": 16, "133": [7, 38], "135": 16, "136": 16, "14": [0, 2, 4, 6, 8, 9, 10, 12, 26, 28, 30, 32, 34, 37, 38, 42, 43, 44], "141": 16, "1412": 36, "141414": [], "143": 16, "1446729567": 4, "149": 16, "14g": [6, 37], "15": [0, 2, 4, 6, 7, 8, 9, 12, 13, 27, 28, 30, 33, 35, 36, 38, 39, 42, 43, 44], "150": [4, 8, 21, 38, 39], "1502": 40, "152": 16, "153760": [], "156": 16, "157": [], "158": [], "159": 16, "15g": [6, 37], "15pm": 33, "16": [1, 2, 3, 4, 5, 8, 9, 10, 21, 30, 33, 35, 37, 42, 43, 44], "160": 16, "1603": 3, "161": 16, "162": 16, "16231451": 4, "163": 16, "16384": [3, 44], "164": 16, "167": 16, "17": [1, 2, 8, 22, 30, 41, 42, 43], "172": 16, "173": 16, "175": [37, 38], "176": 16, "178": 16, "179": 16, "1797": [1, 41], "18": [2, 6, 7, 8, 9, 10, 30, 33, 37, 38, 42, 43], "1807": 4, "181036": [], "18392847": [], "18c1c4": [], "19": [2, 30, 33, 37, 42], "192": [37, 38], "1940": [], "1943": [12, 39, 40], "19569961": 34, "19680801": [], "1970": [26, 33], "1973": 9, "1979": [6, 37], "1989": 40, "1991": 40, "1_1": [12, 39], "1_2": [12, 39], "1_3": [12, 39], "1cm": [0, 8, 10, 30, 33, 40, 41], "1d": [1, 2, 3, 24, 38, 39, 41, 42, 43, 44], "1e": [2, 4, 13, 14, 36, 38, 39, 41, 42, 43], "1e10": 14, "1e1e1": [], "1e4": 6, "1f": 1, "1ffvbn0xlhv": 22, "1k": 26, "1n": [0, 33], "1x": [0, 33], "1zkibvqf": 21, "2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 25, 26, 27, 30, 32, 36, 37, 38, 39, 43, 44], "20": [0, 1, 2, 6, 7, 8, 16, 17, 23, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "200": [0, 2, 3, 4, 8, 9, 10, 38, 39, 42, 43, 44], "2000": [0, 34], "2001": [], "2004": [13, 35], "2006": 32, "2007": [], "20072279": [], "2008": [33, 36], "2009": [], "2010": [1, 41], "2011": [1, 36, 41], "2012": 36, "2013": [], "2014": [4, 36], "2015": [1, 41, 42], "2016": [0, 33], "2017": 42, "2018": [0, 6, 34, 37, 38], "2019": [], "2020": [], "2021": [6, 14, 34, 36], "2022": [28, 33, 40, 41], "2023": [41, 42], "2024": [21, 37], "2025": [18, 21, 22, 23, 24, 28, 33, 34, 35, 36, 37, 42, 43, 44], "21": [0, 1, 5, 7, 9, 12, 23, 26, 33, 34, 35, 38, 39, 40, 41, 43, 44], "2116753732": 4, "215pm": [31, 33], "2167072": [], "22": [0, 1, 5, 12, 13, 23, 26, 33, 34, 35, 39, 41, 43, 44], "221": 8, "225": 4, "22948497": [], "23": [1, 12, 23, 26, 39, 41], "24": [0, 1, 23, 26, 33, 41], "242424": [], "24292f": [], "25": [2, 3, 4, 5, 6, 8, 9, 11, 24, 34, 42, 43, 44], "250": [2, 4, 7, 9, 38, 42, 43], "25000": [], "250154": [], "252124": [], "253775": [], "255": [3, 28, 42, 43, 44], "256": [4, 36], "25x": [27, 28], "26": [], "26303845": [], "264": [], "265": [], "265109911": 4, "266": [], "269": [], "27": [1, 24, 41], "270": [], "278": [35, 36], "27n_": 30, "28": [1, 3, 4, 41, 42, 43, 44], "283": [35, 36], "2830637392": 4, "2861": 30, "2873": 9, "2882": 30, "2886": 30, "2890": [0, 33], "2892": 30, "29": 34, "2915": 30, "2931": 33, "29364655": [], "294399745619595": [], "296247": [], "2968": 33, "2980": [21, 33], "298273": [], "298375": [], "299": 43, "2990": 33, "2_": [12, 39], "2_1": [12, 39], "2_2": [12, 39], 
"2_3": [12, 39], "2_i": [12, 39], "2_m": [6, 30, 37], "2_t": 13, "2_x": 30, "2a": 17, "2a1968": [], "2b": 30, "2b2b2b": [], "2c8f433990d1": 36, "2cm": 8, "2d": [1, 3, 11, 12, 25, 28, 33, 38, 39, 40, 41, 43, 44], "2e": [6, 37, 38], "2f": [0, 7, 9, 10, 11, 12, 23, 33, 38, 39, 42, 44], "2g": [2, 42, 43], "2g_i": [2, 42, 43], "2k": 3, "2m": [6, 37], "2mvizaqfst8": 34, "2n": [0, 2, 3, 33, 34, 42], "2nd": 9, "2p": [30, 40, 43, 44], "2pt": 4, "2x": [0, 3, 8, 13, 33, 40], "2x_ix_jy_iy_j": 8, "2x_j": 8, "2xb": 40, "2y_i": 10, "2y_j": 8, "3": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 35, 36, 37, 38, 39, 43], "30": [0, 1, 4, 6, 7, 10, 13, 31, 36, 37, 38, 39, 41, 42], "300": [38, 39], "30000": [0, 33], "3072": [3, 43, 44], "3081": 44, "31": [12, 23, 24, 26, 30, 39], "315": [6, 34, 36], "3155": [0, 5, 6, 34, 35, 36, 37, 38], "32": [3, 4, 6, 12, 13, 23, 26, 30, 36, 39, 43, 44], "3200": [1, 41], "3250": [1, 41], "3297": [], "33": [12, 23, 26, 31, 39], "3303": [], "3310": [], "332331": [], "333": [7, 38], "3331": [], "3337": [], "34": 26, "3436": [0, 33], "3437": [0, 33], "35": [0, 6, 27, 33, 35, 36], "3581341341": 4, "359": [5, 35], "36": [0, 5, 6, 18, 27, 30], "37": [27, 35, 37, 38], "370782966": 4, "38": [27, 30], "387": [37, 38], "39": [0, 24, 27, 28, 31, 33], "3d": [2, 3, 4, 6, 13, 16, 37, 38, 42], "3d73a9": [], "3f": [1, 3, 9, 41, 42, 44], "3n": 26, "3pi": [43, 44], "3x": [2, 8, 42, 43], "3x_0x_1": 40, "3x_i": [2, 42, 43], "3y": 8, "4": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 24, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 42, 43, 44], "40": [1, 6, 31, 33, 37, 38, 41, 42], "400": [4, 43], "4000": 33, "40008b9a5380fcacce3976bf7c08af5b": 36, "4050": [32, 33], "41": [23, 26, 28, 42, 43], "4155": [2, 15, 42, 43], "41589548": [], "42": [1, 4, 8, 9, 10, 23, 26, 28, 38, 39, 40, 42, 43], "43": [0, 7, 26], "4310": 33, "436462435": 4, "437a6b": [], "44": [0, 26, 35, 36, 42], "45": [31, 33], "46": [31, 33], "462": [7, 38], "468": 42, "47": [31, 33], "473d18": [], "479465113": 4, "47958494": [], "48": [], "48257387": [31, 33], "49": [5, 6, 11], "49152": [3, 44], "4940954": [0, 33], "4990": 30, "4992": 30, "4997": 30, "4c4b4be8": [], "4c4c7f": [9, 10], "4d": [3, 44], "4f": [6, 28, 38, 39, 42, 44], "4pm": [31, 33], "4y": 8, "4y_i": 10, "5": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "50": [1, 2, 3, 4, 6, 7, 8, 10, 13, 24, 28, 33, 34, 36, 37, 39, 40, 41, 42, 43, 44], "500": [1, 3, 4, 6, 9, 10, 13, 36, 37, 38, 41, 43], "5000": [27, 28], "5018": 30, "506": [], "507d50": [9, 10], "50j": 13, "50x10": [1, 41], "51": 10, "510": [1, 41], "512132": [], "515151": [], "5177783846": 4, "52": 38, "53": [9, 38], "5391cf": [], "54": [6, 30], "5411205": [], "54894451": [], "55": [1, 41], "56": [1, 41], "56469864": 21, "56536": [0, 33], "569": 1, "57": [0, 8, 31, 33], "571": [5, 35], "576": 37, "58": [10, 31, 33], "5870": 43, "58a6ff70": [], "591317992": 4, "5ca7e4": [], "5cm": 30, "5f": [8, 36], "5x": [8, 18], "5y": 8, "6": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 18, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "60": [1, 3, 44], "60000": 4, "6019067271": 4, "60610368": 21, "606439": [], "61362": 28, "622cbc": [], "625": [7, 38], "63": [1, 41], "64": [1, 3, 4, 13, 26, 33, 36, 41, 42, 43, 44], "64x50": [1, 41], "65": [1, 8, 9, 41], "66666691": [], "66707b": [], "66ccee": [], "66e9ec": [], "6730c5": [], "6887363571": 4, "69": [16, 30], "69069n_": 30, "691": [], "6980": 36, 
"6e7681": [], "6e7781": [], "6f98b3": [], "6n_": 30, "7": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "70": [1, 7, 38, 41], "702c00": [], "70653767": 4, "71": [1, 41], "724": [3, 44], "72f088": [], "73": [], "7304881": [], "737373": [], "75": [5, 6, 8, 11, 37], "76": [31, 33, 38, 43, 44], "760": [43, 44], "765": [7, 38], "77": [31, 33], "7718": 9, "7782028952": 4, "77893972": [], "78": [], "784": 42, "797979": [], "7998f2": [], "79c0ff": [], "7d7d58": [9, 10], "7ee787": [], "7f4707": [], "8": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 18, 19, 21, 26, 28, 30, 31, 33, 35, 38, 39, 41, 42, 43, 44], "80": [0, 1, 5, 8, 17, 34, 41], "800": [4, 7, 38, 43], "8045e5": [], "81": [1, 41], "815am": [31, 33], "81b19b": [], "8250df": [], "84858": [37, 38], "85": [1, 41], "8702784034": 4, "8786ac": [], "88": 33, "8a4600": [], "8b949e": [], "8c8c8c": [], "8f": [6, 37, 38], "8g": [6, 37], "8n": 26, "8x8": [1, 41, 42], "9": [0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 26, 30, 33, 36, 38, 39, 41, 42, 43, 44], "90": 1, "9040": 9, "91": [31, 33], "912583": [], "91cbff": [], "92": [31, 33], "93": 16, "931": [0, 33], "933": [5, 35], "937": 30, "938": 30, "939": [0, 30, 33], "94": 30, "95": [1, 11, 37, 41], "953800": [], "954": 30, "955820c21e8b": 4, "9579870417283": 21, "96": [6, 37], "960": 30, "961": 30, "962": 30, "9649652536": 4, "96611194e": [], "974eb7": [], "978": [37, 38], "9780387310732": 32, "9780387848570": 32, "9781098134174": 33, "9781492032632": 32, "9781801819312": 33, "97898392": 34, "98": [0, 1, 16, 41], "985": 30, "986": 30, "98661b": [], "989": 30, "9898ff": [9, 10], "99": [13, 16, 36, 37], "991": 30, "992": 30, "993": 30, "996": 5, "996b00": [], "999": [9, 30, 36, 41, 42], "9999": 24, "999999": [], "9e86c8": [], "9e8741": [], "9f4e55": [], "9x": 6, "9y": 6, "A": [2, 3, 5, 6, 7, 10, 11, 12, 13, 15, 16, 19, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 40], "AND": [2, 42, 43], "AS": [], "AT": [], "And": [0, 3, 4, 5, 6, 9, 13, 20, 22, 24, 25, 27, 28, 30, 35, 43, 44], "As": [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 13, 15, 16, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "At": [0, 4, 6, 13, 20, 23, 33, 36], "BE": [0, 33], "BUT": [], "BY": [], "Be": [2, 18, 25, 33, 42, 43], "Being": 13, "But": [0, 1, 2, 3, 5, 6, 9, 10, 16, 21, 28, 30, 34, 37, 38, 41, 42, 43, 44], "By": [0, 3, 5, 6, 12, 13, 17, 19, 23, 26, 33, 34, 35, 36, 37, 39, 44], "FOR": [], "For": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "IF": [6, 34, 36], "IN": 32, "If": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 18, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "In": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "Ising": [5, 12, 34, 35, 39, 40], "It": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 20, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "Its": [1, 2, 4, 11, 41, 42, 43], "NO": [], "NOT": [], "No": [6, 9, 33, 34, 36, 39, 41, 42], "Not": [0, 1, 5, 6, 34, 35, 36, 37, 39, 41, 42], "OF": [], "ON": [], "OR": 30, "Of": 30, "On": [0, 3, 27, 30, 31, 32, 33, 36, 37, 43, 44], "One": [0, 1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 17, 30, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "Or": [0, 1, 6, 24, 33, 41], "SUCH": [], "Such": [0, 6, 12, 16, 30, 36, 37, 38, 39, 40, 41, 43, 
44], "THE": [], "TO": [41, 42], "That": [0, 5, 7, 10, 11, 12, 14, 27, 28, 30, 33, 37, 38, 39, 40, 41, 43, 44], "The": [4, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32], "Then": [0, 1, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 26, 33, 35, 36, 37, 40, 41, 42], "There": [0, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 26, 27, 28, 30, 31, 33, 34, 35, 36, 39, 40, 43, 44], "These": [0, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 17, 18, 22, 23, 26, 27, 28, 30, 31, 33, 34, 35, 36, 40, 41, 42, 43, 44], "To": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 20, 21, 22, 23, 26, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "WITH": [], "Will": [38, 39], "With": [0, 5, 6, 8, 9, 10, 11, 12, 14, 16, 19, 21, 26, 27, 28, 30, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "_": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 23, 26, 27, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "_0": [5, 8, 10, 11, 13, 34, 35], "_1": [2, 5, 6, 8, 10, 11, 12, 13, 14, 23, 26, 34, 35, 36, 40, 41, 42, 43], "_2": [2, 5, 8, 11, 12, 13, 26, 34, 36, 39, 42, 43], "_3": 26, "_4": 26, "_9": [13, 36], "__array_finalize__": [], "__class__": [10, 41, 42], "__doc__": [6, 37, 38], "__future__": [8, 9, 40], "__getattribute__": [], "__import__": [], "__init__": [1, 22, 38, 39, 41, 42, 44], "__main__": [2, 42, 43], "__name__": [2, 10, 41, 42, 43], "__new__": [], "__path__": [], "_accuraci": [41, 42], "_add_intercept": [38, 39], "_auto1": [2, 3, 4, 5, 6, 7, 12, 13, 26, 30, 34, 35, 38, 39, 40, 41, 42, 43], "_auto10": [6, 12], "_auto11": 6, "_auto12": 6, "_auto2": [2, 3, 4, 5, 6, 12, 13, 26, 30, 39, 40, 41, 42, 43], "_auto3": [3, 4, 5, 6, 12, 13, 26, 39, 40, 41], "_auto4": [4, 6, 12, 13, 26, 39], "_auto5": [4, 6, 12, 13, 26, 39], "_auto6": [4, 6, 12, 26, 39], "_auto7": [4, 6, 12, 26, 39], "_auto8": [6, 12], "_auto9": [6, 12], "_backpropag": [41, 42], "_build": [0, 25, 27, 28, 32, 33, 41, 42], "_c": [1, 41], "_center": [], "_compile_transl": [], "_compon": 11, "_da": 22, "_data": [], "_depth": 9, "_export": [15, 16, 19], "_feed_forward_sav": 22, "_feedforward": [41, 42], "_format": [41, 42], "_fraction": 9, "_i": [0, 1, 2, 5, 6, 7, 8, 11, 12, 13, 19, 23, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "_j": [0, 1, 2, 3, 5, 6, 8, 13, 19, 27, 34, 35, 36, 37, 38, 41, 42, 43, 44], "_k": [13, 35, 36, 41, 42], "_l": [12, 39, 40, 41, 42], "_lambda": 6, "_leaf": 9, "_m": 10, "_mask": [], "_multilayer_perceptron": [], "_n": [2, 5, 8, 11, 13, 34, 35, 36, 42, 43], "_node": 9, "_norm": [], "_p": [5, 8, 34, 35], "_parse_numpydoc_see_also_sect": [], "_progress_bar": [41, 42], "_pydevd_bundl": [], "_ratio": 11, "_sampl": 9, "_set_classif": [41, 42], "_sigmoid": [38, 39], "_softmax": [38, 39], "_split": [6, 9, 27], "_t": [13, 36], "_test": [6, 27], "_varianc": 11, "_weight": 9, "a0": 3, "a0111f": [], "a0faa0": [9, 10], "a1": [0, 21, 22, 33], "a11": [], "a12236": [], "a2": [0, 21, 22, 33], "a25e53": [], "a2bffc": [], "a3": [0, 33], "a4": [0, 33], "a5d6ff": [], "a_": [0, 1, 16, 26, 33, 34, 40, 41, 43, 44], "a_0": [0, 33, 40, 41, 43, 44], "a_1": [40, 41, 43, 44], "a_1a": [0, 33], "a_2": [40, 41, 43, 44], "a_2a": [0, 33], "a_3": [0, 33, 43, 44], "a_3a": [0, 33], "a_4": [0, 33], "a_4a": [0, 33], "a_h": [1, 41], "a_i": [0, 1, 2, 12, 33, 40, 41, 42, 43], "a_j": [1, 12, 40, 41, 42], "a_k": [0, 1, 12, 40, 41], "a_matric": [41, 42], "aa": [], "aaa": [], "aaron": 32, "ab": [0, 2, 5, 13, 14, 33, 34, 36, 40, 42, 43], "ab6369": [], "ab_channel": [25, 39, 40, 41, 42, 43], "abandon": [1, 41], "abe338": [], "abid": 30, 
"abil": [0, 10], "abl": [0, 1, 4, 5, 6, 7, 10, 12, 13, 16, 18, 20, 21, 24, 27, 34, 35, 36, 38, 39, 40, 41, 43], "about": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 19, 20, 22, 23, 24, 25, 26, 27, 31, 36, 37, 38, 39, 41, 42, 43, 44], "abov": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 28, 30, 32, 33, 34, 36, 37, 38, 39], "abovement": [6, 27, 33, 37, 38], "abscissa": [13, 35], "absent": 36, "absolut": [0, 2, 5, 6, 13, 33, 34, 35, 37, 38, 42, 43], "absorb": [34, 35], "abstract": [1, 24, 36, 38, 41, 42], "abund": 36, "ac": [], "acc_bin": [38, 39], "acc_multi": [38, 39], "acceler": [13, 36], "accept": [0, 3, 6, 9, 21, 27, 34, 36, 43, 44], "access": [3, 11, 30, 33, 36, 43, 44], "accid": [4, 6, 37, 38], "accompani": [0, 33, 34], "accomplish": [8, 9, 13, 36], "accord": [0, 1, 2, 5, 6, 9, 12, 13, 14, 30, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "accordingli": 11, "account": [0, 3, 5, 13, 15, 16, 20, 23, 30, 33, 36, 43, 44], "accumul": [12, 13, 30, 36, 39, 40, 41, 42], "accur": [0, 3, 4, 6, 10, 13, 36, 37, 38, 43, 44], "accuraci": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 21, 23, 24, 28, 33, 34, 35, 38, 39, 40, 41, 42, 44], "accuracy_scor": [0, 1, 10, 21, 22, 28, 33, 38, 39, 41], "accuracy_score_numpi": [1, 41], "acheiv": 21, "achiev": [0, 1, 5, 6, 8, 12, 26, 33, 36, 37, 38, 39, 40, 41], "aco": 30, "acquaint": 25, "acquir": [1, 25, 33, 41], "acr": [], "across": [1, 3, 6, 9, 17, 23, 25, 33, 37, 41, 43, 44], "act": [1, 3, 23, 26, 28, 36, 41, 42, 43, 44], "act_func": [41, 42], "act_func_deriv": [41, 42], "actic": 21, "action": 30, "activ": [0, 2, 3, 4, 9, 15, 22, 24, 29, 31, 33, 36, 43, 44], "activation_d": 22, "activation_func": [21, 22], "activest": [], "actual": [0, 1, 4, 5, 6, 8, 11, 15, 16, 18, 21, 23, 26, 30, 33, 34, 35, 36, 37, 41, 43], "ad": [1, 3, 4, 5, 8, 13, 15, 16, 26, 35, 36, 37, 38, 42, 43, 44], "ada_clf": 10, "adaboostclassifi": 10, "adadelta": [13, 36], "adagrad": [27, 37, 40, 41, 42], "adagradmomentum": [41, 42], "adam": [1, 3, 4, 21, 27, 28, 33, 37, 40, 41, 42, 44], "adam_schedul": [41, 42], "adap": 40, "adapt": [4, 6, 13, 17, 28, 32, 35, 37, 38, 40, 42], "add": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 15, 16, 17, 18, 20, 21, 24, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "add6ff": [], "add_": [], "add_subplot": [1, 7, 12, 14, 38, 39, 41], "add_suplot": [42, 43], "addendum": 5, "addeventlisten": [], "addit": [0, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "addition": [12, 13, 35, 36, 39, 40], "address": [1, 9, 11, 13, 33, 36, 41], "adjac": [3, 12, 39, 40, 43, 44], "adjoint": [5, 34], "adjust": [0, 5, 12, 13, 35, 36, 39], "admir": [0, 33], "advanc": [4, 6, 12, 32, 33, 36, 37, 38, 39, 40], "advantag": [1, 3, 5, 6, 10, 13, 19, 24, 26, 28, 35, 36, 37, 38, 41, 43, 44], "advent": 43, "adversari": 33, "advis": [], "afecionado": 33, "affect": [3, 15, 19, 24, 41, 42, 43, 44], "affin": [0, 3, 8, 11, 34, 40, 43, 44], "afford": [3, 44], "aficionado": 33, "aforement": 14, "african": [], "after": [0, 1, 2, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 40, 41, 42, 43, 44], "afterward": [0, 33], "ag": [0, 7, 33, 34, 38], "ag_0": [2, 42, 43], "again": [0, 1, 4, 5, 6, 7, 8, 10, 11, 12, 13, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42], "against": [1, 4, 7, 10, 38, 41], "agegroup": [7, 38], "agegroupmean": [7, 38], "aggreg": [9, 10, 36, 43, 44], "agorithm": 10, "agre": [5, 6, 30, 34, 35, 36, 37], "agreement": [13, 36], "ahead": 9, 
"ai": [0, 32], "aid": [11, 20, 36], "aim": [0, 1, 4, 6, 7, 11, 14, 16, 17, 19, 20, 25, 26, 27, 28, 34, 37, 38, 39, 40, 41], "ainv": 5, "airplan": [3, 44], "aka": [5, 28], "al": [0, 2, 4, 16, 17, 20, 28, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "alarm": [5, 7], "aldo": 34, "alexanderamini": 43, "alexsmola": 42, "algebra": [0, 3, 5, 13, 25, 34, 35, 37, 43, 44], "algorithm": [0, 1, 2, 4, 5, 6, 7, 8, 13, 14, 16, 25, 26, 27, 30, 32, 37, 38, 39, 43, 44], "align": [0, 2, 5, 6, 7, 8, 13, 30, 33, 34, 35, 37, 38, 39, 42, 43, 44], "all": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "allclos": 21, "allevi": [1, 13, 35, 41], "alloc": [3, 26, 43, 44], "allow": [0, 1, 2, 3, 5, 6, 8, 10, 13, 15, 23, 25, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "almost": [0, 1, 6, 8, 11, 13, 30, 35, 36, 37, 38, 39, 41], "alon": [2, 9, 36, 42, 43], "along": [2, 3, 4, 5, 6, 9, 10, 11, 15, 20, 21, 22, 25, 26, 33, 34, 35, 37, 38, 41, 42, 43, 44], "alpha": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 23, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "alpha_": [10, 36], "alpha_0": [3, 43, 44], "alpha_1": [3, 43, 44], "alpha_2": [3, 43, 44], "alpha_i": [3, 13, 43, 44], "alpha_k": [13, 43, 44], "alpha_m": 10, "alpha_n": 3, "alpha_opt": 13, "alreadi": [2, 3, 4, 5, 6, 10, 12, 15, 22, 25, 26, 30, 33, 34, 35, 38, 39, 40, 42, 43], "also": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "alter": [1, 41], "altern": [0, 1, 4, 5, 6, 8, 9, 11, 13, 15, 18, 23, 26, 27, 33, 34, 36, 37, 38, 41, 43, 44], "although": [0, 1, 5, 6, 8, 10, 13, 16, 19, 20, 33, 36, 37, 38, 40, 41, 42, 43, 44], "alwai": [0, 3, 5, 6, 12, 13, 16, 19, 21, 22, 27, 28, 30, 33, 34, 35, 36, 37, 39, 40, 43, 44], "am": 4, "ambit": [40, 41], "ame2016": [0, 33], "american": [], "amjith": [], "among": [0, 3, 5, 9, 10, 12, 23, 26, 33, 34, 39, 40, 43, 44], "amongst": [5, 37], "amount": [0, 1, 3, 4, 6, 8, 10, 14, 25, 37, 38, 40, 41, 42, 43, 44], "an": [1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "an_": 30, "anaconda": [0, 1, 25, 27, 33, 41, 42], "analogi": 13, "analys": [6, 37, 38], "analysi": [1, 3, 4, 7, 14, 19, 23, 24, 26, 32, 36, 39, 41, 43, 44], "analyt": [2, 3, 5, 6, 7, 12, 13, 17, 22, 24, 25, 27, 33, 34, 35, 36, 37, 38, 39, 40, 43], "analyz": [0, 1, 3, 4, 5, 6, 16, 27, 30, 34, 35, 36, 42, 44], "andrew": [1, 41], "angl": [0, 3, 9, 34, 36], "anharmon": 3, "ani": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 19, 21, 30, 33, 34, 36, 37, 40, 41, 42, 43, 44], "anim": [4, 12, 39, 40], "ann": [12, 39, 40], "annot": [0, 1, 3, 7, 8, 33, 39, 41, 42, 44], "announc": 33, "anom": [], "anomali": [], "anonym": 18, "anoth": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 26, 27, 28, 30, 33, 34, 36, 40, 41, 42, 43, 44], "ansatz": [0, 18, 33], "answer": [0, 1, 3, 5, 6, 19, 22, 24, 26, 27, 28, 31, 33, 37, 41], "antialias": [2, 6, 42, 43], "anticip": 4, "anymor": [1, 8, 41], "anyon": [4, 8, 15], "anyth": [1, 15, 16, 21, 22, 30, 41, 42], "anytim": [31, 33], "anywai": [], "apach": [1, 41, 42], "apart": [11, 13, 35, 36], "api": [1, 25, 33, 41, 42], "appar": [2, 42, 43], "appear": [0, 1, 3, 13, 26, 30, 40, 41, 42, 44], "append": [1, 3, 4, 8, 9, 13, 19, 21, 22, 33, 36, 38, 39, 41, 42, 44], "appendic": [27, 28], "appendix": 27, "appli": [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 18, 27, 28, 30, 
32, 33, 34, 36, 37, 38, 39, 40, 41, 43, 44], "applic": [0, 1, 3, 4, 5, 6, 7, 9, 12, 13, 16, 24, 26, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "apply_gradi": 4, "approach": [1, 2, 4, 5, 6, 9, 10, 11, 12, 13, 15, 16, 18, 21, 24, 25, 27, 30, 32, 34, 35, 40, 41, 42, 43], "approch": 27, "appropri": [2, 6, 9, 12, 13, 17, 25, 30, 36, 37, 38, 39, 42, 43], "approv": 33, "approx": [0, 2, 3, 6, 10, 11, 13, 18, 27, 30, 33, 35, 36, 37, 42, 43, 44], "approxim": [0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 13, 19, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43], "apt": [0, 25, 27, 33], "aq": 30, "ar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "aragorn": 33, "arang": [1, 3, 4, 6, 7, 9, 10, 12, 13, 33, 36, 38, 39, 41, 42, 44], "arbitrari": [1, 4, 6, 8, 12, 13, 30, 35, 37, 39, 40, 41, 42], "arbitrarili": [0, 1, 11, 33, 36, 41], "arc": 6, "architectur": [3, 4, 12, 24, 28, 40, 42, 43, 44], "archiv": [27, 28], "area": [0, 3, 6, 23, 32, 33, 43, 44], "argmax": [1, 11, 21, 38, 39, 41], "argmin": [4, 10, 14], "argsort": 11, "argu": [1, 13, 28, 41], "arguement": 19, "argument": [0, 2, 3, 5, 11, 12, 13, 17, 21, 33, 34, 36, 37, 39, 40, 41, 42, 43, 44], "aris": [0, 6, 12, 13, 30, 33, 35, 37, 38], "arithmet": [0, 13, 26, 33], "arm": [6, 34, 36], "armadillo": 26, "armin": [], "arnulf": [40, 41], "around": [0, 1, 4, 5, 6, 11, 18, 21, 22, 27, 28, 30, 33, 37, 38, 39, 40, 41], "arrai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 16, 18, 21, 23, 25, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "arrang": [3, 33, 43, 44], "array_equ": [38, 39], "arraybox": 13, "arriv": [0, 6, 9, 11, 19, 26, 30, 33, 37], "arrow": [12, 39, 40, 41, 42], "arrowprop": 8, "art": [0, 1, 25, 41], "articl": [0, 3, 4, 6, 10, 19, 28, 33, 34, 35, 36, 37, 38, 43, 44], "artifici": [0, 2, 7, 12, 24, 32, 33, 38, 42, 43], "artificialneuron": [12, 39, 40], "arug": 13, "arxiv": [3, 4, 36, 40], "as_fram": 28, "asarrai": [0, 6, 9, 34, 36], "asid": 34, "ask": [5, 6, 11, 12, 15, 19, 27, 28, 37, 40, 41], "aspect": [0, 6, 25, 33, 34, 40, 41], "assembl": [3, 44], "assembli": [0, 33], "assert": [4, 41, 42], "assess": [0, 6, 24, 27, 33, 34, 37, 38], "asset": [], "assici": 4, "assign": [0, 7, 8, 9, 12, 13, 14, 15, 29, 31, 32, 33, 38, 39, 41], "associ": [0, 6, 9, 12, 14, 30, 33, 37, 38, 39, 40], "assum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19, 24, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "assumpt": [0, 3, 5, 6, 9, 11, 30, 33, 34, 38, 43, 44], "ast": [0, 5, 6, 33, 37], "astyp": [4, 9, 10, 38, 39, 42, 43], "asymmetri": [0, 33], "asymptot": [4, 6, 36, 37, 38], "atom": [0, 33], "attain": 36, "attempt": [0, 4, 6, 7, 8, 10, 33, 34, 36, 38, 40, 41], "attend": 33, "attent": [0, 26, 33], "attract": [0, 10, 33], "attribut": [0, 9, 22, 33, 41, 42], "auc": 23, "audi": [0, 33], "audio": [3, 4, 43, 44], "augment": [43, 44], "august": [33, 34], "aurelien": [0, 32, 33], "austfjel": 6, "auth": 15, "authent": 15, "author": [0, 1, 10, 30, 41], "authour": 33, "auto": [9, 10, 28, 30, 41, 42], "auto_exampl": [21, 27, 34], "autocor": 30, "autocorrelation_tim": 30, "autocorrelform": 30, "autocovari": 30, "autoencod": [4, 25, 33], "autoencond": 25, "autograd": [21, 25, 28, 33, 40, 41], "autograd_compliant_predict": 22, "autograd_gradi": 22, "autograd_one_lay": 22, "autom": [0, 25, 32, 33], "automac": 26, "automag": 33, "automat": [0, 1, 2, 3, 4, 11, 16, 21, 22, 25, 26, 28, 33, 39, 41, 44], "automobil": [3, 44], 
"autonom": 4, "avail": [0, 1, 4, 6, 10, 11, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 37, 38, 41, 42], "avali": [20, 24, 27, 28], "averag": [0, 1, 3, 6, 9, 10, 13, 14, 23, 30, 31, 33, 34, 37, 38, 41, 42, 43, 44], "avg_loss": 42, "avoid": [0, 4, 5, 6, 9, 11, 13, 18, 21, 26, 34, 36, 37, 38, 41, 42], "awai": [2, 3, 6, 34, 36, 40, 42, 43, 44], "awar": [2, 10, 42, 43], "award": [31, 33], "ax": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 21, 26, 27, 28, 33, 37, 38, 39, 41, 42, 43, 44], "axes3d": [2, 6, 13, 35, 36, 42, 43], "axes_grid1": 6, "axhlin": 8, "axi": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 21, 24, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "axiom": 5, "axvlin": [4, 8], "axvspan": 4, "b": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "b1": [8, 21, 22], "b19db4": [], "b1bac4": [], "b2": [8, 21, 22], "b3": 8, "b35900": [], "b89784": [], "b_": [0, 1, 26, 40, 41], "b_0": [0, 40], "b_1": [0, 2, 12, 13, 36, 39, 40, 41, 42, 43], "b_2": [0, 13, 40, 41], "b_5": [13, 36], "b_g": [21, 22], "b_group": 9, "b_i": [0, 1, 2, 12, 33, 39, 40, 41, 42, 43], "b_ia_": [0, 33], "b_ia_i": 0, "b_index": 9, "b_j": [1, 12, 39, 40, 41, 42], "b_k": [0, 1, 12, 13, 36, 39, 40, 41], "b_m": [12, 39], "b_score": 9, "b_valu": 9, "ba": 36, "babcock": 33, "bach": 36, "bachelor": [29, 31], "back": [0, 3, 4, 5, 6, 8, 9, 10, 15, 16, 21, 26, 28, 30, 33, 36, 44], "backbon": 26, "backend": [1, 4, 41, 42], "background": [32, 33, 41], "backprogag": 22, "backpropag": [1, 21, 28, 36, 40, 41, 42], "backpropog": 22, "backslash": [], "backtrack": 9, "backup": 26, "backward": [1, 2, 4, 12, 22, 26, 36, 40, 41, 42, 43, 44], "bad": [6, 17, 28, 34, 41, 42], "badli": 30, "bag": [9, 25, 33], "bag_clf": 10, "baggin": 33, "baggingboot": 10, "baggingclassifi": 10, "baggingtre": 10, "bailei": [], "balanc": [6, 23, 36, 37, 38], "ballpark": 18, "baluka": [42, 43], "band": 26, "bandwidth": 26, "banner": [], "bar": [0, 6, 11, 27, 33, 41, 42], "barber": 32, "bare": [4, 10], "base": [0, 1, 3, 4, 5, 7, 8, 9, 10, 14, 15, 16, 17, 25, 30, 31, 32, 33, 34, 35, 38, 39, 40, 41, 43, 44], "baselin": 23, "basi": [5, 7, 8, 10, 11, 12, 13, 26, 34, 35, 38, 39, 40], "basic": [6, 8, 12, 13, 14, 15, 24, 25, 27, 28, 30, 33, 37, 41, 42], "basin": 36, "batch": [3, 4, 11, 12, 13, 21, 35, 38, 39, 42, 44], "batch_idx": 44, "batch_shap": 4, "batch_siz": [1, 3, 4, 41, 42, 44], "batchnorm": 4, "bay": [7, 38, 39], "baydin": 40, "bayesian": [5, 25, 32, 33], "bbbbbb": [], "beauti": [], "becam": [], "becaus": [0, 1, 2, 3, 4, 5, 6, 8, 9, 12, 13, 14, 24, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "becom": [0, 1, 2, 5, 6, 7, 9, 12, 13, 19, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "been": [0, 1, 2, 3, 4, 5, 6, 11, 12, 13, 19, 20, 24, 25, 26, 27, 28, 33, 34, 36, 37, 39, 40, 41, 42, 43], "befor": [0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 24, 26, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 44], "beforehand": [0, 30, 33], "began": [], "begin": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 22, 23, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "beginn": 28, "behav": [1, 6, 13, 35, 37, 38, 41], "behavior": [0, 1, 13, 33, 35, 36, 41], "behaviour": [12, 36, 39, 40, 41], "behind": [0, 1, 6, 8, 13, 24, 33, 35, 41], "being": [0, 1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 17, 20, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44], "believ": [9, 26], "belong": [7, 8, 9, 13, 14, 35, 38, 39, 41, 42], "below": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 15, 18, 21, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "benchmark": [10, 28], "benefici": [1, 13, 41], "benefit": [0, 1, 4, 11, 13, 25, 33, 35, 36, 41, 43, 44], "bengio": [1, 28, 32, 33, 34, 36], "benign": [1, 7, 39], "benno": [40, 41], "berner": [40, 41], "besid": [4, 5, 35], "bessel": [5, 34, 37], "best": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 18, 21, 23, 28, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "beta": [1, 3, 10, 11, 13, 16, 17, 19, 33, 34, 35, 41, 42, 43, 44], "beta1": [], "beta2": [], "beta_": [3, 13, 17, 34, 43, 44], "beta_0": [1, 3, 13, 34, 41, 43, 44], "beta_1": [1, 3, 10, 13, 34, 36, 41, 43, 44], "beta_1m_": 36, "beta_1x_i": 13, "beta_2": [3, 13, 36, 43, 44], "beta_2v_": 36, "beta_3": [3, 43, 44], "beta_i": [3, 36, 43, 44], "beta_j": [13, 34], "beta_k": 13, "beta_linreg": 13, "beta_m": 10, "beta_mg_m": 10, "beta_n": 3, "better": [0, 1, 2, 3, 4, 6, 9, 10, 11, 12, 13, 19, 20, 22, 23, 33, 34, 36, 37, 41, 42, 43, 44], "between": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "beyond": [0, 1, 5, 6, 8, 13, 33, 34, 35, 36, 41], "bf": [13, 14, 26, 30, 35], "bf5400": [], "bg": 33, "bgd": [13, 36], "bia": [0, 1, 2, 3, 5, 8, 9, 10, 12, 13, 20, 21, 22, 24, 28, 33, 34, 35, 39, 40, 41, 42, 43, 44], "bias": [1, 2, 3, 5, 6, 9, 12, 19, 21, 22, 28, 36, 37, 39, 42, 43, 44], "bib": [], "bibliographi": [27, 28], "bibtex": [], "big": [0, 1, 2, 5, 6, 14, 19, 36, 37, 41, 42, 43], "bigger": [1, 6, 34, 41], "bigl": 23, "bigr": [12, 23, 39], "bike": 9, "bilbo": 33, "billion": [3, 12, 25, 36, 39, 40, 43, 44], "bin": [7, 30, 39], "binari": [0, 3, 5, 7, 9, 10, 12, 23, 28, 33, 38, 39, 42, 44], "binary_cross_entropi": [38, 39], "binary_result": [38, 39], "binarycrossentropi": 4, "bind": 0, "binomi": [25, 30, 33], "binsboot": [6, 37], "bioinformat": 0, "biolog": [1, 12, 39, 40, 41], "bios1100": [25, 33], "bird": [0, 3, 44], "birth": 33, "bishop": [32, 33], "bit": [1, 4, 19, 21, 26, 30, 33, 41, 42], "bitwis": 30, "bivari": [2, 42, 43], "bk": [13, 36], "bla": [26, 33], "black": [8, 9, 14], "blame": [], "block": [6, 10, 25, 26, 30, 33, 37, 38, 43, 44], "blockquot": [], "blog": [28, 33], "blogpost": 4, "blue": [0, 3, 43, 44], "bm": [], "bmatrix": [0, 1, 3, 5, 7, 8, 11, 13, 26, 33, 34, 35, 36, 38, 39, 40, 41, 43, 44], "bmi": [1, 41], "bnb2fevkeeo": 43, "bodi": [0, 1, 4, 12, 39, 40, 41], "bold": 1, "boldfac": [0, 5, 16, 34, 35], "boldsymbol": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 19, 27, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44], "boltzmann": [12, 25, 33, 39, 40], "book": [17, 27, 28, 32, 33, 34, 37, 38, 42, 43, 44], "book1": 32, "bool": [], "boolean": [4, 17], "boost": [1, 9, 25, 33, 41], "boostrap": 10, "bootstrap": [1, 13, 19, 25, 27, 33, 36, 41, 42], "born": 36, "borrow": 33, "boston_dataset": [], "bot": 8, "both": [0, 1, 4, 5, 6, 8, 9, 10, 13, 14, 15, 16, 17, 19, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "bottl": [7, 38, 39], "bottou": 36, "bound": [8, 12, 36, 39, 40, 41, 42], "boundari": [2, 4, 8, 11, 12, 42, 43], "bousquet": 36, "bower": [], "box": [4, 9, 21, 22], "boyd": [8, 13, 35], "bracket": [4, 30], "brain": [1, 7, 12, 38, 39, 40, 41, 42], "branch": [9, 33], "break": [0, 4, 6, 11, 14, 33, 36], "breast": [5, 7, 11, 39], "breviti": 13, "brew": [0, 25, 27, 33], "brg": 8, "brian": [], "brief": [27, 28, 34], "briefli": [0, 16, 19, 28, 33, 37], "bring": [0, 5, 6, 10, 28, 34, 36], "britt": [31, 33], "broad": 0, 
"broadcast": 21, "broadli": 33, "brought": [13, 25, 33], "brownle": 4, "browser": [15, 33], "brute": [3, 5, 11, 34, 40, 43, 44], "bsd": [], "budget": 36, "buffer_s": 4, "bug": [], "bugfix": [], "bui": 4, "build": [0, 4, 5, 6, 10, 16, 22, 26, 30, 33, 37, 38, 39, 40], "buildmodel_tutori": 28, "built": [1, 3, 4, 6, 37, 38, 41, 42, 43, 44], "bunch": 11, "bundl": [], "busi": [], "bxe2t": [39, 40, 41], "byte": [26, 33], "c": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 26, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "c1": [8, 11], "c2": [8, 11], "c4a2f5": [], "c5e478": [], "c9d1d9": [], "c_": [8, 9, 10, 13, 30, 35, 36], "c_0": 30, "c_1": [12, 39], "c_2": [12, 39], "c_3": [12, 39], "c_4": [12, 39], "c_i": [12, 13, 36, 39], "c_k": 30, "ca": [1, 33], "caab6d": [], "cach": 10, "cal": [0, 8, 10, 12, 13, 35, 36, 40, 41, 42], "calcul": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 19, 22, 26, 28, 30, 33, 36, 37, 38, 39, 40, 41, 42, 43], "california": [27, 28], "call": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 21, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "callabl": [41, 42], "calor": [0, 34], "caltech": [], "cambridg": [13, 32, 35, 40, 41], "can": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 39, 41, 42, 43, 44], "cancel": [0, 13, 33, 34], "cancer": [5, 10, 23, 39], "cancerpd": [7, 39], "candid": [8, 9, 10, 36], "cannot": [0, 1, 4, 5, 6, 7, 8, 9, 27, 30, 34, 35, 36, 39, 41], "canopi": [0, 25, 27, 33], "canva": [15, 16, 19, 20, 24, 27, 28, 33], "cap": 5, "capabl": [0, 1, 8, 13, 25, 33], "capac": [2, 31, 42, 43], "capita": [], "caption": [20, 27, 28], "captur": [4, 11, 12, 23, 33, 39, 40], "car": [3, 4, 44], "card": [0, 7, 33, 38, 39], "cardin": [1, 41], "care": [11, 15, 19, 22, 36], "carefulli": [13, 36], "carlo": [0, 6, 25, 30, 32, 33, 37, 38], "carri": [2, 6, 7, 27, 37, 38, 39, 42, 43], "cart": 10, "case": [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 23, 25, 26, 27, 28, 33, 37, 40, 41, 42], "casella": 32, "cast": [1, 41], "cat": [3, 4, 44], "catch": 0, "categor": [0, 1, 3, 9, 11, 33, 38, 39, 41, 42, 44], "categori": [0, 1, 3, 7, 10, 12, 14, 33, 38, 39, 40, 41, 43, 44], "categorical_cross_entropi": [38, 39], "categorical_crossentropi": [1, 3, 41, 42, 44], "caus": [0, 5, 6, 30, 33, 34, 35, 36, 37, 38], "causal": 0, "causat": [0, 33], "cax": 1, "cb": [6, 33], "cbar": 1, "cc": [0, 1, 5, 13, 23, 33, 34, 35, 36, 40, 41, 42], "cc398b": [], "ccbb44": [], "ccc": [5, 12, 23, 35, 39], "cdf": 30, "cdot": [0, 2, 6, 12, 13, 14, 26, 30, 33, 35, 36, 37, 39, 42, 43], "celebr": [13, 35], "cell": [4, 21, 22], "center": [0, 1, 6, 7, 8, 9, 11, 14, 18, 27, 30, 33, 34, 36, 37, 38, 39, 41], "central": [0, 3, 5, 6, 8, 16, 20, 24, 26, 28, 33, 34, 40, 41, 43, 44], "centroid": [14, 30], "centroid_differ": 14, "centuri": [3, 43, 44], "certain": [0, 3, 6, 7, 9, 21, 30, 33, 34, 37, 38, 39, 43, 44], "certainti": 37, "cf": [], "cf222e": [], "cffi": [], "cg": 13, "cha": [], "chain": [0, 1, 13, 22, 25, 30, 33], "challeng": [15, 40], "chanc": [1, 5, 13, 30, 36, 41], "chang": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 19, 21, 22, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "changeabl": 28, "changelog": [], "channel": [3, 43, 44], "chap4": [40, 41], "chapter": [0, 6, 10, 11, 16, 17, 19, 26, 27, 28, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "chapter3": [0, 27], "charact": [0, 3, 5, 33, 34, 35, 43, 44], 
"character": [8, 9, 10, 12, 30, 39, 41, 42], "characterist": [0, 1, 3, 10, 13, 23, 33, 41, 42, 44], "charg": [0, 33], "charl": [], "charset": [], "chart": 23, "chase": 4, "chatgpt": [15, 27, 28], "chd": [7, 38], "chddata": [7, 38], "cheap": [5, 34, 35, 36], "cheaper": [1, 13, 36, 41], "check": [1, 3, 4, 5, 11, 13, 15, 16, 19, 21, 22, 26, 33, 36, 38, 39, 41, 42], "checkmark": 3, "checkpoint": 4, "checkpoint_dir": 4, "checkpoint_prefix": 4, "chen": 10, "cheng": 34, "chiaramont": [2, 42, 43], "childcar": 16, "children": 16, "choic": [0, 1, 2, 3, 4, 6, 9, 12, 13, 14, 20, 26, 28, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "choleski": [5, 26, 34, 35], "choos": [2, 3, 6, 9, 10, 11, 13, 14, 15, 18, 19, 21, 27, 28, 35, 37, 38, 39, 42, 43, 44], "chosen": [0, 1, 2, 6, 8, 9, 10, 13, 16, 30, 33, 35, 36, 37, 38, 41, 42, 43], "chosen_datapoint": [1, 41], "christian": 32, "christoph": [32, 33], "chunk": 36, "cifar": [3, 44], "cifar10": [3, 44], "circ": [1, 12, 36, 40, 41], "circl": [0, 8, 12, 34, 36, 39, 40], "circuit": 3, "circumfer": 9, "circumv": [1, 5, 13, 34, 35, 36, 41], "citat": [], "cite": [20, 27, 28], "ckpt": 4, "cl": [38, 39], "claim": 24, "clarifi": 21, "clariti": 30, "class": [0, 1, 3, 4, 6, 7, 8, 9, 11, 12, 13, 21, 22, 23, 30, 33, 37, 41, 42, 43, 44], "class0": [38, 39], "class1": [38, 39], "class_nam": [3, 9, 44], "class_to_index": [38, 39], "class_val": 9, "class_valu": 9, "classic": [7, 9, 13, 28, 39], "classif": [0, 3, 5, 6, 7, 8, 11, 12, 21, 23, 24, 25, 27, 32, 33, 34, 37, 43, 44], "classifi": [0, 1, 4, 7, 9, 10, 11, 23, 28, 33, 39, 41, 42], "classificaton": [1, 41], "classifii": 10, "claus": [], "clean": [1, 41], "clear": [1, 5, 10, 12, 13, 36, 41, 42], "clearli": [0, 3, 5, 6, 7, 8, 30, 34, 35, 37, 38, 39, 43, 44], "clever": [1, 10, 41], "clf": [0, 6, 8, 9, 10, 33, 34], "clf3": 0, "clf_lasso": 6, "clf_ridg": 6, "cli": 15, "click": [], "climb": 23, "clip": [3, 30, 36, 38, 39, 43, 44], "clock": 36, "clone": [15, 31], "close": [0, 1, 2, 4, 6, 8, 9, 11, 12, 13, 14, 18, 24, 30, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "closer": [3, 5, 13, 34, 35, 36], "closest": [8, 11, 13, 14], "closur": [25, 33], "cloud": [25, 33], "cluster": [0, 1, 4, 6, 11, 25, 33, 37, 38, 39, 41], "cluster_label": 14, "cm": [1, 2, 3, 6, 8, 13, 35, 36, 41, 42, 43, 44], "cmap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 33, 41, 42, 43, 44], "cmap_arg": 6, "cmd": [9, 15], "cn_": 30, "cnn": [12, 24, 39, 40], "cnn_kera": [3, 44], "cntk": [25, 33], "co": [0, 2, 3, 6, 9, 13, 33, 37, 38, 42], "code": [0, 3, 4, 6, 7, 8, 18, 19, 21, 22, 23, 25, 26, 30, 32], "codebas": [41, 42], "codec": [], "coef": [0, 33], "coef0": 8, "coef_": [0, 5, 6, 8, 9, 13, 16, 33, 34, 35, 36], "coeff": 5, "coeffici": [0, 3, 5, 6, 7, 8, 9, 13, 18, 26, 33, 34, 36, 37, 38, 39, 43, 44], "coerc": [0, 6, 33, 37, 38], "coin": [10, 30], "coin_toss": 10, "col": [0, 11, 33, 34], "colab": [21, 22, 25, 33], "cold": 9, "colinear": [], "collabor": [20, 27, 28], "collaps": 8, "collect": [2, 6, 10, 11, 17, 25, 30, 32, 33, 37, 38, 40, 43], "collinear": [5, 34, 35], "color": [0, 3, 4, 6, 8, 9, 10, 30, 36, 43, 44], "color_channel": [3, 44], "color_cod": 6, "colorbar": [1, 6, 20], "coloumn": [41, 42], "colsample_bytre": 10, "colsaobject": 10, "column": [0, 1, 2, 5, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "columntransform": 9, "com": [4, 6, 15, 16, 19, 20, 21, 22, 25, 27, 28, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "combin": [1, 2, 5, 6, 7, 10, 15, 18, 22, 23, 30, 37, 38, 41, 42, 43, 44], "come": [0, 1, 3, 4, 5, 12, 13, 14, 15, 
23, 24, 28, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "comfort": [], "command": [0, 1, 15, 41, 42], "comment": [0, 4, 5, 6, 20, 24, 27, 28], "commerci": [0, 25, 27, 33], "commit": 15, "commod": [0, 33], "common": [0, 1, 3, 5, 6, 7, 9, 11, 13, 14, 16, 23, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41], "commonli": [0, 1, 4, 6, 7, 9, 13, 14, 34, 36, 37, 38, 39, 41], "commonmark": [], "commun": [0, 12, 15, 27, 39, 40], "commut": 3, "commutatitav": 3, "compact": [0, 1, 3, 5, 6, 7, 9, 11, 12, 13, 14, 21, 33, 34, 37, 43, 44], "compair": 0, "compar": [0, 3, 4, 5, 6, 11, 13, 18, 23, 26, 27, 28, 33, 34, 35, 36, 37, 38, 40], "comparison": [2, 4, 13, 28, 42, 43], "compat": [7, 38, 39], "compens": 36, "compet": 0, "competit": 10, "compil": [0, 1, 3, 4, 13, 25, 26, 33, 41, 42], "compl": 21, "complet": [0, 2, 3, 4, 9, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 33, 39, 42, 43, 44], "completenn": [12, 39], "complex": [1, 5, 8, 9, 11, 12, 13, 16, 19, 28, 33, 35, 36, 37, 38, 41], "complianc": [], "complic": [0, 1, 9, 13, 27, 28, 33, 35, 36, 37, 38, 41], "compoment": 34, "compon": [0, 1, 3, 4, 5, 6, 7, 9, 14, 16, 25, 33, 34, 35, 37, 39, 40, 41, 42, 43, 44], "components_": 11, "compos": [9, 12, 13, 14, 25, 33, 39, 40, 42, 44], "compphys": [0, 6, 16, 20, 25, 27, 28, 29, 31, 32, 33, 34, 35, 38, 39, 41, 42, 43, 44], "compress": [0, 33, 34, 44], "compresseds": 43, "compris": 6, "compromis": [5, 23, 34, 35], "compulsori": [25, 33], "comput": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 21, 22, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "computation": [0, 3, 6, 9, 13, 30, 33, 35, 36, 40, 44], "computationalscienceuio": 33, "compute_gradi": 22, "computerlab": [27, 28], "con": 28, "concaten": [2, 4, 6, 14, 38, 39, 42, 43], "concav": [1, 13, 34, 35], "concentr": 10, "concept": [0, 2, 23, 25, 33, 34, 42, 43], "conceptu": [12, 13, 35, 39, 40], "concern": [0, 1, 4, 7, 33, 35, 38, 39, 41, 43, 44], "concic": 33, "conclud": [0, 5, 13, 36], "conclus": [1, 41], "cond": [2, 42, 43], "conda": [0, 1, 25, 27, 33, 41, 42], "condis": 34, "condit": [0, 2, 4, 5, 6, 8, 9, 11, 13, 30, 33, 34, 36, 37, 42, 43], "conduct": 25, "condwav": [2, 42], "confid": [0, 5, 6, 7, 8, 19, 23, 33, 34, 38, 39], "config": 42, "configur": [3, 23, 42, 44], "confirm": [5, 12, 21, 39], "conform": [], "confus": [5, 6, 7, 10, 26, 28, 34, 37], "confusion_matrix": 9, "congruenti": 30, "conjug": [4, 8], "conjugaci": 13, "conjunct": [3, 43, 44], "connect": [0, 1, 3, 4, 9, 11, 12, 13, 24, 26, 33, 34, 35, 39, 40, 41, 42, 43, 44], "consensu": 36, "consequ": [5, 6, 8, 10, 12, 13, 34, 35, 36, 37], "consequenti": [], "conserv": [5, 14, 34, 35], "consid": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 16, 19, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "consider": [0, 1, 5, 13, 33, 34, 35, 37], "consist": [1, 2, 3, 4, 6, 12, 13, 27, 28, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "consol": [], "const": [], "constant": [0, 2, 4, 5, 6, 8, 12, 13, 16, 18, 30, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "constitu": [0, 33], "constitut": [2, 6, 37, 38, 42, 43, 44], "constrain": [1, 3, 5, 7, 11, 35, 38, 41, 43, 44], "constraint": [5, 6, 8, 13, 34, 35, 37], "construct": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 23, 26, 30, 33, 34, 37, 39, 43, 44], "constructor": [], "consult": 28, "consum": 36, "contact": [0, 33], "contain": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 18, 19, 21, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "contemporari": 33, "content": [1, 15, 20, 25, 26, 33, 35, 36, 43, 44], 
"context": [6, 10, 13, 22, 27, 35, 36, 37, 38, 40], "contigu": 26, "contin": 19, "continu": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 19, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "contour": [9, 10, 13], "contourf": [8, 9, 10], "contract": [], "contrast": [1, 4, 9, 10, 12, 33, 36, 39, 40, 41], "contribut": [0, 3, 5, 13, 18, 30, 33, 34, 35, 36, 43, 44], "contributor": [0, 27], "control": [0, 1, 3, 9, 13, 15, 23, 25, 33, 41, 44], "conv": [3, 4, 43, 44], "conv1": 44, "conv2": 44, "conv2d": [3, 4, 44], "conv2dtranspos": 4, "convei": 33, "conveni": [5, 6, 12, 13, 26, 27, 28, 33, 35, 36, 37, 39], "convent": [12, 34], "converg": [1, 2, 4, 5, 8, 13, 14, 18, 34, 35, 40, 41, 42, 43], "convergencewarn": [], "convers": [20, 36], "convert": [0, 1, 4, 5, 9, 11, 13, 26, 33, 34, 35, 38, 39, 43], "converttomatrix": 4, "convex": [4, 5, 7, 34, 38, 39], "convinc": [13, 23, 35, 43, 44], "convolut": [1, 4, 24, 25, 33, 41], "cool": [4, 9], "coolwarm": 6, "coordin": [5, 12, 14, 34, 35, 36, 39], "coorel": [], "copi": [0, 1, 14, 15, 34, 38, 39, 41, 42], "copyright": [], "core": 10, "corel": 33, "corner": [43, 44], "coronari": [7, 38], "corr": [5, 7, 11, 34, 39], "correalt": [11, 25], "correct": [0, 1, 2, 3, 4, 5, 7, 13, 15, 19, 20, 21, 22, 23, 26, 30, 33, 34, 35, 37, 38, 39, 41, 42, 43, 44], "correctli": [1, 2, 6, 7, 10, 18, 19, 21, 22, 23, 27, 28, 37, 38, 41, 42, 43], "correl": [0, 1, 3, 5, 6, 7, 10, 12, 13, 25, 30, 33, 35, 36, 37, 40], "correlation_matrix": [5, 7, 11, 34, 39], "correspond": [0, 3, 5, 6, 8, 9, 11, 12, 25, 26, 27, 28, 30, 33, 34, 35, 37, 39, 40, 43, 44], "cortex": [12, 39, 40], "cosin": [3, 6, 37, 38], "cost": [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 16, 17, 18, 19, 21, 22, 24, 27, 28, 33, 44], "cost_autograd": 22, "cost_deep_grad": [2, 42, 43], "cost_der": 22, "cost_fun": 22, "cost_func": [41, 42], "cost_func_deriv": [41, 42], "cost_funct": [2, 42, 43], "cost_function_deep": [2, 42, 43], "cost_function_deep_grad": [2, 42, 43], "cost_function_grad": [2, 42, 43], "cost_function_train": [41, 42], "cost_function_v": [41, 42], "cost_grad": [2, 22, 42, 43], "cost_histori": [], "cost_ol": [], "cost_one_lay": 22, "cost_ridg": [], "cost_sum": [2, 42, 43], "cost_two_lay": 22, "costcrossentropi": [41, 42], "costli": 36, "costlogreg": [41, 42], "costol": [13, 36, 41, 42], "could": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "coulomb": [0, 33], "count": [0, 9, 15, 23, 27, 28, 29, 30, 31, 33], "counter": [27, 28], "counteract": 36, "counterpart": 33, "countor": 13, "coupl": [4, 5, 6, 21, 37], "cours": [0, 1, 3, 5, 11, 15, 16, 17, 19, 20, 21, 24, 27, 28, 31, 34, 37, 38, 41, 42], "coursework": 15, "courvil": [28, 32, 33, 34, 36], "cov": [5, 6, 11, 26, 30, 33, 34, 37], "cov_xi": [5, 11, 34], "cov_xx": [5, 11, 34], "cov_yi": [5, 11, 34], "covari": [0, 7, 25, 26, 33, 35, 39], "covariance_matrix": [5, 11, 14], "cover": [0, 5, 25, 27, 28, 31, 32, 34, 35, 37], "covert": [0, 33], "covxi": 30, "covxx": 30, "covxz": 30, "covyi": 30, "covyz": 30, "covzz": 30, "cpu": [1, 41, 42, 44], "cqofi41lfdw": [40, 41], "craft": [3, 43, 44], "crash": 36, "creat": [1, 3, 4, 5, 9, 10, 11, 12, 15, 18, 19, 21, 22, 24, 25, 33, 36, 38, 39, 40, 41, 42, 44], "create_biases_and_weight": [1, 41], "create_convolutional_neural_network_kera": [3, 44], "create_lay": [21, 22], "create_layers_batch": 21, "create_neural_network_kera": [1, 41, 42], "create_x": [5, 11, 41, 42], "creation": [], "credit": [0, 7, 31, 33, 38, 39], "crim": [], "crime": [], 
"criteria": [0, 4, 9, 10, 14, 30, 33], "criterion": [9, 10, 13, 18, 35, 36, 40, 42, 44], "critic": [6, 27, 34], "critiqu": [27, 28], "cross": [0, 1, 3, 7, 9, 10, 13, 15, 21, 22, 23, 25, 28, 30, 33, 34, 35, 36, 41, 42], "cross_entropi": [4, 21], "cross_val_scor": [6, 37, 38], "cross_valid": [7, 10, 23, 39], "crossentropyloss": [42, 44], "crossvalid": [6, 37, 38], "crucial": [1, 30, 36, 41], "cs231": 3, "cs231n": 42, "cs231n_2017_lecture4": 42, "csr_matrix": [26, 33], "css": [], "csv": [0, 4, 6, 7, 9, 37, 38, 39], "ctnk": [1, 41, 42], "cube": 40, "cubic": 0, "cuda": [42, 44], "culprit": [], "cumbersom": [5, 37], "cuml": 23, "cumprod": [], "cumsum": [10, 11, 33], "cumul": [7, 10, 30, 36], "cumulative_heads_ratio": 10, "cuomo": [42, 43], "cup": 5, "current": [1, 2, 3, 4, 13, 14, 15, 16, 32, 35, 36, 38, 39, 41, 42, 43, 44], "curs": [0, 34], "curv": [6, 7, 10, 12, 27, 38, 39, 41], "curvatur": [13, 35, 36], "custom": [6, 14], "custom_cmap": [9, 10], "custom_cmap2": [9, 10], "custom_lin": [], "cut": 23, "cutpoint": 9, "cv": [6, 7, 10, 23, 37, 38, 39], "cvxbook": [13, 35], "cvxopt": [5, 8, 34], "cybenko": 40, "cycl": [1, 12, 39, 40, 41], "cycler": [], "d": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "d1": [], "d166a3": [], "d2": [], "d2_g_t": [2, 42, 43], "d2a8ff": [], "d4d0ab": [], "d71835": [], "d9dee3": [], "d_1": [43, 44], "d_2": [43, 44], "d_f": [13, 35], "d_g_t": [2, 42, 43], "d_net_out": [2, 42, 43], "da": [3, 22, 40, 43, 44], "da_1": 22, "dagger": [5, 26, 34, 35], "dai": [1, 9, 25, 41], "damag": [], "damp": 3, "darget": 9, "darkr": 30, "dat": [0, 33], "dat_id": [0, 6, 7, 9, 33, 37, 38], "data": [2, 4, 5, 8, 10, 12, 13, 14, 16, 19, 20, 22, 23, 24, 26, 27, 28, 32, 35, 36, 37, 43], "data1": 14, "data2": 14, "data3": 14, "data4": 14, "data_id": [0, 6, 7, 9, 33, 37, 38], "data_indic": [1, 41], "data_panda": 33, "data_path": [0, 6, 7, 9, 33, 37, 38], "databas": [1, 41, 42], "datafil": [0, 6, 7, 9, 33, 37, 38], "datafram": [0, 4, 5, 7, 9, 11, 33, 34, 39], "dataload": [42, 44], "datapoint": [1, 5, 6, 7, 11, 13, 16, 35, 36, 37, 38, 41, 42], "datasci": [15, 16, 19], "dataset": [0, 4, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 27, 28, 33, 35, 36, 37, 38, 39, 42], "datatyp": 4, "date": [15, 18, 21, 22, 23, 24, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "daughter": 10, "davi": [], "david": 32, "davison": [37, 38], "db": [22, 40, 43], "db_1": 22, "dbb7ff": [], "dbh": [1, 41], "dbo": [1, 41], "dc": 22, "dc5e85cd93c3": 28, "dc_da": 22, "dc_da1": 22, "dc_da2": 22, "dc_db": 22, "dc_db1": 22, "dc_db2": 22, "dc_dw": 22, "dc_dw1": 22, "dc_dw2": 22, "dc_dz": 22, "dc_dz1": 22, "dc_dz2": 22, "dcc6e0": [], "dcomposit": 26, "ddot": [2, 42, 43], "de": 36, "dead": [1, 41, 42], "deadlin": [15, 20, 21, 22, 23, 24], "deal": [0, 1, 3, 5, 6, 8, 11, 13, 14, 19, 26, 30, 33, 34, 35, 36, 40, 41, 42, 44], "dealt": 0, "debt": [7, 38, 39], "debug": [0, 5, 6, 34, 35, 36, 37, 38, 41, 42], "debugg": [], "decad": [0, 3, 36, 43, 44], "decai": [0, 13, 30, 33], "decemb": [31, 33], "decent": 10, "decid": [0, 2, 3, 5, 6, 9, 18, 34, 35, 36, 37, 38, 41, 42, 43, 44], "decim": [0, 33, 41, 42], "decis": [0, 1, 8, 11, 25, 32, 33, 41], "decision_funct": 8, "decision_tre": 9, "decisiontreeclassifi": [9, 10], "decisiontreeregressor": [0, 9, 10], "declar": [0, 4, 20, 23, 26, 33], "declare_namespac": [], "decompos": [5, 6, 26, 34, 35, 40], "decomposit": [0, 6, 12, 33, 39, 40, 43], "decompost": [5, 34, 35], "deconvolut": [3, 43, 44], 
"decorrel": [10, 13, 36], "decreas": [1, 2, 4, 5, 6, 10, 11, 13, 19, 23, 24, 35, 36, 37, 38, 41, 42, 43], "dedic": 20, "deduc": [0, 33], "deep": [3, 7, 12, 13, 24, 25, 28, 32, 34, 35], "deep_neural_network": [2, 42, 43], "deep_param": [2, 42, 43], "deep_tree_clf": [9, 10], "deep_tree_clf1": 9, "deep_tree_clf2": 9, "deepcopi": [41, 42], "deepen": [5, 25, 33], "deeper": [0, 3, 4, 33, 44], "deepimag": 42, "deeplearningbook": [28, 32, 33, 35, 36], "deer": [3, 44], "def": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 21, 22, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "def_covari": 30, "default": [0, 1, 2, 4, 6, 7, 26, 28, 33, 34, 38, 39, 41, 42, 43], "default_tim": 4, "defect": [5, 34, 35], "defici": [5, 34, 35], "defin": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 26, 27, 30, 34, 35, 36, 37, 38, 39, 44], "definit": [1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 23, 26, 30, 34, 35, 36, 37, 38, 39, 42, 43], "defint": 30, "defualt": [41, 42], "degre": [3, 5, 6, 8, 9, 10, 11, 15, 16, 19, 20, 27, 30, 33, 35, 36, 37, 38, 43, 44], "deisenroth": 34, "del": 1, "delet": [6, 15], "delimit": 4, "deliv": [15, 27, 28, 29, 33], "delta": [0, 2, 3, 6, 8, 12, 13, 14, 33, 36, 40, 41, 42, 43, 44], "delta_": [1, 26, 40, 41], "delta_0": [3, 40, 43, 44], "delta_1": [3, 40, 41, 43, 44], "delta_2": [3, 40, 41, 43, 44], "delta_2a_1": [40, 41], "delta_3": [3, 43, 44], "delta_4": [3, 43, 44], "delta_5": [3, 43, 44], "delta_h": [0, 1, 33, 41], "delta_i": [40, 41, 43, 44], "delta_j": [3, 12, 40, 41, 42, 43, 44], "delta_k": [12, 40, 41, 42], "delta_l": [1, 3, 41, 43, 44], "delta_matrix": [41, 42], "delta_momentum": [13, 36], "delta_n": [0, 3, 33], "delug": 25, "delv": 0, "demand": [13, 35], "demonstr": [0, 3, 5, 6, 7, 11, 12, 19, 25, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "demystifi": [39, 40, 41], "den": 4, "denomin": [1, 5, 36, 41], "denot": [1, 2, 6, 7, 13, 30, 35, 36, 38, 39, 41, 42, 43], "dens": [1, 3, 4, 41, 42, 43], "densiti": [0, 2, 6, 30, 37, 38, 42, 43], "depart": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "depend": [0, 1, 2, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 25, 26, 27, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "depict": 30, "deploy": [0, 25, 27, 33], "depth": [0, 3, 9, 10, 26, 37, 41, 43, 44], "der": [], "deriv": [0, 1, 2, 6, 7, 8, 10, 11, 13, 18, 22, 25, 27, 28, 33, 38, 39, 42, 43], "derivati": 13, "derivative_fn": 13, "derivb1": [40, 41], "derivb2": [40, 41], "derivw1": [40, 41], "derivw2": [40, 41], "descend": [5, 9, 11, 34, 35, 43, 44], "descent": [0, 1, 3, 7, 8, 12, 22, 24, 28, 33, 34, 38, 40, 41, 44], "describ": [0, 2, 4, 5, 6, 8, 10, 11, 12, 13, 19, 20, 23, 24, 26, 27, 28, 33, 36, 37, 39, 40, 42, 43, 44], "descript": [0, 8, 9, 20, 27, 28, 33, 41, 42], "design": [0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 17, 18, 27, 28, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "designmatrix": [0, 33], "desir": [0, 2, 4, 5, 13, 14, 33, 34, 35, 36, 41, 42, 43], "desktop": 15, "despit": [1, 12, 36, 39, 41], "destroi": 26, "det": [5, 26, 34, 35], "detail": [0, 6, 11, 13, 14, 18, 21, 22, 26, 27, 34, 35, 36, 41], "detect": [3, 8, 12, 39, 40, 43, 44], "determin": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 18, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "determinist": [7, 13, 30, 35, 36, 38, 40], "deternin": 40, "dev": [1, 27, 28, 41], "develop": [0, 3, 5, 8, 10, 11, 12, 25, 26, 27, 28, 33, 34, 39, 40, 42, 43, 44], "deviat": [0, 1, 2, 4, 5, 6, 17, 18, 19, 27, 30, 33, 34, 36, 37, 38, 41, 42, 43], "devic": [42, 44], "devis": [12, 39, 40], "df": [4, 8, 11, 13, 
23, 33, 40], "df1": 33, "di": [], "diag": [5, 8, 34, 35, 36, 43], "diagnost": [1, 10, 41], "diagon": [0, 5, 7, 13, 18, 19, 23, 26, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43, 44], "diagonaliz": [5, 34, 35], "diagram": 10, "diagsvd": 6, "dice": [6, 30, 37], "dict": [6, 8, 41, 42], "dictionari": [41, 42], "did": [0, 1, 5, 6, 7, 10, 11, 14, 16, 22, 27, 28, 33, 37, 38, 39, 41, 42, 43], "didn": 42, "die": [1, 41, 42], "diff": [2, 40, 42], "diff1": [2, 42, 43], "diff2": [2, 42, 43], "diff_ag": [2, 42, 43], "diffeent": 8, "differ": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "different": [40, 41, 42], "differenti": [0, 3, 16, 21, 22, 25, 26, 28, 33, 34, 35, 39, 41, 44], "difficult": [0, 1, 6, 10, 13, 30, 33, 36, 37, 38, 41], "difficulti": [0, 1, 13, 33, 35, 36, 41], "diffonedim": [2, 42, 43], "digit": [0, 1, 3, 4, 6, 23, 28, 31, 33, 41, 42, 44], "digress": 40, "dilemma": [13, 36], "dilut": [1, 41], "dim": [4, 11, 14, 26, 41, 42], "dimens": [0, 1, 2, 3, 4, 5, 8, 11, 14, 16, 26, 33, 34, 35, 40, 41, 42, 43, 44], "dimension": [0, 4, 5, 6, 9, 11, 13, 14, 19, 25, 26, 27, 28, 33, 34, 35, 36, 37], "dimensionless": [0, 3, 33], "diment": 26, "diminish": 36, "dimnsion": 4, "diod": 3, "direct": [0, 1, 2, 4, 11, 12, 13, 14, 24, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "directli": [1, 4, 5, 6, 18, 22, 30, 34, 35, 41, 42], "directori": [], "disabl": 42, "disadvantag": [0, 28, 33, 36], "disappear": [3, 6, 37], "disc_loss": 4, "disc_tap": 4, "discard": [6, 11, 36, 37, 38], "disciplin": [0, 3, 12, 39, 40, 43, 44], "disclaim": 30, "discontinu": 40, "discord": [21, 33], "discourag": [13, 15, 35], "discov": [0, 33], "discover": 5, "discret": [1, 3, 5, 7, 13, 38, 39, 41], "discrimin": [4, 7, 10, 11, 23, 38, 39], "discriminator_loss": 4, "discriminator_loss_list": 4, "discriminator_model": 4, "discriminator_optim": 4, "discuss": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 19, 20, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "diseas": [7, 38, 39], "disguis": [6, 34, 36], "disk": 36, "disord": [1, 7, 38, 39], "dispai": [39, 40], "displai": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 44], "displaystyl": [0, 5, 17, 33, 34, 35, 36], "disregard": [0, 33], "dissimilar": [11, 14], "dist": 14, "distanc": [8, 9, 11, 14, 30], "distance_list": 9, "distinct": [3, 7, 8, 9, 10, 14, 38, 39, 43, 44], "distinctli": 8, "distinguish": [0, 4, 7, 8, 30, 33, 39], "distplot": [], "distribut": [0, 1, 4, 6, 7, 10, 11, 13, 14, 18, 19, 21, 25, 26, 27, 28, 33, 34, 35, 36, 38, 41], "distrubut": [0, 25, 27, 33], "div": [], "dive": [0, 8, 26, 33], "diverg": [1, 13, 35, 36, 41], "divid": [0, 1, 3, 5, 6, 7, 8, 9, 11, 12, 18, 19, 23, 24, 28, 30, 33, 34, 36, 37, 38, 39, 40, 41, 44], "divis": [6, 8, 9, 13, 18, 26, 30, 36, 37, 38, 40, 41, 42], "dl": [], "dm": [], "dna": [7, 38, 39], "dnn": [0, 1, 2, 4, 12, 33, 39, 40, 41, 42, 43], "dnn1": 4, "dnn2_gru2": 4, "dnn_kera": [1, 41, 42], "dnn_model": 1, "dnn_numpi": [1, 41], "dnn_scikit": [0, 1, 33, 41], "do": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 33, 34, 35, 37, 38, 40, 42, 44], "doc": [0, 15, 16, 19, 25, 27, 28, 29, 31, 32, 33, 41, 42], "document": [4, 13, 15, 24, 42], "docutil": [], "doe": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 24, 26, 27, 28, 30, 33, 36, 37, 38, 40, 41, 42, 43, 44], "doesn": [3, 9, 12, 24, 33, 36, 40, 41, 43, 44], "dog": [1, 3, 
4, 41, 44], "dollar": [], "domain": [5, 8, 13, 27, 28, 35, 37], "domcontentload": [], "domin": [0, 23, 33], "don": [0, 1, 3, 5, 6, 8, 11, 13, 15, 16, 21, 23, 24, 25, 27, 28, 33, 34, 36, 41, 42], "done": [0, 2, 3, 4, 5, 6, 9, 10, 11, 13, 16, 20, 22, 26, 27, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "dot": [0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "doubl": [3, 4, 16, 26, 33, 43, 44], "doubli": [1, 41], "doubt": [27, 28], "down": [0, 3, 6, 9, 11, 12, 13, 35, 36, 39, 42, 43], "download": [0, 1, 3, 5, 6, 15, 20, 26, 32, 33, 41, 42, 44], "downsampl": [3, 43, 44], "downscal": 28, "downsiz": 43, "dozen": [1, 41], "dq": [6, 37], "draft": [20, 24], "drag": 13, "dragon": [], "dramat": 11, "drastic": 4, "draw": [4, 6, 10, 13, 35, 37, 38], "drawback": [0, 1, 3, 13, 34, 35, 36, 41], "drawn": [1, 4, 6, 7, 11, 30, 33, 37, 38, 39, 41], "drive": [3, 4, 21, 22], "driven": 3, "drop": [0, 1, 5, 6, 11, 13, 30, 33, 34, 35, 37, 41], "dropna": [0, 6, 33, 37, 38], "dropout": [4, 44], "dt": [2, 3, 13, 30, 40, 42, 43], "dtype": [0, 1, 3, 4, 14, 26, 33, 38, 39, 40, 41, 42, 44], "dual": [], "dub": [0, 33], "duboi": [], "due": [1, 2, 5, 6, 8, 10, 12, 13, 18, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "dugard": [], "dummi": [], "dumoulin": [43, 44], "dure": [0, 1, 3, 4, 8, 9, 11, 20, 25, 27, 28, 33, 36, 37, 38, 39, 41, 42, 43, 44], "dw": 22, "dw_1": 22, "dwell": [], "dwh": [1, 41], "dwo": [1, 41], "dx": [2, 3, 8, 30, 40, 42, 43], "dx_1": 30, "dx_1p": [6, 37], "dx_2p": [6, 37], "dx_mp": [6, 37], "dx_n": 30, "dxp": [6, 37], "dy": [1, 8, 30, 41, 42], "dynam": 4, "dz": [8, 22], "dz_1": 22, "dz_2": 22, "dzt6vm1wjh": 44, "e": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "e1e1e1": [], "e_": [0, 2, 33, 42, 43], "e_z": 21, "each": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 22, 23, 25, 26, 28, 29, 30, 31, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "eager": 37, "eapprox": [0, 33], "earli": [1, 13, 36, 41], "earlier": [0, 5, 7, 8, 9, 11, 12, 13, 19, 20, 21, 22, 33, 34, 38, 39, 40, 41], "earthexplor": 6, "eas": [6, 9, 14, 37], "easi": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 21, 22, 25, 26, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "easier": [5, 6, 8, 9, 13, 15, 20, 21, 22, 27, 28, 30, 33, 34, 35, 37, 38], "easiest": [13, 18, 38, 39], "easili": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "eastern": [31, 33], "ebind": [0, 33], "eblock": 9, "ec8e2c": [], "econom": [], "econometr": 33, "economi": 5, "ecosystem": [25, 33], "ect": 29, "edg": [3, 43, 44], "edgecolor": [6, 37, 38], "edit": [21, 22], "editor": [15, 20], "edu": [13, 27, 28, 35, 42], "educ": [0, 27, 28, 33, 37], "ee6677": [], "eff": 30, "effect": [1, 4, 10, 13, 16, 17, 18, 30, 36, 41, 42], "effic": [1, 41], "effici": [0, 3, 10, 13, 21, 22, 25, 26, 30, 33, 36, 38, 39, 40], "effort": 19, "efron": [6, 37, 38], "egrad": 13, "eig": [5, 11, 13, 26, 30, 33, 34, 35, 36], "eigen": 30, "eigenpair": [5, 11, 34, 35], "eigenvalu": [0, 5, 8, 11, 13, 26, 33, 34, 35, 36], "eigenvector": [5, 11, 13, 34, 35], "eight": [26, 33], "eigval": [26, 30, 33], "eigvalu": [11, 13, 35, 36], "eigvec": [26, 30, 33], "eigvector": [11, 13, 35, 36], "eir": [31, 33], "eispack": [26, 33], "either": [1, 5, 6, 7, 8, 9, 10, 11, 13, 18, 19, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43], "eivind": 31, "eivinsto": 31, "ekstr\u00f8m": 4, "elabor": 30, "elarn": 3, 
"electr": [0, 3, 12, 33, 39, 40], "electron": 33, "eleg": 11, "element": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 19, 20, 21, 24, 25, 26, 27, 28, 32, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "elementari": [10, 13, 26, 40], "elementwis": [3, 13, 43, 44], "elementwise_grad": [2, 13, 22, 41, 42, 43], "elessar": 33, "elif": [14, 41, 42], "elim": 26, "elimin": [3, 8], "elin": [31, 33], "ell_": [], "ellipsi": 16, "els": [1, 3, 4, 7, 9, 12, 13, 16, 22, 26, 38, 39, 41, 42, 43, 44], "elu": 1, "elus": [0, 33], "em": 23, "email": [20, 21, 29, 31, 33], "emb": [], "embark": 40, "embed": [0, 11, 34], "embeddings_fig5_349758607": 28, "embodi": [6, 27, 37, 38], "emit": 30, "emner": 32, "emph": 36, "emphas": [0, 10, 25, 33], "emphasi": [0, 25, 32, 33], "empir": [1, 11, 30, 41], "emploi": [0, 1, 5, 6, 11, 13, 28, 30, 33, 34, 35, 37, 41, 44], "employ": 0, "empti": [6, 10, 15, 37, 38, 41, 42], "emul": [12, 39, 40], "en": [25, 27, 32], "enabl": [11, 36, 41, 42], "enbodi": [6, 37], "encod": [0, 3, 5, 9, 11, 14, 33, 34, 35, 38, 39, 43, 44], "encompass": [0, 27, 30], "encount": [0, 1, 5, 7, 13, 15, 21, 27, 30, 33, 34, 35, 36, 38, 39, 41], "encourag": [15, 27, 28], "end": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 22, 23, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "endblock": [], "endfor": [], "endif": [], "endors": [], "endpoint": [3, 6], "energi": [0, 4, 6, 37, 38], "enforc": [12, 39, 40], "eng": 32, "engin": [0, 1, 3, 4, 25, 33, 41, 43, 44], "english": [27, 28], "enjoi": 36, "enocurag": [27, 28], "enorm": [3, 43, 44], "enough": [0, 6, 13, 28, 33, 35, 36, 37], "ensembl": [1, 9, 33, 41], "ensur": [0, 1, 2, 3, 5, 6, 11, 13, 18, 30, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "entail": 33, "enter": [5, 6, 34, 35, 36], "enthought": [0, 25, 27, 33], "entir": [1, 3, 7, 9, 21, 25, 30, 33, 36, 38, 41, 42, 44], "entireti": [], "entiti": [9, 12, 26, 33], "entri": [0, 5, 8, 11, 12, 23, 26, 33, 34, 36, 37], "entropi": [1, 3, 7, 10, 13, 21, 22, 28, 33, 35, 36, 41, 42, 44], "enumer": [0, 1, 2, 3, 4, 6, 8, 21, 33, 34, 36, 38, 39, 41, 42, 43, 44], "env": 30, "environ": [2, 21, 22, 25, 27, 33, 42, 43], "environemnt": 15, "eo": [0, 6, 37, 38], "eol": 0, "eosfit": 0, "epoch": [0, 1, 3, 4, 12, 13, 21, 24, 33, 36, 38, 39, 41, 42, 44], "eppstein": [], "epsilon": [0, 5, 6, 7, 13, 27, 33, 34, 35, 36, 37, 38, 39, 40], "epsilon_": [0, 33], "epsilon_0": [0, 33], "epsilon_1": [0, 33], "epsilon_2": [0, 33], "epsilon_i": [0, 33, 34], "eq": [3, 13, 14, 26, 30, 35], "eqnarrai": [3, 5, 6, 37], "equal": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 16, 18, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "equat": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 17, 19, 26, 30, 33, 36, 37, 44], "equilibrium": [2, 12, 39, 40, 42, 43], "equiv": [3, 13, 26, 30, 35, 36], "equival": [0, 1, 5, 7, 8, 11, 13, 23, 24, 25, 26, 28, 33, 34, 35, 36, 37, 41, 43, 44], "equivel": [19, 21, 22], "eqynreyrxni": 41, "eras": [], "erf": 30, "eriador": 33, "eric": [41, 42], "err": [0, 10, 43], "err_": [6, 37, 38], "err_sqr": [2, 42, 43], "errat": [13, 35, 36], "erron": [2, 42, 43], "error": [1, 2, 4, 5, 6, 7, 9, 11, 12, 13, 15, 16, 17, 18, 19, 21, 23, 25, 26, 27, 28, 30, 36, 39, 40, 41, 42, 43], "error_estimate_corr_tim": 30, "error_hidden": [1, 41], "error_output": [1, 41], "escap": [13, 35, 36], "escapehtml": [], "especi": [1, 3, 9, 12, 13, 15, 18, 27, 28, 36, 39, 40, 41, 42], "essenti": [0, 5, 6, 9, 10, 12, 14, 15, 24, 27, 28, 30, 34, 35, 36, 39, 40, 41, 43, 44], "establish": [0, 6, 10, 11, 16, 27, 28], "estim": [0, 1, 5, 6, 7, 10, 
11, 13, 25, 30, 33, 34, 35, 36, 38, 39, 41, 43], "estimated_mse_fold": [6, 37, 38], "estimated_mse_kfold": [6, 37, 38], "estimated_mse_sklearn": [6, 37, 38], "et": [0, 2, 4, 16, 17, 20, 28, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "eta": [0, 1, 3, 8, 12, 13, 18, 28, 33, 35, 36, 40, 41, 42, 44], "eta0": [8, 13], "eta_": 13, "eta_j": 36, "eta_t": [13, 36], "eta_v": [0, 1, 3, 33, 41, 42, 44], "etc": [0, 1, 3, 5, 7, 8, 9, 11, 12, 13, 14, 25, 26, 27, 28, 30, 34, 35, 36, 38, 39, 41, 42], "ethic": 25, "etsim": 37, "euclidean": [0, 14, 34, 36], "euler": [], "eval": [42, 44], "evalu": [0, 2, 3, 4, 5, 6, 9, 13, 15, 16, 17, 19, 21, 23, 27, 30, 33, 34, 35, 36, 37, 38, 39, 42, 43], "evalut": [13, 27], "even": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 22, 25, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "evenli": 4, "event": [5, 7, 10, 30, 37, 38], "eventu": [0, 5, 6, 11, 12, 13, 24, 27, 28, 31, 34, 35, 36, 37, 38, 39, 40], "everi": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 21, 25, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "everyth": [4, 12, 16, 18, 21, 40, 41], "everywher": [4, 13, 35], "evolv": 0, "exact": [0, 5, 11, 12, 13, 26, 30, 33, 34, 36, 40, 41], "exactli": [0, 3, 4, 6, 12, 18, 25, 34, 36, 37, 39, 40, 41, 43, 44], "exam": 33, "examin": [6, 37, 38], "exampl": [0, 5, 11, 12, 13, 15, 16, 18, 20, 23, 25, 26, 27, 28, 30, 32], "exce": [1, 12, 13, 36, 39, 40, 41], "exceed": 36, "excel": [0, 1, 4, 5, 10, 20, 27, 28, 33, 34, 41], "except": [3, 4, 6, 8, 9, 26, 41, 42, 43, 44], "excess": [0, 33], "exchang": 36, "excit": 0, "exclud": [1, 6, 12, 27, 28, 34, 36, 37, 38, 39, 41], "exclus": [0, 1, 3, 6, 30, 33, 37, 38, 41, 42, 44], "execut": [2, 5, 13, 15, 34, 35, 36, 42, 43], "exemplari": [], "exemplifi": [13, 36], "exercic": [31, 33], "exercis": [5, 25, 27, 28, 29, 31, 33, 35, 36, 37, 38, 39, 41, 43, 44], "exercisesweek41": 28, "exercisesweek42": [28, 41], "exhaust": [6, 36, 37, 38], "exhibit": [0, 5, 6, 8, 33, 34, 37], "exist": [0, 1, 2, 3, 5, 6, 7, 8, 9, 13, 19, 26, 27, 28, 33, 35, 36, 37, 38, 41, 42, 43, 44], "exit": [5, 26, 34, 35], "exp": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 16, 17, 19, 21, 22, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "exp_term": [1, 41], "exp_z": [38, 39], "expand": [5, 7, 11, 13, 35, 38, 39], "expans": [0, 3, 5, 8, 10, 12, 13, 33, 34, 35, 40], "expect": [0, 1, 5, 6, 7, 11, 12, 13, 15, 18, 23, 25, 27, 28, 33, 34, 36, 38, 40, 41], "expectation_value_of_h_wrt_p": 30, "expens": [6, 10, 13, 16, 35, 36], "experi": [0, 1, 6, 8, 13, 15, 25, 27, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "experiment": [0, 4, 6, 9, 30, 33, 37, 38], "expert": [1, 9, 41], "explain": [0, 6, 9, 10, 11, 13, 16, 19, 24, 27, 28, 33, 35, 38, 39], "explained_variance_ratio_": 11, "explan": [], "explanatori": [0, 33], "explicit": [0, 3, 6, 13, 26, 27, 33, 34, 35, 36, 43, 44], "explicitli": [0, 4, 21], "explod": [1, 24, 40], "exploit": [0, 3, 12, 13, 33, 36, 39, 40, 43, 44], "explor": [1, 4, 6, 8, 13, 18, 25, 27, 28, 33, 35, 36, 41], "expon": [1, 41, 42], "exponenti": [0, 1, 5, 6, 10, 13, 30, 33, 35, 40, 41], "export": [9, 15, 16, 19, 20, 24, 38, 39], "export_graphviz": 9, "export_text": 9, "exporttext": 9, "expos": 25, "expr": 40, "express": [0, 2, 3, 5, 6, 7, 10, 12, 13, 18, 22, 26, 27, 28, 30, 33, 35, 36, 37, 42, 43, 44], "exptmean": 30, "exptvari": 30, "extend": [0, 2, 7, 11, 13, 25, 33, 36, 42, 43], "extend_path": [], "extens": [0, 12, 15, 25, 28, 33, 39, 40], "extent": [0, 1, 6, 32, 37, 38, 41, 42, 43, 44], "extern": [3, 6, 9], "extra": [1, 3, 5, 15, 31, 33, 34, 35, 41, 44], 
"extract": [0, 3, 5, 6, 7, 8, 11, 13, 16, 17, 26, 28, 33, 34, 38, 39, 40], "extrapol": [0, 33], "extrem": [0, 1, 4, 5, 6, 7, 8, 9, 13, 15, 16, 26, 34, 35, 36, 38, 41], "extremum": [13, 35], "extrins": 11, "ey": [0, 5, 6, 13, 14, 18, 26, 33, 34, 35, 36], "f": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "f1": 13, "f11": [0, 33], "f12": [0, 33], "f13": [0, 33], "f1_grad": 13, "f1d": 13, "f2": 13, "f26196": [], "f2_grad_x1": 13, "f2_grad_x1_analyt": 13, "f2_grad_x2": 13, "f2_grad_x2_analyt": 13, "f2f2f2": [], "f3": 13, "f3_grad": 13, "f3_grad_analyt": 13, "f4": 13, "f4_grad": 13, "f4_grad_analyt": 13, "f5": 13, "f5_grad": 13, "f5a394": [], "f5ab35": [], "f5f5f5": [], "f6": 13, "f6_for": 13, "f6_for_grad": 13, "f6_grad_analyt": 13, "f6_while": 13, "f6_while_grad": 13, "f7": 13, "f78c6c": [], "f7_grad": 13, "f7_grad_analyt": 13, "f8": 13, "f8_grad": 13, "f8f8f2": [], "f9": [0, 13, 33], "f9_altern": 13, "f9_alternative_grad": 13, "f9_grad": 13, "f_": [10, 23], "f_0": [3, 10], "f_1": [10, 13, 35], "f_2": [12, 13, 35, 39], "f_3": [12, 39], "f_d": 30, "f_grad": 13, "f_grad_analyt": 13, "f_i": [0, 6, 12, 16, 37, 38, 39], "f_m": [3, 10], "f_n": 3, "f_vec": [2, 42, 43], "face": [13, 33, 35], "facecolor": [6, 8, 30, 37], "facil": [0, 25], "facilit": [12, 39, 40], "fact": [0, 1, 3, 5, 9, 11, 12, 13, 22, 33, 34, 35, 36, 41, 43, 44], "facto": 36, "factor": [0, 1, 3, 5, 6, 9, 10, 11, 13, 26, 30, 33, 34, 35, 41], "factori": 13, "fad000": [], "fade": 6, "fae4c2": [], "fafab0": [9, 10], "fail": [0, 6, 13, 31, 33, 35, 37, 38, 40], "failur": [7, 38, 39], "fairli": [1, 2, 18, 30, 36, 41, 42, 43], "faisal": [16, 34], "fake": 4, "fake_loss": 4, "fake_output": 4, "fall": [8, 9, 23, 29], "fals": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 14, 16, 17, 23, 26, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "famili": [0, 7, 8, 30, 34, 36, 38, 39, 40], "familiar": [0, 3, 5, 6, 8, 15, 25, 26, 27, 30, 33, 37, 40, 43, 44], "famou": [6, 12, 41, 42], "far": [0, 3, 4, 5, 6, 8, 11, 12, 13, 14, 16, 20, 21, 22, 33, 34, 35, 36, 39, 40, 43, 44], "fashion": [0, 9, 10, 28, 33, 36], "fashionmnist": 28, "fast": [1, 3, 6, 10, 12, 13, 25, 30, 33, 35, 36, 37, 38, 40, 41, 43, 44], "faster": [1, 11, 13, 21, 36, 41, 42], "fastest": [13, 26, 35], "fatal": [], "favor": [7, 36, 38], "favorit": 30, "fc": [3, 43, 44], "fc1": [42, 44], "fc2": [42, 44], "fc3": 42, "fcfcfc": [], "fdac54": [], "fdf2e2": [], "featur": [0, 1, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 17, 18, 19, 21, 23, 25, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "feature_nam": [1, 7, 9, 21, 39], "feautur": 9, "fed": [1, 40, 41], "feed": [0, 2, 3, 11, 21, 24, 25, 28, 33, 43, 44], "feed_forward": [1, 21, 22, 41], "feed_forward_all_relu": 21, "feed_forward_batch": 21, "feed_forward_one_lay": 22, "feed_forward_out": [1, 41], "feed_forward_sav": 22, "feed_forward_train": [1, 41], "feed_forward_two_lay": 22, "feedback": [4, 20, 24, 33], "feeddorward": 4, "feedforward": [1, 4, 12, 41, 42], "feel": [0, 5, 6, 11, 13, 15, 16, 18, 21, 22, 23, 25, 27, 28, 31, 33, 40], "feet": [], "fefef": [], "fefeff": [], "felt": [27, 28], "fenc": [], "fernando": [], "fetch": [6, 15, 28], "fetch_openml": 28, "few": [1, 3, 4, 5, 9, 17, 18, 19, 22, 23, 24, 30, 33, 40, 41, 43, 44], "fewer": [0, 9, 11, 19, 33, 36], "ff7b72": [], "ff9492": [], "ffa07a": [], "ffa657": [], "ffb757": [], "ffd700": [], "ffd900": [], "ffd9002e": [], "ffffff": [], "ffnn": [1, 12, 28, 39, 40, 41, 42], "fi": [], "field": [0, 3, 6, 12, 19, 25, 39, 40, 43, 
44], "fieldmask": [], "fifteen": 40, "fifth": [0, 6, 33, 43, 44], "fig": [0, 1, 2, 3, 4, 6, 7, 12, 13, 14, 27, 33, 38, 39, 41, 42, 43, 44], "fig_id": [0, 6, 7, 9, 33, 37, 38], "figaxi": 30, "figsiz": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 33, 37, 38, 39, 41, 42, 43, 44], "figslid": 43, "figur": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 24, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "figure_id": [0, 6, 7, 9, 33, 37, 38], "figurefil": [0, 6, 7, 9, 33, 37, 38], "file": [0, 4, 5, 6, 7, 9, 15, 20, 21, 22, 27, 28, 33, 37, 38], "file_prefix": 4, "filenam": 33, "fill": [5, 9, 18, 23, 34, 35, 41, 42], "fill_valu": [], "filter": [3, 4, 43, 44], "final": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 21, 22, 23, 24, 27, 28, 29, 30, 31, 33, 35, 37, 38, 39], "financ": 0, "find": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 21, 22, 24, 25, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42], "fine": [0, 14], "finish": [2, 20, 21, 41, 42, 43], "finit": [3, 5, 6, 12, 13, 17, 30, 34, 35, 37, 38, 39, 40, 43, 44], "finnicki": 15, "fire": [], "first": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 34, 36, 37, 38, 39, 44], "first_moment": 36, "first_term": 36, "firsteigvector": 11, "fit": [1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, 19, 22, 23, 27, 28, 30, 34, 36, 37, 38, 39, 40, 41, 42, 44], "fit_beta": 34, "fit_intercept": [0, 5, 6, 16, 34, 35, 36, 37, 38, 39], "fit_mod": 9, "fit_theta": [6, 36], "fit_transform": [0, 6, 8, 9, 11, 15, 19, 37, 38], "fiti": [0, 33], "five": [0, 9, 33, 34, 40], "fix": [0, 3, 4, 6, 10, 11, 12, 13, 24, 27, 33, 37, 38, 39, 41, 42, 43, 44], "flag": 4, "flat": [12, 13, 35, 36], "flatten": [1, 3, 4, 5, 26, 41, 42, 43, 44], "flavor": [], "flexibl": [1, 6, 8, 10, 12, 28, 33, 36, 37, 38, 39, 41, 42], "flip": [21, 31, 33, 43, 44], "float": [0, 3, 4, 5, 9, 11, 13, 14, 26, 33, 34, 35, 41, 42, 43], "float32": [4, 9, 41, 42], "float64": [4, 26, 33, 39, 40, 41, 42], "floatingpointerror": [41, 42], "floor": [41, 42], "flop": [5, 26, 34, 35], "flow": [1, 4, 12, 39, 40, 41], "flower": 21, "fluctuat": [5, 36], "flush": [41, 42], "fly": 11, "fm": 0, "fmax": 3, "fmesh": 13, "fn": [7, 23], "focu": [0, 3, 4, 5, 6, 15, 25, 27, 28, 32, 33, 34, 35, 36, 37, 38, 43, 44], "focus": [1, 6, 7, 23, 26, 34, 36, 38, 39, 41], "fold": [6, 9, 27], "folder": [0, 4, 6, 15, 20, 24, 27, 28, 33], "follow": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "font": [7, 20, 30, 33, 38], "fontdict": 30, "fontsiz": [1, 6, 8, 9, 10, 30], "fontweight": 1, "footprint": [3, 36, 43, 44], "foral": [8, 34, 40], "forc": [0, 5, 6, 10, 11, 34, 35, 36, 40, 43, 44], "forcast": 4, "forcier": [], "forecast": [4, 12, 39, 40], "forest": [0, 1, 9, 25, 33, 41], "forget": [11, 36], "form": [0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "formal": [3, 4, 14, 18, 23, 30, 40, 43, 44], "format": [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 20, 23, 25, 30, 32, 37, 38, 39, 41, 42, 44], "format_data": 4, "formatstrformatt": [6, 13, 35, 36], "formatt": [], "formul": [4, 6, 11, 14, 23], "formula": [3, 13, 23, 30, 35, 40], "forth": [4, 12, 22, 39], "fortran": [0, 25, 26, 33], "fortran2003": [25, 33], "fortran2008": [27, 28], "fortran90": 30, "fortun": [0, 11, 34], "forward": [0, 3, 6, 21, 24, 25, 26, 28, 33, 36, 37, 44], "forwardpropag": [40, 41], "found": [1, 2, 4, 5, 6, 12, 13, 19, 20, 21, 22, 27, 33, 34, 
36, 37, 38, 39, 40, 41, 42, 43], "foundat": [25, 33], "four": [4, 5, 6, 8, 12, 21, 26, 29, 31, 33, 35, 39, 40, 41, 43, 44], "fourier": [0, 33, 40], "fourierdef1": 3, "fourierdef2": 3, "fourierseriessign": 3, "fourth": [12, 33, 34], "fp": [7, 23], "fpr": 23, "frac": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "fraction": [9, 23, 38, 39], "frame": [7, 36, 39], "framework": [1, 8, 10, 30, 41, 42], "frank": [5, 11], "frankefunct": [5, 6, 11], "fredli": [21, 31, 33], "free": [0, 6, 11, 13, 15, 16, 18, 21, 22, 23, 25, 26, 27, 28, 30, 31, 32, 33, 40], "freecodecamp": 25, "freedom": [5, 35], "freeli": [0, 27], "freez": 15, "frequenc": [3, 6, 7, 30, 37, 39, 43, 44], "frequent": [0, 8, 9, 13, 35], "frequentist": 25, "fresh": 10, "frf4l5qax1m": 42, "fridai": [15, 21, 22, 23, 24, 31, 33, 43], "friedman": [6, 19, 27, 32, 33], "friendli": 4, "fro": 27, "frodo": 33, "frog": [3, 44], "from": [0, 1, 2, 3, 4, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 42], "from_cod": 9, "from_logit": [3, 4, 44], "from_tensor_slic": 4, "front": [0, 4, 5, 33, 34, 35], "frustrat": 15, "fulfil": [2, 5, 12, 34, 35, 39, 41, 42, 43], "full": [1, 3, 5, 7, 9, 10, 13, 21, 28, 30, 33, 34, 35, 38], "full_matric": [5, 34, 35, 43], "fulli": [3, 6, 12, 30, 37, 38, 39, 40, 43, 44], "fullnam": [], "fun": [25, 33], "func": [2, 21, 41, 42, 43], "function": [2, 3, 4, 5, 9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 44], "functionali": 11, "fundament": [0, 6, 25, 33, 37, 38], "funtion": [2, 42, 43], "furnish": [], "furthemor": 40, "further": [2, 7, 9, 19, 33, 40, 42], "furthermor": [0, 3, 5, 6, 7, 11, 12, 13, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 41, 43, 44], "furthest": 22, "futur": [0, 4, 8, 9, 33], "fy": [15, 21, 27, 28, 29, 31, 32, 33], "fys4155": [27, 28], "fys5419": [32, 33], "fys5429": [32, 33], "f\u00f8470": [31, 33], "g": [0, 1, 2, 3, 4, 6, 8, 9, 10, 11, 13, 15, 18, 19, 23, 24, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "g0": [2, 42, 43], "g_": [2, 9, 10, 36, 42, 43], "g_0": [2, 42, 43], "g_1": [2, 10, 42, 43], "g_2": [2, 10, 42, 43], "g_3": 40, "g_analyt": [2, 42, 43], "g_dnn_ag": [2, 42, 43], "g_euler": [2, 42, 43], "g_i": [2, 40, 42, 43], "g_j": 40, "g_m": [3, 10], "g_n": 3, "g_re": [2, 42, 43], "g_t": [2, 36, 41, 42, 43], "g_t_d2t": [2, 42], "g_t_d2x": [2, 42, 43], "g_t_dt": [2, 42, 43], "g_t_hessian": [2, 42, 43], "g_t_hessian_func": [2, 42, 43], "g_t_invers": [41, 42], "g_t_jacobian": [2, 42, 43], "g_t_jacobian_func": [2, 42, 43], "g_trial": [2, 42, 43], "g_trial_deep": [2, 42, 43], "g_vec": [2, 42, 43], "gain": [1, 5, 7, 9, 10, 13, 34, 35, 41, 42], "galleri": [0, 33], "game": 4, "gamge": 33, "gamma": [0, 2, 8, 9, 10, 11, 13, 33, 35, 42, 43], "gamma1": 8, "gamma2": 8, "gamma_": [0, 33], "gamma_0": 10, "gamma_1": 10, "gamma_1x": 10, "gamma_i": [0, 8, 30, 33], "gamma_j": 13, "gamma_k": [13, 35], "gamma_m": 10, "gamma_x": [0, 33], "gap": [8, 36], "gate": [4, 12, 40], "gather": [0, 1, 12, 34, 39, 40, 41, 42], "gaug": [12, 39, 40], "gaussbacksub": 26, "gaussian": [4, 5, 6, 8, 14, 18, 30, 33, 37, 38, 39], "gaussian_point": 14, "gaussian_rbf": 8, "gave": [13, 28, 36], "gavra": 33, "gbc": 33, "gca": [2, 6, 8, 13], "gd": [1, 35, 40, 41], "gd_clf": 10, "gdclassiffiercgain": 10, "gdclassiffierconfus": 10, "gdclassiffierroc": 10, "gdm": 13, "gdregress": 10, "ge": [1, 5, 7, 30, 34, 35, 38, 41, 42], "gen_loss": 4, "gen_tap": 4, "gender": [0, 33], "genener": 4, "gener": [0, 1, 2, 3, 5, 6, 8, 
10, 11, 12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 41, 42], "generaliz": [16, 41, 42], "generallay": [12, 39], "generate_and_save_imag": 4, "generate_binary_data": [38, 39], "generate_imag": 4, "generate_latent_point": 4, "generate_multiclass_data": [38, 39], "generate_simple_clustering_dataset": 14, "generated_imag": 4, "generator_loss": 4, "generator_loss_list": 4, "generator_model": 4, "generator_optim": 4, "genom": 25, "geodes": 11, "geoff": 36, "geometr": [0, 13, 33, 36], "geometri": 5, "georg": 32, "geotif": 6, "geq": [2, 5, 8, 9, 13, 34, 35, 36, 42, 43], "gerard": [], "geron": [0, 32, 33], "get": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 15, 19, 21, 22, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "get_dummi": 9, "get_paramet": [2, 42, 43], "get_split": 9, "get_yaxi": 8, "get_yticklabel": 6, "getmask": [], "gh": 15, "gi6mzxat0ew": 42, "giant": 36, "gibb": [25, 33], "gif": 4, "gini": 10, "gini_index": 9, "ginvers": 13, "git": [0, 15, 25, 33], "gitcdn": [], "giter": [13, 36], "github": [0, 20, 24, 25, 27, 28, 29, 31, 32, 33, 34, 40, 41, 42], "gitignor": 15, "gitlab": [0, 15, 25, 27, 28, 33], "gitta": [40, 41], "give": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 18, 19, 23, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "given": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19, 21, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "glkfgrjhtlnplbx4": 21, "global": [6, 7, 13, 35, 36, 38, 39], "gloriou": 28, "glorot": 1, "gmail": [], "gnew": 13, "go": [0, 1, 3, 5, 6, 8, 9, 11, 12, 13, 15, 16, 18, 21, 33, 34, 35, 37, 40, 41, 44], "goal": [0, 7, 9, 33, 38, 39], "goe": [0, 1, 2, 5, 6, 13, 14, 15, 19, 26, 33, 34, 35, 36, 37, 41, 42, 43], "goessner": [], "golden": 13, "gone": [5, 34, 35], "gong": [1, 41], "good": [1, 3, 4, 5, 6, 9, 10, 11, 13, 15, 18, 21, 24, 25, 28, 30, 32, 34, 35, 36, 38, 40, 41, 42], "goodfellow": [4, 28, 32, 33, 34, 35, 38, 39, 40, 41, 43, 44], "googl": [1, 4, 21, 22, 25, 33, 41, 42], "got": [1, 6, 21, 22, 24, 27, 28, 41], "gotten": [33, 41, 42], "gov": 6, "govern": 33, "gp": 32, "gpu": [1, 13, 25, 33, 36, 41, 42], "grad": [2, 13, 21, 22, 36, 41, 42, 43], "grad_analyt": 13, "grad_ol": 18, "grad_ridg": 18, "grad_two_lay": 22, "grade": [27, 28, 29], "gradient": [0, 3, 4, 7, 8, 9, 12, 21, 24, 25, 33, 34, 38, 44], "gradient_bia": [41, 42], "gradient_desc": 36, "gradient_func": 21, "gradient_weight": [41, 42], "gradientboostingclassifi": 10, "gradientboostingregressor": 10, "gradients_of_discrimin": 4, "gradients_of_gener": 4, "gradienttap": 4, "gradual": [1, 14, 41], "grai": [4, 6, 43], "granger": [], "grant": [], "graph": [1, 9, 11, 12, 13, 16, 20, 23, 35, 36, 39, 40, 41, 42], "graph_from_dot_data": 9, "graphic": [0, 1, 9, 15, 33, 41, 42], "grasp": 0, "gray_r": [1, 3, 41, 42, 44], "grayscal": [3, 43, 44], "great": [5, 13, 15, 21, 22, 35, 36, 40, 42, 43], "greater": [1, 7, 30, 34, 39, 41], "greatli": 13, "greedi": 9, "green": [0, 3, 9, 30, 43, 44], "gregor": [41, 42], "grei": 4, "grid": [1, 3, 6, 7, 8, 12, 30, 34, 36, 37, 38, 39, 41, 42, 44], "groh": [40, 41], "grossli": [13, 35], "ground": [0, 33], "group": [0, 6, 7, 9, 14, 15, 20, 24, 25, 27, 28, 29, 31, 33, 37], "groupbi": [0, 33], "grow": [1, 3, 9, 10, 36, 41, 43, 44], "growth": [0, 33], "gru": 4, "guarante": [0, 4, 13, 30, 33, 34, 35, 36], "guess": [1, 4, 10, 13, 14, 23, 28, 35, 36, 41], "guestrin": 10, "gui": 15, "guid": [1, 21, 41, 42], "guidelin": [20, 27, 28, 38, 39], "g\u00f6ssner": [], "h": [0, 1, 5, 6, 8, 13, 15, 19, 21, 
30, 31, 32, 33, 34, 35, 36, 41], "h1": [2, 42], "h_": [0, 13, 33, 35, 36], "h_0": 36, "h_1": [2, 13, 35, 42, 43, 44], "h_2": [2, 13, 35, 42, 43, 44], "h_m": 10, "h_t": 36, "ha": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "haanen": [31, 33], "habit": [0, 34], "had": [0, 1, 6, 7, 13, 33, 35, 36, 37, 38, 41], "hadamard": [1, 12, 13, 36, 40, 41], "half": [1, 8, 9, 38, 39, 40, 41, 42], "halv": 10, "hand": [0, 1, 2, 3, 5, 11, 12, 13, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 38, 39, 42, 43, 44], "handi": [3, 27, 28, 43, 44], "handl": [0, 1, 2, 5, 9, 11, 15, 18, 22, 25, 34, 35, 36, 41, 42, 43], "handle_unknown": 9, "handsid": [12, 40, 41], "handwrit": [12, 39, 40], "handwritten": [1, 5, 41], "happen": [1, 2, 3, 4, 5, 6, 10, 13, 24, 30, 34, 35, 36, 39, 41, 42, 43, 44], "hard": [1, 7, 8, 10, 13, 21, 22, 35, 36, 38, 40, 41], "hardcopi": [25, 33], "harder": [0, 1, 19, 21, 34, 41], "harmon": [3, 23], "hash": 36, "hasn": [], "hassl": [0, 25, 33], "hast": [25, 33], "hasti": [0, 6, 16, 17, 19, 20, 27, 32, 33, 34, 37, 38], "hat": [0, 1, 5, 6, 7, 9, 10, 11, 12, 13, 16, 17, 18, 19, 26, 34, 35, 36, 37, 39, 40], "hauser": [], "have": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "have_sys_un_h": [], "haven": [1, 22, 41], "he": [7, 38, 39], "head": [4, 10, 30], "header": [0, 33], "heads_proba": 10, "health": [0, 34], "hear": [0, 13, 33, 36], "heart": [0, 7, 33, 38], "heatmap": [0, 1, 3, 7, 17, 20, 24, 28, 33, 39, 41, 42, 44], "heavi": 36, "heavili": 0, "heavisid": [1, 41], "height": [1, 3, 6, 34, 41, 43, 44], "hein": [42, 43], "held": [13, 36], "help": [0, 1, 4, 12, 13, 15, 16, 27, 28, 33, 36, 37, 39, 40, 41], "helper": [4, 14, 38, 39], "henc": [0, 5, 6, 8, 9, 10, 12, 13, 33, 34, 35, 36, 37, 38, 39], "henrik": [31, 33], "her": [7, 38, 39], "here": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "hereaft": [0, 8, 12, 33], "herebi": [], "hermitian": 26, "hessenberg": 26, "hessian": [0, 2, 5, 13, 38, 39, 42, 43], "heterogen": [9, 10], "hex": [], "hi": [7, 38, 39], "hidden": [1, 3, 4, 12, 21, 23, 24, 28, 39, 44], "hidden_bia": [1, 41], "hidden_bias_gradi": [1, 40, 41], "hidden_deriv": [41, 42], "hidden_func": [41, 42], "hidden_layer_s": [0, 1, 33, 41], "hidden_neuron": 4, "hidden_nodes1": [41, 42], "hidden_nodes2": [41, 42], "hidden_weight": [1, 41], "hidden_weights_gradi": [1, 40, 41], "hierarch": [5, 34, 35], "high": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 13, 14, 21, 23, 25, 26, 27, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "higher": [0, 1, 3, 5, 6, 8, 13, 18, 23, 27, 33, 34, 35, 36, 37, 38, 41, 42], "highest": [1, 2, 38, 39, 41, 42, 43], "highli": [0, 3, 4, 10, 19, 25, 26, 28, 32, 33, 34, 35, 36], "highlight": 23, "highwai": [], "hing": 8, "hint": [13, 15, 16, 21, 22, 28, 34, 35], "hinton": 36, "hip": [25, 42], "hire": 0, "hist": [4, 6, 7, 30, 37, 39], "histogram": [6, 7, 30, 39], "histor": [7, 11, 38], "histori": [3, 4, 12, 15, 36, 39, 40, 42, 44], "hitherto": 5, "hjorth": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "hline": 23, "hobbi": 30, "hoc": [5, 34, 35], "hoff": 32, "hojjatk": 28, "hold": [1, 3, 6, 13, 14, 35, 36, 37, 41, 43, 44], "holder": [0, 33], "holdgraf_evidence_2014": [], "home": [], "homepag": [27, 28, 33], "homework": [6, 13, 35, 36], "homogen": [1, 3, 9, 10, 13, 36], 
"honchar": [2, 42, 43], "hop": [43, 44], "hope": 24, "hopefulli": [0, 11, 15, 19, 30, 33, 36], "horizont": 11, "horlyk": [31, 33], "hornik": 40, "hors": [3, 7, 33, 38, 39, 44], "hot": [1, 9, 38, 39, 41, 42], "hour": [1, 25, 29, 30, 31, 33, 36, 37, 41], "how": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44], "howev": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "href": [], "hspace": [0, 4, 8, 10, 30, 33, 40, 41], "hstack": [1, 41, 42], "htf": 33, "html": [0, 16, 20, 21, 25, 27, 28, 29, 31, 32, 33, 34, 35, 36, 40, 41, 42], "http": [0, 3, 4, 6, 13, 15, 16, 19, 20, 21, 22, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "huang": [0, 33], "huber": [0, 33], "huge": [1, 3, 4, 25, 36, 41, 43, 44], "human": [0, 1, 3, 6, 9, 12, 34, 39, 40, 41, 43, 44], "humid": 9, "hundr": [1, 41], "hungri": [1, 41], "hybrid": 29, "hydrogen": [0, 33], "hyper": 28, "hyperbol": [1, 4, 12, 42], "hyperparam": 8, "hyperparamat": 40, "hyperparamet": [3, 4, 5, 6, 9, 13, 18, 24, 27, 28, 34, 35, 36, 40, 42, 43, 44], "hyperplan": 11, "h\u00f8rlyk": [31, 33], "i": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40], "i0": [0, 33], "i1": [0, 6, 8, 12, 33, 34, 36, 39], "i2": [0, 8, 12, 33, 39], "i3": [0, 12, 33, 39], "i5": [0, 33], "i_": [13, 35, 36], "i_1": [5, 6, 37], "i_2": [5, 6, 37], "i_siz": [21, 22], "i_t": 36, "ian": 32, "iasuyvmceki": 43, "iayaan2": 21, "ic": [1, 27, 28, 41], "id": [7, 13, 35, 36, 38], "ida": [31, 33], "idea": [0, 1, 2, 3, 4, 6, 9, 10, 12, 13, 20, 26, 27, 28, 34, 35, 36, 37, 38, 39, 40, 41, 42], "ideal": [0, 2, 6, 8, 13, 23, 30, 33, 36, 37, 38, 39, 41, 42, 43], "idem": [6, 37, 38], "ident": [5, 6, 12, 13, 17, 18, 26, 34, 35, 39, 41, 42], "identical": 37, "identifi": [0, 1, 7, 9, 11, 12, 13, 14, 23, 33, 34, 38, 39, 41], "idx": [38, 39], "ieor": 30, "ifi": [32, 42, 43], "ifs": [25, 33], "ignor": [0, 1, 3, 9, 15, 34, 36, 41, 42, 44], "ii": [23, 26, 30, 41, 44], "iii": [26, 33, 41], "ij": [0, 1, 3, 6, 8, 12, 14, 16, 23, 26, 30, 33, 34, 36, 39, 40, 41, 42, 43, 44], "ik": [0, 26, 33, 34], "iki": [], "ilg3ggewq5u": [40, 41], "ill": 36, "illinoi": [], "illustr": [5, 7, 10, 12, 13, 14, 20, 25, 33, 38, 41], "ilsvrc": 36, "im": 6, "imag": [1, 3, 4, 6, 9, 11, 12, 14, 32, 33, 39, 40, 41, 42], "image_at_epoch_": 4, "image_batch": 4, "image_height": [3, 44], "image_path": [0, 6, 7, 9, 33, 37, 38], "image_width": [3, 44], "imageio": 6, "imagenet": 36, "images_from_seed_imag": 4, "imagin": [1, 41], "imbal": 23, "imbalanc": 23, "img": 43, "immedi": [0, 3, 4, 6, 25, 33, 36], "impact": 28, "implement": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 27, 30, 33, 34, 35, 36, 38, 39, 40, 44], "impli": [3, 5, 6, 7, 13, 26, 34, 35, 36, 37, 38, 43, 44], "implicit": [3, 36, 43, 44], "implicitli": [11, 30], "import": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 27, 28, 30, 36, 37, 38, 39, 42, 43], "importantli": 3, "importerror": [], "impos": [0, 6, 11, 12, 33, 39, 41, 42], "imposs": [0, 5, 33, 34, 35], "impract": 36, "impress": [0, 12, 33, 39, 40], "improv": [0, 4, 5, 9, 10, 11, 13, 15, 21, 27, 28, 34, 35], "impur": 9, "imread": [6, 43], "imshow": [1, 3, 4, 6, 41, 42, 43, 44], "in3050": [32, 33], "in3310": 33, "in4080": [32, 33], "in4300": [32, 33], "in4310": 32, "in5400": 3, 
"in5550": 32, "in_out_neuron": 4, "inaccur": [13, 35], "inact": [12, 39, 40, 41], "inadequ": [0, 33], "inappropri": 36, "inch": [6, 34], "incident": [], "includ": [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 30, 31, 32, 33, 34, 35, 37, 41, 42, 43, 44], "include_bia": [6, 9, 37, 38], "inclus": 28, "incom": [12, 16, 39, 40], "incorrect": [1, 41], "incorrectli": 23, "incoveni": 8, "increas": [0, 1, 3, 4, 5, 6, 9, 12, 13, 19, 23, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42], "increasingli": 30, "increment": 36, "ind": 6, "inde": [0, 2, 4, 5, 6, 13, 33, 34, 35, 40, 42, 43], "indefinit": 4, "independ": [0, 5, 6, 7, 8, 12, 13, 30, 33, 34, 35, 36, 38, 39], "index": [0, 1, 3, 4, 10, 14, 25, 26, 27, 28, 30, 32, 33, 41, 43, 44], "index_col": [0, 33], "indic": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 16, 23, 27, 28, 33, 34, 40, 41, 42, 43, 44], "indirect": [], "indispens": [6, 37, 38], "individu": [1, 6, 7, 10, 12, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42], "indu": [], "indx": 26, "indx1": [2, 42, 43], "indx2": [2, 42, 43], "indx3": [2, 42, 43], "ineffici": [3, 13], "inequ": [8, 13], "inequaltii": 35, "inertia": 13, "inexperi": [], "inf": [], "inf1000": [25, 33], "inf1100": [25, 33], "inf1100l": [25, 33], "inf1110": [25, 33], "inf3000": 33, "infeas": [9, 36], "infer": [0, 1, 4, 6, 32, 33, 37, 38, 41, 42], "inferenc": 1, "infil": [0, 6, 7, 9, 33, 37, 38], "infin": [5, 6, 7, 11, 19, 34, 35, 37, 38, 40, 41], "infinit": [3, 36, 43, 44], "infinitesim": 30, "influenc": [6, 10, 18, 37, 38], "influenti": [1, 41], "info": 33, "inform": [0, 1, 3, 4, 6, 9, 11, 12, 13, 14, 23, 26, 27, 28, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "inforom": 15, "infrequ": 36, "infti": [3, 6, 13, 30, 35, 37, 40, 43, 44], "ingeni": [13, 35, 36], "ingredi": [0, 9, 33], "inher": [6, 36, 37, 38], "inherit": [26, 33, 36], "init": [], "initi": [0, 1, 2, 6, 10, 13, 14, 18, 26, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "inititi": [41, 42], "inject": 14, "inlin": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "inn": 23, "inner": [0, 13, 34], "innerhtml": [], "inp": 4, "inplac": 13, "inpput": 40, "input": [0, 1, 3, 4, 5, 6, 7, 8, 12, 13, 14, 16, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 44], "input_dim": 1, "input_nod": [41, 42], "input_s": 21, "input_shap": [3, 4, 44], "inputs": 1, "inputs_shuffl": [0, 1, 34, 41], "inquiri": 20, "insert": [3, 5, 6, 8, 10, 30, 34, 35, 37], "insid": [4, 7, 21, 39], "insight": [0, 1, 5, 25, 28, 33, 34, 35, 37, 38, 40], "insist": [6, 13, 34, 36], "inspir": [0, 1, 12, 27, 28, 33, 39, 40, 41], "instabl": [2, 42, 43], "instal": [0, 1, 5, 6, 9, 15, 20, 28, 41, 42], "instanc": [0, 1, 2, 4, 6, 9, 11, 13, 16, 23, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "instanti": 10, "instead": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 17, 20, 21, 22, 24, 26, 28, 30, 33, 34, 36, 37, 41, 42, 43, 44], "institut": [1, 41], "instruct": [0, 1, 15, 41, 42], "int": [0, 1, 2, 3, 4, 5, 6, 11, 13, 14, 26, 30, 34, 36, 37, 38, 39, 41, 42, 43, 44], "int32": 10, "int_": [3, 6, 23, 30, 37, 40], "int_0": 30, "int_a": 30, "intak": [0, 34], "integ": [1, 2, 13, 14, 26, 30, 33, 38, 39, 41, 42, 43], "integer_vector": [1, 41], "integr": [3, 6, 23, 30, 33, 37, 43, 44], "intellig": [0, 14, 32, 33], "intend": 10, "intens": [1, 18, 41], "intention": 14, "interact": [0, 6, 9, 12, 25, 27, 28, 33, 39, 40], "intercept": [0, 6, 8, 11, 13, 16, 17, 18, 19, 33, 34, 35, 36, 37, 38, 39], "intercept_": [0, 6, 8, 9, 13, 33, 34, 36], "interchang": [5, 12, 26, 39, 40], 
"interconnect": [1, 41], "interesit": [], "interest": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 19, 25, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "interfac": [0, 1, 15, 26, 34, 41, 42], "interior": [0, 9, 33], "intermedi": [26, 34, 36], "intermediari": [21, 22], "intermeti": 22, "intermetidari": 22, "intern": [1, 10, 12, 22, 38, 39, 40, 41, 42], "internation": [], "interpol": [1, 3, 4, 6, 12, 39, 40, 41, 42, 44], "interpr": [5, 34, 35], "interpret": [0, 1, 6, 9, 10, 12, 13, 15, 16, 21, 23, 26, 27, 28, 30, 40, 41, 43, 44], "interrupt": [], "interv": [0, 3, 5, 6, 7, 13, 19, 30, 33, 34, 35, 38, 39], "intial": [13, 35], "intract": [0, 4, 34], "intrins": [3, 11, 26, 30, 33, 43, 44], "intro": [25, 32, 33], "introduc": [0, 1, 5, 6, 8, 10, 12, 24, 26, 27, 30, 33, 35, 36, 37, 39, 40, 41, 43, 44], "introduct": [1, 2, 4, 13, 24, 32, 34, 35, 36, 38, 41, 42, 43], "introductori": [0, 4, 26, 32, 33, 34], "intuit": [0, 5, 6, 8, 12, 13, 27, 33, 36, 37, 38, 39, 40, 41], "inv": [0, 5, 13, 17, 33, 34, 35, 36], "invalid": [], "invalu": [0, 13, 25, 33, 35], "invari": [1, 41, 43, 44], "invd": 5, "inver": [8, 39], "invers": [0, 3, 6, 13, 33, 34, 35, 36, 43, 44], "inverse_transform": 8, "invert": [0, 5, 7, 10, 13, 16, 18, 33, 36, 38, 39], "investig": [], "invh": [13, 36], "invok": 8, "involv": [0, 2, 6, 7, 11, 12, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "io": [0, 25, 27, 28, 29, 31, 32, 33, 34, 41, 42], "ion": [], "ip": [0, 8, 30, 33], "ipca": 11, "ipynb": [25, 33], "ipython": [0, 5, 7, 9, 11, 14, 25, 27, 28, 33, 34, 38], "iq": [6, 37], "iri": [8, 9, 21, 23], "irreduc": [6, 37, 38], "irrelev": [5, 34, 35], "irrespect": [0, 33], "irvin": [27, 28], "is_avail": [42, 44], "isaac": [], "isaacmus": [], "iseffici": [], "isn": [5, 24], "isnan": [41, 42], "isnul": [], "isolo": 22, "isomap": 11, "issu": [1, 9, 15, 26, 36, 41, 42], "it_arrai": 13, "item": [0, 13, 33, 42, 44], "items": [26, 33], "iter": [1, 2, 4, 6, 8, 13, 14, 18, 27, 30, 35, 36, 37, 38, 39, 40, 41, 42, 43], "its": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 23, 25, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "itself": [5, 6, 12, 27, 28, 30, 33, 34, 37, 40], "iv": 41, "ix": [41, 42], "j": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 23, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "j1": 26, "j_": 6, "j_41hld6ttu": 37, "j_lasso_sk": 6, "j_ridge_sk": 6, "j_sk": 6, "jackknif": [6, 25, 33, 37, 38], "jacobian": [2, 13, 35], "janko": [], "jason": 4, "javascript": [], "jax": [25, 28, 33, 36, 40], "jeff": [], "jensen": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "jentzen": [40, 41], "jerom": [19, 27, 32], "jhauser": [], "ji": [12, 23, 26, 40, 41], "jit": 13, "jj": [0, 5, 6, 33, 37], "jk": [0, 1, 6, 12, 26, 33, 39, 40, 41], "jl": [0, 33], "jm": 26, "jmlr": 42, "jnp": 13, "job": [2, 8, 10, 15, 42, 43], "join": [0, 4, 6, 7, 9, 24, 27, 28, 33, 37, 38, 43], "joint": [4, 5], "jonathan": [], "jpg": 43, "json": [], "judg": [13, 35, 38, 39], "judgement": 6, "julia": [25, 26, 27], "juliu": [40, 41], "jump": [30, 36], "junk": 4, "jupit": 33, "jupyt": [0, 15, 16, 19, 25, 27, 32, 33, 37, 40, 41], "jupyterbook": [], "jupytext": [], "just": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 25, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "justif": 0, "justifi": [3, 10, 43, 44], "k": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 23, 25, 26, 27, 30, 31, 33, 34, 35, 36, 39, 42, 43, 44], "k0": [7, 38, 39], "k1": [7, 38, 39], "kaggl": [6, 27, 28], "kajda": [41, 
42], "kappa_d": 30, "karl": [31, 33], "karush": 8, "katex": [], "katrin": [31, 33], "keep": [0, 1, 4, 5, 6, 11, 13, 14, 15, 18, 21, 22, 26, 27, 28, 33, 34, 35, 36, 37, 38, 41, 43], "keepdim": [1, 6, 10, 26, 37, 38, 39, 41, 42], "kei": [1, 3, 6, 12, 36, 39, 41, 42], "kellei": [], "kenneth": [], "kept": [4, 6, 14, 37, 38], "kera": [0, 4, 25, 27, 28, 33], "kernel": [0, 1, 3, 25, 33, 34, 41, 42, 43, 44], "kernel_regular": [1, 3, 41, 42, 44], "kernel_s": 4, "kernelpca": 11, "kev": [0, 33], "kevin": [32, 33], "kevinsheppard": [], "keyboardinterrupt": [41, 42], "keyword": [18, 26, 33, 41, 42], "kfold": [6, 37, 38], "kg": [1, 41], "ki": 26, "kick": [1, 13, 36, 41], "kiener": [2, 42, 43], "kilomet": [6, 34], "kim": [], "kind": [0, 2, 3, 4, 8, 12, 13, 14, 24, 33, 34, 39, 40, 41, 42, 43, 44], "kingma": 36, "kj": [6, 12, 26, 34, 36, 40, 41, 42], "kjm": [25, 33], "kkt": 8, "kl": 30, "km": [12, 33, 39], "kmean": 14, "kmeanspoint": 14, "kn_k": 14, "know": [0, 1, 2, 5, 6, 8, 13, 15, 16, 17, 19, 20, 24, 25, 33, 34, 35, 41, 42, 43], "knowledg": [0, 25, 33], "known": [1, 3, 4, 5, 6, 7, 8, 9, 12, 18, 26, 27, 28, 30, 32, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "kondev": [0, 33], "kp": 30, "kpca": 11, "kramdown": [], "kristin": [42, 43], "kroneck": 14, "kt": [], "kuckuck": [40, 41], "kuhn": 8, "kumar": [42, 43], "kutyniok": [40, 41], "kvalsund": [31, 33], "kwarg": [41, 42], "kwown": [0, 33], "l": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 22, 23, 26, 27, 30, 33, 35, 36, 38, 39, 42, 43, 44], "l0": [7, 38, 39], "l1": [0, 1, 3, 7, 24, 28, 33, 38, 39, 41, 42, 44], "l1_l2": [1, 3, 41, 42, 44], "l1regl": 5, "l2": [1, 3, 24, 28, 41, 42, 44], "l2_reg": 42, "l_": [26, 36], "l_1": [7, 28, 38, 39, 40], "l_2": [7, 13, 28, 35, 36, 38, 39, 40], "l_i": 36, "l_j": [12, 40, 41], "l_ja": [41, 42], "la": 13, "la_": [], "la_i": [12, 40, 41, 42], "la_k": [12, 40], "lab": [20, 25, 27, 28, 33], "label": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "labelencod": [7, 10, 23, 39], "labels": [6, 8, 9], "labels_shuffl": [0, 1, 34, 41], "laboratori": 29, "lack": [0, 24, 33, 36], "lagari": [2, 42, 43], "lagrang": [8, 11], "lam": [18, 41, 42], "lambda": [0, 1, 2, 3, 5, 6, 7, 8, 10, 12, 13, 17, 18, 19, 20, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "lambda_": 11, "lambda_0": 11, "lambda_1": [5, 8, 11, 34, 35], "lambda_2": [8, 11], "lambda_i": [8, 11], "lambda_iy_i": 8, "lambda_jy_iy_j": 8, "lambda_k": 8, "lambda_n": [5, 8, 34, 35], "lamda": 1, "land": 8, "landmark": 8, "landscap": [13, 18, 35, 36], "langl": [0, 6, 11, 30, 33, 34], "languag": [0, 1, 4, 8, 25, 26, 27, 28, 32, 33, 41], "lapack": [26, 33], "laplac": 5, "laptop": [15, 25], "larg": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 13, 18, 25, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43], "larger": [0, 3, 5, 6, 8, 10, 11, 13, 17, 22, 23, 30, 33, 34, 35, 36, 37, 43, 44], "largest": [4, 8, 11], "larn": [43, 44], "lasso": [0, 7, 25, 28, 33, 36, 37, 38, 39], "lasso_sk": 6, "last": [0, 1, 3, 4, 5, 6, 7, 8, 12, 16, 17, 19, 21, 22, 26, 27, 30, 31, 33, 35, 37, 38, 42, 43], "latent": 4, "latent_dim": 4, "latent_point": 4, "latent_space_value_rang": 4, "later": [0, 1, 4, 7, 8, 12, 13, 14, 15, 19, 21, 22, 25, 27, 28, 33, 36, 38, 39, 40, 41, 42, 43], "latest": [4, 15, 25], "latest_checkpoint": 4, "latex": [20, 33], "latexcodec": [], "latrpygrtttbnjr3znuhl": 22, "latter": [0, 3, 6, 7, 8, 11, 13, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 43, 44], "lattic": [12, 39, 40], "law": 0, "layer": [0, 4, 13, 23, 
24, 28, 33, 36, 39], "layer_grad": 22, "layer_input": 22, "layer_output_s": [21, 22], "layers_grad": 21, "lbfg": [7, 9, 10, 23, 39], "lc_messag": [], "lcc": [5, 6, 37], "lda": 11, "ldot": [0, 6, 11, 27, 33, 37, 38], "le": [5, 7, 10, 13, 17, 30, 34, 35, 36, 38, 43, 44], "lead": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "leaf": 9, "leaki": [1, 28, 41, 42], "leakyrelu": [4, 28], "lear": [13, 35], "learn": [3, 4, 5, 6, 7, 8, 9, 10, 12, 21, 23, 24, 26, 31, 32, 44], "learnabl": [3, 43, 44], "learner": 10, "learnig": 33, "learning_r": [8, 10, 21, 42], "learning_rate_init": [0, 1, 33, 41], "learning_schedul": [13, 36], "learnt": [27, 28], "least": [0, 7, 8, 10, 11, 17, 18, 24, 25, 26, 30, 37, 38, 39], "leat": [13, 36], "leav": [0, 1, 3, 5, 6, 9, 11, 21, 33, 35, 37, 38, 41, 43, 44], "lectur": [0, 1, 5, 10, 11, 12, 13, 25, 26, 27, 28, 29, 31, 32, 34], "lecture_11_backpropag": 42, "lecturenot": [0, 25, 27, 28, 32, 33, 41, 42], "left": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "leftarrow": [8, 12, 40, 41, 42], "legend": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 15, 21, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "legend_el": 21, "leinonen": 33, "len": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 16, 17, 21, 22, 26, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "length": [0, 1, 3, 4, 8, 9, 13, 16, 21, 25, 33, 34, 35, 36, 41, 43, 44], "length_of_sequ": 4, "leq": [0, 5, 7, 8, 13, 14, 30, 33, 34, 35, 36, 38], "less": [0, 1, 3, 4, 5, 6, 8, 9, 13, 25, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "lessen": [1, 41], "let": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "letter": [0, 16, 26, 30, 33, 34], "level": [0, 1, 5, 6, 9, 23, 25, 26, 27, 28, 29, 31, 33, 36, 37, 38, 40, 41, 42], "leverag": 36, "lexer": [], "li": [8, 11], "liabil": [], "liabl": [], "lib": [], "liberti": 36, "liblinear": 10, "librari": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 26, 27, 30, 32, 34, 35, 36, 41, 42, 43, 44], "licenc": [], "licens": [0, 1, 25, 27, 33, 41, 42], "lie": [0, 6, 11, 30, 33, 34, 37, 38], "life": [0, 1, 8, 12, 33, 39, 40, 41, 42], "lifetim": 13, "lift": 23, "light": [], "like": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "likelihood": [0, 1, 5, 9, 33, 34, 41], "lim_": 30, "limit": [0, 5, 6, 8, 12, 26, 27, 28, 33, 34, 38, 39, 40], "lin_clf": 8, "lin_model": [], "lin_reg": 9, "linalg": [0, 2, 5, 6, 8, 11, 13, 17, 26, 30, 33, 34, 35, 36, 39, 42, 43], "line": [0, 3, 6, 8, 11, 13, 15, 16, 20, 21, 23, 33, 35, 36, 37, 40, 41, 42, 44], "line1": 8, "line2": 8, "line2d": [], "line3": 8, "line_model": 15, "line_ms": 15, "line_predict": 15, "linear": [1, 3, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 24, 25, 27, 28, 30, 36, 37, 39, 40, 41, 42, 43, 44], "linear_model": [0, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39], "linear_regress": [6, 37, 38, 41, 42], "linearli": [5, 34, 35, 36], "linearloc": [6, 13, 35, 36], "linearregress": [0, 6, 7, 9, 15, 16, 19, 33, 34, 36, 37, 38], "linearsvc": 8, "lineat": 35, "liner": [1, 3, 41, 44], "linerar": 10, "linewidth": [0, 2, 4, 6, 8, 9, 10, 37, 42, 43], "link": [0, 4, 9, 12, 15, 20, 21, 24, 25, 27, 28, 29, 31, 33, 38, 40], "linlag": 5, "linpack": [26, 33], "linreg": [0, 33], "linspac": [0, 2, 3, 4, 6, 8, 9, 10, 13, 16, 17, 19, 26, 30, 33, 34, 36, 37, 38, 42, 43], "linu": 
4, "linux": [0, 1, 25, 27, 33, 41, 42], "liquid": [0, 33], "list": [1, 2, 3, 4, 9, 15, 21, 22, 24, 25, 27, 28, 33, 36, 39, 42, 43, 44], "list_physical_devic": 42, "listedcolormap": [9, 10], "literatur": [1, 7, 14, 32, 37, 38, 41], "littl": [1, 3, 9, 12, 22, 36, 40, 41, 44], "live": [8, 16], "ll": [0, 18, 30, 33, 34], "lle": [0, 34], "llm": 20, "lloyd": [4, 14], "lmb": [0, 2, 5, 6, 34, 35, 36, 37, 38, 42, 43], "lmbd": [0, 1, 3, 33, 41, 42, 44], "lmbd_val": [0, 1, 3, 33, 41, 42, 44], "lmbda": [13, 35, 36], "ln": [1, 13, 35, 41, 43], "load": [1, 4, 6, 7, 9, 10, 23, 36, 39, 42, 44], "load_boston": [], "load_breast_canc": [1, 7, 9, 10, 11, 39, 41, 42], "load_data": [3, 4, 42, 44], "load_digit": [1, 3, 23, 41, 42, 44], "load_iri": [8, 9, 21, 23], "loader": 44, "loc": [3, 6, 7, 8, 9, 10, 21, 33, 37, 38, 39, 44], "local": [0, 1, 3, 7, 12, 13, 15, 21, 22, 34, 35, 36, 38, 39, 40, 41, 43, 44], "locat": [2, 3, 8, 15, 42, 43, 44], "log": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 15, 20, 21, 26, 27, 28, 33, 36, 37, 38, 39, 41, 42, 43], "log10": [0, 5, 6, 34, 35, 36, 37, 38, 41, 42, 43], "log_": [0, 33], "log_clf": 10, "logarithm": [0, 5, 7, 17, 26, 33, 37, 38, 39], "logbook": [27, 28], "logic": [0, 1, 9, 33, 41], "logical_or": [], "login": 15, "logist": [0, 1, 2, 8, 9, 10, 11, 12, 13, 23, 24, 25, 28, 34, 35, 36, 40, 42, 43], "logisti": 28, "logistic_regress": [41, 42], "logisticregress": [7, 9, 10, 11, 23, 28, 38, 39], "logit": [7, 28, 38, 39, 42], "logreg": [7, 9, 10, 11, 23, 39], "logspac": [0, 1, 3, 5, 6, 33, 34, 35, 36, 37, 38, 41, 42, 44], "long": [0, 1, 3, 4, 12, 13, 21, 33, 35, 36, 39, 40, 41], "longer": [2, 3, 8, 10, 14, 26, 30, 33, 36, 42, 43, 44], "loocv": [6, 37, 38], "look": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 19, 20, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "loop": [1, 4, 6, 10, 12, 14, 16, 17, 18, 22, 25, 26, 33, 36, 37, 38, 41, 42, 44], "lose": [1, 41], "loss": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 13, 18, 21, 24, 26, 27, 28, 33, 37, 38, 39, 40, 41, 42, 43, 44], "loss_bin": [38, 39], "loss_fil": 4, "loss_multi": [38, 39], "loss_vec": [38, 39], "lossfil": 4, "lost": 4, "lot": [1, 4, 6, 16, 19, 20, 24, 36, 37, 41, 42], "low": [0, 6, 9, 10, 11, 27, 33, 34, 37, 38], "lower": [0, 1, 3, 6, 9, 10, 16, 21, 26, 34, 36, 41, 42, 44], "lowercas": [26, 33], "lowest": [9, 13, 30, 36], "lr": [1, 3, 4, 10, 38, 39, 41, 42, 44], "lrelu": [41, 42], "lstat": [], "lstm": 4, "lstm_2layer": 4, "lstsq": [0, 33, 34], "lt": [6, 37], "lu": [0, 5, 33, 34, 35], "lubksb": 26, "luckili": [2, 42, 43], "ludcmp": 26, "lux": 26, "lvert": [1, 41], "lw": [0, 33], "lwwrf64f4qkqt": 43, "m": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 15, 26, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "m_": [9, 12, 40, 41], "m_0": 36, "m_1": 14, "m_h": [0, 33], "m_k": 14, "m_l": [12, 40, 41], "m_n": [0, 33], "m_p": [0, 33], "m_t": [13, 36], "ma": 11, "machin": [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 26, 32, 34, 36, 37, 40, 41, 42, 43, 44], "machinelearn": [0, 6, 16, 20, 25, 27, 28, 29, 31, 32, 33, 34, 35, 38, 39, 41, 42, 43, 44], "machineri": 23, "mackai": 32, "macro": 23, "made": [0, 1, 3, 4, 5, 6, 7, 9, 11, 12, 27, 28, 33, 34, 36, 38, 39, 40, 41, 43, 44], "mae": [0, 33], "magic": 4, "magnitud": [1, 6, 7, 13, 21, 34, 36, 39, 40, 41], "mai": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 19, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mail": [29, 31], "main": [0, 1, 3, 4, 5, 6, 7, 9, 24, 26, 27, 28, 32, 34, 35, 36, 38, 39, 41, 43, 44], "mainli": [0, 5, 6, 7, 9, 33, 34, 37, 38, 39], 
"maintain": [6, 36, 37], "major": [1, 6, 9, 10, 13, 26, 33, 35, 36, 37, 38, 41], "make": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "make_axes_locat": 6, "make_classif": [23, 39], "make_moon": [8, 9, 10], "make_pipelin": [0, 6, 10, 34, 37, 38], "makedir": [0, 6, 7, 9, 33, 37, 38], "malcondit": 26, "malign": [1, 7, 9, 39], "mammographi": 5, "manag": [0, 2, 3, 15, 25, 27, 33, 36, 42, 43, 44], "mandatori": [31, 33], "mani": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "manifold": 11, "manner": [3, 23, 43, 44], "manual": [6, 21, 22, 34, 36], "map": [0, 1, 2, 6, 7, 8, 11, 12, 14, 30, 33, 38, 39, 41, 42, 43, 44], "marc": 34, "marchant": [], "margin": [0, 5, 8], "marit": [0, 33], "mark": 33, "markdownfil": [], "markdownit": [], "markdownitdeflist": [], "markedli": [], "marker": [7, 26, 33, 38], "market": 23, "markov": [25, 33], "markup": [], "marsaglia": 30, "mask_or": [], "masked_arrai": [], "maskedrecord": [], "mass": [0, 1, 5, 13, 34, 35, 41], "massag": [0, 33], "masses2016": [0, 33], "masses2016ol": [0, 33], "masses2016tre": 0, "masseval2016": [0, 33], "master": [29, 31], "mat": [25, 33], "mat1100": [25, 33], "mat1110": [25, 33], "mat1120": [25, 33], "match": [1, 4, 5, 13, 14, 15, 34, 35, 36, 41], "materi": [4, 5, 7, 13, 15, 26, 29, 31, 39, 42], "math": [3, 7, 12, 13, 26, 30, 32, 33, 36, 38, 39, 41, 42, 43], "mathbb": [0, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40], "mathbf": [0, 5, 6, 7, 8, 13, 19, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40], "mathcal": [1, 5, 6, 7, 13, 27, 37, 38, 39, 41], "matheemat": 3, "mathemat": [0, 6, 11, 12, 13, 21, 23, 24, 25, 26, 30, 32, 33, 36], "mathemati": 33, "mathrm": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 23, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "matmul": [1, 2, 5, 40, 41, 42, 43], "matnat": 32, "matplotlib": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 25, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "matplotlibrc": [], "matric": [0, 1, 3, 4, 6, 7, 8, 11, 13, 16, 17, 25, 34, 35, 38, 39, 40, 41, 42], "matrix": [0, 2, 3, 4, 6, 7, 8, 10, 13, 17, 18, 19, 21, 27, 28, 30, 37, 38, 40, 42, 43, 44], "matshow": 1, "matter": [2, 3, 13, 34, 35, 36, 40, 42, 43, 44], "matthia": [], "max": [0, 1, 2, 3, 4, 9, 10, 12, 13, 21, 31, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44], "max_depth": [0, 9, 10], "max_diff": [2, 42, 43], "max_diff1": [2, 42, 43], "max_diff2": [2, 42, 43], "max_it": [0, 1, 8, 13, 28, 33, 39, 41], "max_iter": 14, "max_leaf_nod": 10, "max_pixel": 43, "max_sampl": 10, "maxdegre": [0, 6, 10, 34, 37, 38], "maxdepth": 10, "maxim": [1, 4, 5, 7, 8, 11, 37, 38, 39, 41], "maximum": [0, 2, 3, 5, 7, 8, 9, 10, 13, 14, 33, 34, 35, 36, 42, 43, 44], "maxpolydegre": [5, 6, 34, 35, 36, 37, 38], "maxpool2d": 44, "maxpooling2d": [3, 44], "mayb": 24, "mbox": [5, 6, 34, 35, 37], "mcculloch": [12, 39, 40], "md": 11, "mdoel": 4, "me": [], "mean": [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 22, 23, 25, 26, 27, 28, 30, 33, 36, 37, 39, 40, 41, 42, 43, 44], "mean0": [38, 39], "mean1": [38, 39], "mean_absolute_error": [0, 33], "mean_divisor": 14, "mean_i": 30, "mean_matrix": 14, "mean_squared_error": [0, 4, 6, 7, 10, 15, 19, 33, 34, 37, 38], "mean_squared_log_error": [0, 33], "mean_vector": 14, "mean_x": 30, "meaning": [0, 4, 7, 33, 38], "meansquarederror": [0, 33], 
"meant": [3, 7, 10, 13, 24, 38, 40, 43, 44], "meanwhil": 36, "measur": [0, 1, 2, 5, 6, 9, 11, 12, 14, 16, 18, 27, 28, 30, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44], "mechan": [0, 4, 30, 33, 36], "median": [0, 33, 34, 36], "medicin": [12, 39, 40], "medium": [4, 8, 13, 28, 36], "medv": [], "meet": [0, 24, 31], "mehta": [0, 33, 34, 35], "member": [20, 27, 28], "memori": [3, 4, 11, 12, 13, 18, 26, 39, 40], "mentat": [], "mention": [0, 12, 13, 24, 27, 28, 30, 33, 35, 36, 39, 40, 43, 44], "merchant": [], "mere": [0, 27, 28], "merg": [], "meshgrid": [2, 5, 6, 8, 9, 10, 11, 41, 42, 43], "mess": 15, "messag": [5, 13], "messi": [2, 42, 43], "messier": 22, "met": [0, 3, 8, 34], "meta": [], "meteorolog": 9, "meter": [6, 34], "method": [0, 1, 2, 3, 4, 5, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 30, 32, 34, 40, 43, 44], "metion": 6, "metric": [0, 1, 3, 6, 7, 9, 10, 14, 15, 21, 22, 23, 28, 33, 34, 37, 38, 39, 41, 42, 44], "metropoli": [25, 33], "mev": [0, 30, 33], "mgd": [13, 36], "mglearn": [25, 33], "mgrid": 13, "mhjensen": [], "mi": 10, "mia": [31, 33], "michael": [28, 40, 41], "micro": 23, "microsoft": 32, "mid": [1, 41], "midel": 4, "midnight": [15, 21, 22, 23, 24], "midpoint": 9, "might": [0, 1, 2, 4, 6, 9, 13, 15, 17, 18, 22, 24, 34, 35, 36, 41, 42, 43], "migth": 17, "mild": 9, "millimet": [6, 34], "million": [0, 33, 34, 36], "mimic": [12, 39, 40], "min": [0, 2, 5, 8, 9, 35, 42, 43], "min_": [0, 2, 5, 14, 17, 33, 34, 35, 42, 43], "min_samples_leaf": 9, "mind": [0, 6, 13, 15, 18, 21, 33, 34, 35, 36, 37], "mindboard": 4, "mine": [25, 33], "mini": [1, 11, 12, 13, 35, 41], "minibatch": [1, 11, 13, 41, 42], "minibathc": [13, 36], "miniforge3": [], "minim": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 34, 35, 36, 37, 41, 44], "minima": [0, 1, 7, 13, 33, 35, 36, 38, 39, 41], "minimum": [0, 1, 2, 6, 8, 9, 11, 13, 34, 35, 36, 37, 38, 39, 41, 42, 43], "minmaxscal": [0, 34, 36, 41, 42], "minor": 30, "minst": [1, 41, 42], "minu": [7, 38], "mirjalili": 33, "mirror": 9, "misc": 6, "misclassif": [8, 9, 10, 23], "misclassifi": [8, 10], "miser": 0, "mismatch": [1, 41], "miss": [7, 10], "mistak": [4, 19], "mit": [32, 43, 44], "mitig": 36, "mix": [1, 2, 33, 41, 42, 43], "mixtur": [13, 36], "mk": [9, 26], "mkdir": [0, 6, 7, 9, 33, 37, 38], "ml": [0, 1, 10, 13, 26, 27, 28, 34, 35, 36, 41, 42, 43], "mlab": 30, "mle": [5, 7, 38, 39], "mlp": [1, 39, 40, 41], "mlpclassifi": [1, 39, 41], "mlpregressor": [0, 33], "mm": 26, "mml": 34, "mn": [12, 30, 39], "mnist": [1, 11, 23, 28, 41, 43], "mnist_784": 28, "mo": [], "mod": 30, "mode": [29, 31, 33, 38, 39, 41, 42], "model": [2, 3, 5, 7, 8, 9, 10, 11, 13, 14, 16, 18, 19, 20, 21, 23, 24, 25, 27, 28, 30, 32, 34, 35, 36, 37, 38, 42], "model_bin": [38, 39], "model_multi": [38, 39], "model_select": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "moder": [10, 36], "modern": [0, 6, 7, 25, 33, 36, 37, 38, 39, 40, 41], "modest": 36, "modif": [2, 12, 13, 42, 43], "modifi": [0, 1, 3, 5, 7, 8, 10, 12, 13, 33, 34, 35, 36, 38, 39, 40, 41, 43, 44], "modul": [0, 16, 26, 33, 42, 44], "modular": 30, "modulo": 30, "moe": [11, 34], "moment": [5, 6, 13, 30, 37, 41, 42], "moment_correct": [41, 42], "momentum": [22, 24, 40, 41, 42], "momentum_schedul": [41, 42], "mondai": [31, 33, 38], "monitor": [13, 36, 42], "monoton": [5, 12, 30, 37, 39, 40, 41, 42], "mont": [0, 6, 25, 30, 32, 33, 37, 38], "montli": 16, "moor": [5, 6], "more": [0, 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 28, 30], "moreov": 
[0, 3, 28, 43, 44], "morten": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mortenhj": 33, "most": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mostli": [1, 11, 18, 36, 41], "motion": [0, 13], "motiv": [1, 4, 40, 41], "moulin": 36, "move": [0, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16, 21, 22, 27, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "mpl": [7, 33, 38], "mpl_toolkit": [2, 6, 13, 35, 36, 42, 43], "mplot3d": [2, 6, 13, 35, 36, 42, 43], "mplregressor": [1, 41], "mqa": 43, "mr_": [], "mrecord": [], "ms3tv8fvar": 39, "mse": [0, 4, 5, 6, 9, 10, 15, 16, 17, 19, 20, 22, 27, 28, 33, 34, 35, 36, 37, 38, 41, 42], "mse_der": 22, "mse_simpletre": 10, "mselassopredict": [5, 35], "mselassotrain": [5, 35], "mseownridgepredict": [6, 34, 35, 36], "msepredict": [5, 35], "mseridgepredict": [0, 5, 6, 34, 35, 36], "msetrain": [5, 35], "msg": [], "msle": [0, 33], "mt": [7, 12, 38, 39, 41], "mu": [0, 6, 11, 13, 30, 33, 36, 37], "mu0": 30, "mu1": 30, "mu2": 30, "mu_": [6, 30, 34, 36, 37], "mu_i": [6, 34, 36], "mu_n": 11, "mu_x": 30, "much": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 20, 21, 22, 26, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "multi": [0, 1, 3, 7, 23, 25, 33, 38, 42, 43, 44], "multi_class": [28, 38, 39], "multiclass": [1, 7, 23, 28, 38, 39], "multiclass_result": [38, 39], "multidimension": [11, 12, 33, 39, 40], "multilay": [1, 41], "multinomi": [7, 28, 38, 39], "multipl": [2, 4, 5, 6, 7, 12, 13, 15, 22, 28, 30, 34, 35, 36, 37, 38, 39, 40, 42], "multipli": [3, 5, 6, 11, 13, 18, 22, 26, 30, 34, 35, 36, 43, 44], "multiplum": 8, "multivari": [0, 2, 10, 11, 25, 30, 33, 42, 43], "multivariate_norm": [11, 14], "multpli": 16, "murphi": [11, 32, 33], "muse": [], "must": [1, 2, 5, 6, 8, 10, 12, 13, 14, 15, 20, 22, 27, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "mutat": [7, 38, 39], "mutual": [1, 3, 6, 13, 37, 38, 41, 42, 44], "mx_": 30, "my": 33, "mydata": 23, "myenv": [], "myriad": [0, 25, 33], "myself": [], "mz1": 30, "mz2": 30, "m\u00f8svatn": 6, "n": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "n0": [38, 39], "n1": [26, 38, 39], "n2": 26, "n8grai": [], "n_": [1, 2, 3, 8, 12, 23, 30, 39, 41, 42, 43, 44], "n_0": [12, 30, 39], "n_boostrap": [6, 10, 37, 38], "n_bootstrap": [6, 37], "n_categori": [1, 3, 41, 42, 44], "n_class": [23, 38, 39], "n_cluster": 14, "n_compon": 11, "n_epoch": [13, 36, 41, 42], "n_estim": 10, "n_examples_to_gener": 4, "n_featur": [1, 18, 23, 38, 39, 40, 41, 42], "n_filter": [3, 44], "n_hidden": [2, 42, 43], "n_hidden_neuron": [0, 1, 33, 40, 41], "n_i": [23, 30], "n_inform": 23, "n_input": [0, 1, 3, 34, 40, 41, 42, 44], "n_instanc": 9, "n_iter": 36, "n_job": 10, "n_k": 14, "n_l": [12, 30, 39], "n_layer": 1, "n_m": 9, "n_neuron": 1, "n_neurons_connect": [3, 44], "n_neurons_layer1": [1, 41, 42], "n_neurons_layer2": [1, 41, 42], "n_output": [40, 41], "n_point": 14, "n_redund": 23, "n_sampl": [6, 8, 9, 10, 14, 18, 23, 37, 38, 39], "n_split": [6, 37, 38], "n_step": 4, "n_t": [2, 42, 43], "n_x": [2, 42, 43], "nabla": [1, 13, 35, 36, 41], "nabla_": [2, 13, 35, 36, 42, 43], "nabla_w": 13, "nafter": [41, 42], "nag": 13, "naimi": [0, 33], "naiv": [7, 38, 39], "naive_kmean": 14, "name": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 18, 20, 21, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "namespac": [], "nan": [41, 42], "narrow": [13, 36], "nathaniel": [], "nation": 
[1, 5, 41], "nativ": [25, 33], "natur": [0, 1, 4, 8, 9, 12, 13, 27, 28, 30, 32, 33, 35, 36, 39, 40, 41], "navier": [12, 39, 40], "navig": [15, 36], "nb": 30, "nb_": 26, "nbconvert": 33, "nd": 14, "ndarrai": [6, 41, 42], "nderiv": [41, 42], "ne": [9, 10, 26, 30, 34, 35], "nearest": [1, 3, 6, 11, 41, 42, 44], "nearli": [13, 35], "neat": 33, "neccesari": [6, 37], "necess": [2, 42, 43], "necessari": [0, 1, 3, 4, 8, 14, 18, 33, 40, 41, 42, 44], "necessarili": [0, 4, 11, 30, 33], "necesserali": 5, "neck": [7, 38, 39], "need": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 24, 26, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "neg": [0, 1, 3, 5, 6, 7, 10, 13, 23, 26, 30, 33, 35, 37, 38, 39, 41, 42, 43, 44], "neg_mean_squared_error": [6, 37, 38], "neglect": [30, 36], "neglig": 30, "neighbor": [3, 6, 11, 43, 44], "neither": [4, 13, 36], "neq": [13, 14, 23, 30, 35], "nerual": [41, 42], "nervou": [12, 39, 40], "nest": [9, 12, 39], "nesterov": 13, "net": [2, 4, 12, 28, 39, 40, 42, 43], "netlib": [26, 33], "network": [0, 9, 13, 21, 22, 23, 24, 25, 32, 34], "network_input_s": [21, 22], "neural": [0, 13, 21, 22, 23, 24, 25, 32, 34, 38], "neural_network": [0, 1, 2, 33, 39, 41, 42, 43], "neuralnet": 42, "neuralnetwork": [1, 22, 41], "neuralnetworksanddeeplearn": [28, 40, 41], "neuron": [1, 2, 3, 4, 12, 41, 42], "neutral": [0, 33], "neutron": [0, 33], "never": [1, 4, 6, 9, 30, 37, 38, 41], "new": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 20, 22, 26, 33, 34, 35, 36, 38, 39, 41, 42], "new_chang": [13, 36], "new_hobbit": 33, "new_ma": [], "newaxi": [0, 3, 6, 9, 21, 37, 38, 44], "newli": [0, 33], "newlin": [38, 39], "newton": [1, 7, 8, 13, 30, 40, 41], "next": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 21, 22, 23, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "next_guess": 13, "next_input": 4, "ng": [1, 41], "nhow": [41, 42], "ni": 14, "nice": [0, 1, 5, 11, 22, 33, 34, 35, 41], "nicer": [18, 36], "nielsen": [28, 40, 41], "nine": [40, 41], "nip": 36, "niter": [13, 35, 36], "nitric": [], "nlambda": [0, 5, 6, 34, 35, 36, 37, 38], "nlp": 32, "nm": 30, "nm_n": [0, 33], "nmse": [6, 37, 38], "nn": [2, 5, 6, 12, 24, 26, 33, 37, 39], "nn_model": 1, "nnmin": [2, 42, 43], "no_grad": [42, 44], "node": [1, 3, 9, 10, 12, 21, 24, 28, 39, 42, 43, 44], "nois": [0, 4, 5, 6, 8, 9, 10, 13, 18, 19, 27, 33, 34, 35, 36, 37, 38, 43], "noise_dimens": 4, "noisi": [1, 6, 27, 36, 37, 38, 41], "nomask": [], "non": [0, 1, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 21, 24, 26, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "nondifferenti": 36, "none": [0, 1, 2, 4, 5, 9, 10, 13, 30, 33, 34, 38, 39, 40, 41, 42, 43], "noninfring": [], "nonlinear": [3, 6, 8, 9, 11, 12, 37, 38, 39, 40, 43, 44], "nonneg": [6, 9, 13, 35, 37, 38], "nonparametr": 6, "nonsens": 30, "nonsingular": 26, "nonumb": [3, 7, 8, 13, 26, 38, 39], "nor": [1, 4, 13, 22, 36, 40, 41], "norm": [0, 1, 5, 6, 8, 11, 13, 18, 33, 34, 35, 36, 37, 40, 41], "normal": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 42, 43, 44], "normali": [26, 33], "norwai": [6, 27, 28, 33, 35, 36, 37, 39, 40, 41, 42, 43], "notabl": [], "notat": [0, 2, 5, 6, 13, 14, 30, 33, 34, 35, 37, 38, 40, 41, 42, 43], "note": [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 18, 22, 25, 26, 30, 32, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "notebook": [0, 1, 3, 9, 15, 16, 19, 20, 21, 22, 25, 27, 28, 33, 37, 40, 41, 42, 44], "noteworthi": 36, "noth": [1, 2, 5, 8, 12, 14, 30, 34, 35, 39, 41, 42, 43], "notic": [4, 
5, 12, 13, 22, 24, 26, 30, 33, 40, 41], "notimplementederror": [41, 42], "notion": [3, 43, 44], "noutput": [41, 42], "novel": [3, 6, 10, 33, 44], "novemb": [1, 31, 33, 41, 42], "now": [0, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19, 21, 22, 25, 26, 27, 28, 30, 33, 34, 39, 40, 41, 42, 43, 44], "nowadai": [0, 1, 3, 9, 25, 33, 41, 42, 43, 44], "nox": [], "np": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "npm": [], "npr": [2, 42, 43], "nsampl": [6, 37, 38], "nt": [2, 42, 43], "nu": 30, "nuclear": [5, 34, 35], "nuclei": [0, 30, 33], "nucleon": [0, 33], "nucleu": [0, 33], "num": 4, "num_coordin": [2, 42, 43], "num_epoch": [42, 44], "num_equ": [41, 42], "num_hidden_neuron": [2, 42, 43], "num_it": [2, 18, 42, 43], "num_neuron": [2, 42, 43], "num_neurons_hidden": [2, 42, 43], "num_not": [41, 42], "num_point": [2, 42, 43], "num_tre": 10, "num_valu": [2, 42, 43], "number": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 23, 26, 27, 28, 29, 31, 33, 35, 37, 38, 39, 41], "numberid": [7, 38], "numberparamet": [3, 43, 44], "numer": [0, 5, 6, 9, 10, 11, 12, 13, 21, 25, 26, 32, 33, 34, 35, 36, 37, 38, 39, 40], "numpi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 25, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "numpydocstr": [], "nunmpi": [5, 34], "nve_frngahw": 35, "nx": [2, 42, 43], "ny": [30, 41, 42], "o": [0, 1, 4, 5, 6, 7, 8, 9, 11, 26, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44], "obei": [6, 11, 13, 34, 36], "object": [0, 1, 4, 8, 10, 15, 19, 26, 33, 36, 40, 42], "obliqu": [5, 34, 35], "observ": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 30, 33, 35, 36, 37, 38, 39, 43, 44], "obtain": [0, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 17, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "obviou": [5, 6, 11, 30, 34, 35], "obviouli": 33, "obvious": [0, 4, 5, 6, 26, 33, 37], "oc": [34, 35], "occupi": [], "occur": [0, 6, 8, 9, 23, 26, 30, 33], "octob": [21, 22, 23, 24, 28, 31, 33, 39], "od": 0, "odd": [0, 3, 7, 33, 34, 36, 38, 39, 43, 44], "odenum": [2, 42, 43], "odesi": [2, 42, 43], "oen": 0, "off": [1, 3, 4, 5, 9, 13, 20, 23, 28, 30, 36, 37, 41, 42, 43, 44], "offer": [6, 11, 25, 26, 29, 31, 33, 37, 38], "offic": [31, 33], "offici": [29, 33], "offlin": [21, 22], "often": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "ofter": [26, 33], "og": [41, 42], "ogo": [43, 44], "ol": [0, 13, 17, 19, 28, 34, 36, 38], "old": [1, 5, 10, 13, 15, 18, 38, 39, 41], "old_ma": [], "oliph": [], "ols_paramet": 16, "ols_sk": 6, "ols_svd": 6, "olsbeta": 35, "olstheta": [0, 5], "omega": [2, 3, 6, 42, 43], "omega_0": 3, "omit": [0, 5, 33, 34, 35, 37], "onc": [1, 6, 9, 11, 13, 20, 37, 38, 41, 42], "one": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 36, 37, 38, 39], "one_hot": [38, 39], "one_hot_predict": 21, "onehot": [1, 41, 42], "onehot_vector": [1, 41], "onehotencod": 9, "ones": [0, 2, 5, 6, 8, 9, 10, 11, 13, 16, 18, 21, 22, 26, 27, 33, 34, 35, 36, 37, 38, 40, 42, 43], "ones_lik": 4, "ong": 34, "onl": [3, 43, 44], "onli": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "onlin": [11, 15, 20, 29, 36, 40, 41], "onto": [5, 11, 34, 35], "open": [0, 1, 4, 6, 7, 9, 15, 25, 27, 29, 31, 33, 37, 38, 39, 41, 42, 43, 44], "oper": 
[0, 1, 3, 5, 6, 10, 11, 12, 13, 15, 16, 21, 22, 23, 25, 30, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "operation": 30, "oplu": 30, "opmiz": [13, 36], "opportun": 0, "oppos": [6, 13], "opposit": [1, 5, 8, 34, 35, 41], "opt": [1, 5, 27, 28, 33, 35, 41, 42], "optim": [0, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 16, 17, 19, 21, 22, 24, 27, 28, 37, 42, 43, 44], "optimis": [1, 3, 41, 42, 44], "option": [0, 1, 3, 5, 6, 8, 11, 15, 18, 23, 26, 34, 36, 37, 41, 42, 43, 44], "optmiz": [1, 8, 13, 34, 41], "oral": 33, "orang": 0, "order": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 19, 21, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "ordinari": [0, 2, 3, 7, 11, 13, 17, 18, 24, 25, 37, 38, 39, 44], "oreilli": [32, 33], "org": [0, 3, 4, 16, 20, 21, 25, 26, 27, 28, 32, 33, 34, 35, 36, 40, 42], "organ": [6, 7, 10, 26, 37, 38], "orgin": 40, "orient": [1, 5, 30, 34, 35, 42], "origin": [0, 3, 5, 6, 8, 11, 12, 13, 15, 26, 33, 34, 35, 36, 37, 38, 39, 43, 44], "originals": 43, "orthogn": [5, 34, 35], "orthogon": [0, 5, 6, 8, 11, 13, 26, 33, 34, 35, 43], "orthonorm": [5, 34, 35], "os": [31, 33], "oscar": [1, 41], "oscil": [3, 13, 36], "oskar": 33, "oskarlei": 33, "osl": 18, "oslo": [0, 25, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "osx": [0, 25, 27, 33], "other": [0, 1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 19, 21, 22, 25, 29, 30, 31, 32, 34, 35, 36, 37, 38, 43, 44], "otherwis": [0, 1, 4, 7, 13, 26, 28, 33, 36, 38, 39, 41], "ouput": [5, 7, 12, 37, 38, 42], "our": [1, 2, 3, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 19, 21, 25, 26, 30, 36, 37, 40, 43, 44], "ourmodel": 0, "ourselv": [0, 5, 6, 8, 11, 13, 33, 34, 35, 37], "out": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 21, 22, 25, 26, 27, 28, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "out_deriv": [41, 42], "out_fil": 9, "outcom": [0, 7, 9, 10, 12, 23, 30, 34, 38, 39], "outdoor": 9, "outer": [6, 12, 13], "outfil": 4, "outlier": [0, 8, 33, 34, 36], "outlin": [6, 10, 11, 37, 38], "outlook": 9, "outperform": [10, 36], "output": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 44], "output_bia": [1, 41], "output_bias_gradi": [1, 40, 41], "output_func": [41, 42], "output_nod": [41, 42], "output_shap": 4, "output_weight": [1, 41], "output_weights_gradi": [1, 40, 41], "outputlayer1": [12, 39], "outputlayer2": [12, 39], "outsid": [4, 22], "over": [0, 1, 3, 4, 5, 6, 9, 10, 12, 13, 15, 16, 19, 22, 23, 24, 26, 27, 33, 34, 35, 36, 37, 38, 42, 43, 44], "over1": 13, "overal": [1, 10, 36, 41], "overcast": 9, "overcom": [12, 13, 39, 40], "overdetermin": [0, 33], "overfit": [0, 1, 3, 6, 9, 10, 13, 24, 28, 36, 37, 38, 41, 42, 43, 44], "overflow": [5, 36, 37], "overflowerror": [41, 42], "overhead": [12, 40, 41], "overlap": [3, 7, 8, 9, 39, 43, 44], "overleaf": [20, 24, 27, 28], "overlin": [0, 5, 6, 9, 10, 11, 14, 26, 33, 34, 36], "overshoot": 36, "overst": 0, "overtrain": 4, "overview": [3, 20], "overwhelm": 43, "overwritten": [41, 42], "own": [4, 5, 6, 8, 12, 13, 16, 18, 22, 23, 25, 26, 35, 36, 37, 40, 41, 43, 44], "owner": [], "ownmsepredict": 0, "ownmsetrain": 0, "ownridgebeta": 34, "ownridgetheta": [0, 6, 34, 35, 36], "ownypredictridg": 0, "ownytilderidg": 0, "ox": [], "oxid": [], "p": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "p0": [2, 42, 43], "p1": [2, 42, 43], "p_": [2, 4, 8, 9, 42, 43], "p_hidden": [2, 42, 43], "p_i": [5, 30], "p_j": 30, "p_n": 30, "p_output": [2, 42, 43], "p_x": 30, "pa": 40, "pack": 
[0, 33], "packag": [0, 1, 3, 4, 5, 8, 11, 13, 15, 20, 22, 25, 27, 28, 30, 34, 35, 36, 41, 42, 44], "packtpub": 33, "packtpublish": 33, "pad": [3, 4], "page": [0, 24, 25, 27, 28, 33, 35, 36, 37, 38], "pai": [0, 1, 9, 13, 15, 36, 41], "pair": [0, 2, 3, 9, 25, 30, 33, 42, 43, 44], "paltform": 15, "panda": [0, 4, 5, 6, 7, 9, 11, 25, 27, 35, 36, 37, 38, 39], "pandoc": [], "panel": 33, "paper": [1, 36, 42], "paper_fil": 36, "paradigm": [0, 33], "paragraph": 20, "parallel": [10, 13, 25, 26, 33], "param": [2, 42, 43], "paramat": [2, 42, 43], "paramet": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 16, 17, 18, 19, 21, 22, 24, 27, 28, 30, 35, 36, 37, 42], "parameter": [0, 6, 10, 33, 34], "parametr": [0, 6, 33, 34, 37, 38], "paramt": [3, 5, 37, 40, 43, 44], "parent": 40, "parser": 28, "part": [0, 1, 3, 5, 6, 10, 17, 19, 20, 21, 22, 24, 26, 29, 30, 31, 33, 34, 37, 43], "partial": [0, 1, 5, 6, 7, 8, 10, 11, 12, 13, 16, 21, 30, 33, 34, 35, 36, 38, 39, 40, 41], "particip": [15, 25, 29, 31, 33], "particl": [0, 4, 13, 30, 33], "particular": [0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13, 16, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "particularli": [5, 6, 8, 11, 13, 30, 34, 35, 36, 37, 38], "partit": [1, 4, 9, 41], "partli": [6, 33], "partner": [15, 24, 27, 28], "pass": [2, 3, 12, 14, 21, 36, 40, 42, 43, 44], "password": [27, 28], "past": [10, 30, 36], "patch": [6, 30, 37, 43, 44], "path": [0, 4, 6, 7, 9, 25, 33, 36, 37, 38, 43], "pathcollect": 17, "patholog": [], "patient": [7, 38, 39], "patter": 4, "pattern": [0, 3, 4, 12, 32, 33, 36, 39, 40, 43, 44], "paul": [], "pauli": [0, 33], "pav": [], "pc": [11, 15, 25], "pca": [0, 7, 25, 33, 34, 39], "pd": [0, 4, 5, 6, 7, 9, 11, 33, 34, 35, 36, 37, 38, 39], "pde": [2, 42, 43], "pdf": [0, 3, 4, 5, 6, 9, 15, 16, 19, 20, 24, 27, 28, 32, 33, 37, 42], "pedagog": [0, 33, 34], "penal": [6, 18, 34, 36], "penalti": [6, 13, 18, 27, 34, 36], "penros": [5, 6], "pentagon": [13, 35], "peopl": [1, 9, 13, 24, 25, 27, 28, 36, 41], "per": [0, 1, 6, 21, 23, 24, 28, 29, 31, 33, 36, 37, 38, 39, 41], "perc_print": [41, 42], "percent": 23, "percentag": [10, 11, 31, 41, 42], "perceptron": [0, 1, 7, 33, 38], "peregrin": 33, "perez": [], "perfect": [0, 1, 13, 23, 33, 36, 41], "perfectli": [4, 6, 37, 38], "perform": [0, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 14, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 42], "performac": 4, "perhap": [0, 5, 13, 33, 34, 35, 36], "perimet": 1, "period": [1, 4, 30, 41], "permiss": 15, "permit": [], "permut": 11, "persist": 13, "person": [5, 6, 7, 16, 20, 29, 31, 33, 34, 38], "perspect": 32, "pertin": [12, 28, 33, 40, 41, 42], "petal": [8, 9], "peter": [32, 34], "petersen": [40, 41], "phantom": 30, "phase": [6, 12, 39, 40], "phd": [42, 43], "phenomena": 30, "phenomenon": 36, "phi": 8, "phi_k": 8, "philipp": [40, 41], "philosophi": 13, "phone": [31, 33], "photo": [4, 33], "photo1": 43, "php": [27, 28], "phrase": [0, 33], "physic": [0, 1, 4, 7, 12, 13, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "pi": [2, 3, 5, 6, 7, 9, 12, 13, 30, 37, 38, 39, 41, 42, 43], "pick": [1, 9, 10, 11, 13, 14, 24, 27, 28, 36, 41], "pickl": 1, "pictur": [0, 33], "pie": [25, 33], "piec": [11, 14, 21], "pierr": [], "pil": 43, "pillow": [0, 25, 27, 33], "pinv": [5, 6, 13, 27, 34, 35, 36, 39], "pip": [0, 1, 15, 25, 27, 33, 41, 42], "pip3": [0, 1, 27, 33, 41, 42], "pipelin": [0, 6, 8, 10, 34, 37, 38], "pippin": 33, "pit": 4, "pitfal": [6, 34], "pitt": [12, 39, 40], "pixel": [1, 3, 4, 28, 33, 41, 42, 43, 44], "pixel_height": [1, 3, 41, 
42, 44], "pixel_width": [1, 3, 41, 42, 44], "pkg_resourc": [], "pkgutil": [], "place": [0, 4, 6, 8, 13, 15, 26, 27, 33, 35, 37], "plai": [0, 3, 4, 5, 6, 8, 11, 18, 22, 25, 27, 33, 34, 35, 37, 38, 40, 41, 43, 44], "plain": [8, 10, 12, 13, 14, 24, 27, 28, 35, 36, 40, 41], "plan": [6, 9, 31, 32, 33, 41], "plane": [8, 9], "plateau": [5, 35, 36], "platform": [25, 33], "plausibl": [12, 39, 41], "playlist": [43, 44], "plc1qu": 43, "pleas": [13, 27, 28, 31, 33], "plenti": [1, 41], "plethora": [3, 12, 39, 40, 43, 44], "pliahhy2ibx9hdharr6b7xevztgzra1p": [39, 40, 41], "plot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 20, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 39, 41, 42, 43, 44], "plot_all_sc": [27, 34], "plot_confusion_matrix": [7, 10, 23, 39], "plot_count": 6, "plot_cumulative_gain": [7, 10, 23, 39], "plot_data": 1, "plot_dataset": 8, "plot_decision_boundari": [9, 10], "plot_import": 10, "plot_iris_dataset": 21, "plot_max": 4, "plot_min": 4, "plot_model": 4, "plot_numb": 4, "plot_predict": 8, "plot_regression_predict": 9, "plot_result": 4, "plot_roc": [7, 10, 23, 39], "plot_surfac": [2, 6, 13, 42, 43], "plot_train": 9, "plot_tre": [9, 10], "plqvvvaa0qudcjd5baw2dxe6of2tius3v3": [39, 40, 41], "plt": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "plu": [0, 3, 5, 7, 18, 33, 34, 38, 43, 44], "plugin": [], "plzhqobowtqdnu6r1_67000dx_zcjb": [43, 44], "pm": [8, 37], "pmatrix": [2, 42, 43], "pml": 32, "pn": 3, "png": [0, 4, 6, 7, 9, 33, 37, 38], "point": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 19, 20, 23, 26, 27, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43], "point_1": 4, "point_2": 4, "poisson": [25, 30, 33], "poli": [6, 8, 37, 38], "poly100_kernel_svm_clf": 8, "poly3": 0, "poly3_plot": 0, "poly_degre": [41, 42], "poly_featur": [8, 9, 15], "poly_features10": 9, "poly_fit": 9, "poly_fit10": 9, "poly_kernel_svm_clf": 8, "poly_model": 15, "poly_ms": 15, "poly_predict": 15, "polydegre": [0, 5, 6, 10, 34, 37, 38], "polygon": [13, 35], "polym": [12, 39, 40], "polymi": 27, "polynomi": [0, 5, 6, 7, 8, 9, 10, 11, 15, 17, 19, 20, 27, 28, 33, 34, 36, 37, 38, 39, 40], "polynomial_featur": [6, 15, 16, 17, 37, 38], "polynomial_svm_clf": 8, "polynomialfeatur": [0, 6, 8, 9, 15, 16, 19, 34, 37, 38], "polytrop": [0, 6, 37, 38], "pool": 3, "pool_siz": [3, 44], "poor": [1, 13, 35, 36, 41], "poorli": [0, 34], "popul": [0, 5, 23, 33, 34], "popular": [0, 1, 3, 6, 7, 8, 9, 11, 12, 15, 24, 25, 26, 27, 30, 34, 38, 39, 41, 43, 44], "popularli": [0, 33], "portabl": 10, "portion": [11, 13, 36], "pose": [0, 4, 5, 6, 11, 30, 33, 37], "posit": [0, 1, 2, 3, 5, 7, 8, 10, 11, 13, 14, 21, 23, 26, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43], "possibl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 21, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 44], "possibli": [6, 8, 13, 27], "post": [], "posterior": 5, "postpon": [0, 34], "postscript": [27, 28], "postul": 5, "potenti": [0, 3, 5, 6, 12, 13, 34, 36, 37, 39, 40], "pott": [12, 39, 40], "power": [0, 1, 5, 6, 8, 9, 12, 13, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "pp": [5, 6, 19, 37, 40, 41], "pr": 23, "practic": [0, 5, 6, 7, 8, 16, 18, 19, 21, 23, 27, 28, 30, 34, 37, 38, 39, 43, 44], "practition": [0, 1, 3, 33, 36, 41, 42, 43, 44], "pre": 33, "preambl": [], "precalcul": 40, "preced": [1, 11, 12, 30, 39, 41], "preceed": [4, 41, 42], "preceq": 8, "precis": [0, 2, 5, 11, 13, 26, 27, 28, 30, 33, 34, 36, 37, 40, 42, 43], "pred": [6, 23, 37, 38, 39], 
"pred_train": [41, 42], "pred_val": [41, 42], "predicit": 0, "prediciton": [41, 42], "predict": [0, 1, 5, 6, 7, 8, 9, 10, 15, 16, 17, 19, 22, 23, 24, 25, 27, 28, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "predict_prob": [1, 38, 39, 41], "predict_proba": [7, 10, 23, 39], "predictedlabel": [38, 39], "predictor": [0, 5, 6, 7, 9, 10, 11, 33, 34, 36], "prefer": [0, 1, 6, 8, 9, 11, 13, 15, 20, 23, 25, 27, 28, 33, 41], "prefil": [], "premier": 43, "prepar": [0, 6, 26, 27, 28, 33, 34], "preprocess": [0, 4, 6, 7, 8, 9, 10, 11, 15, 16, 17, 18, 19, 23, 27, 37, 38, 39, 41, 42], "prerequisit": 0, "prescript": [27, 28], "presenc": 13, "present": [0, 5, 6, 7, 9, 12, 13, 26, 27, 28, 30, 33, 34, 35, 36, 39, 40, 41, 42], "preserv": [3, 11, 26, 43, 44], "press": [13, 15, 32, 35, 40, 41], "pretrain": [1, 4, 41], "pretti": [0, 4, 8, 9, 21, 25, 27, 33], "prettier": [], "prev_centroid": 14, "prevent": [13, 30, 36], "previou": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 21, 22, 26, 27, 28, 30, 34, 35, 36, 39, 40, 41, 42, 43, 44], "previous": [2, 3, 9, 10, 30, 42, 43, 44], "price": [0, 4, 9, 13, 36], "primal": 8, "primari": [0, 7, 33, 38, 39], "prime": 30, "princip": [0, 5, 7, 25, 33, 34, 35, 39], "principl": [0, 6, 7, 8, 14, 33, 37, 38, 39, 43, 44], "print": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 21, 22, 23, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "print_funct": [8, 9], "print_length": [41, 42], "printout": [0, 33], "prior": [0, 5, 6, 33], "privat": 0, "pro": 28, "prob": [1, 30, 38, 39], "probabilist": [0, 23, 32, 33, 34], "probabl": [0, 1, 3, 4, 6, 7, 10, 13, 21, 23, 25, 33, 34, 36, 38, 39, 41, 42, 44], "problem": [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 23, 24, 25, 26, 27, 28, 30, 37, 44], "probml": 32, "proce": [0, 5, 6, 7, 8, 9, 10, 11, 13, 26, 33, 34, 37, 40], "procedur": [2, 4, 5, 6, 8, 10, 11, 13, 34, 35, 36, 37, 38, 42, 43], "proceed": 26, "process": [0, 2, 4, 6, 9, 10, 12, 13, 25, 26, 27, 30, 32, 33, 35, 36, 37, 38, 39, 40], "procur": [], "prod": 32, "prod_": [1, 5, 7, 37, 38, 39, 41], "produc": [0, 3, 4, 5, 6, 9, 10, 11, 12, 13, 18, 20, 25, 26, 27, 28, 30, 33, 34, 37, 39, 40, 43, 44], "product": [0, 1, 3, 5, 6, 7, 8, 12, 13, 16, 17, 25, 26, 33, 34, 36, 37, 38, 39, 40, 41], "profess": [0, 33], "profit": [], "progag": 28, "program": [0, 1, 4, 5, 6, 8, 12, 14, 15, 25, 26, 29, 30, 31, 33, 34, 39, 41], "programm": 26, "progress": [1, 4, 14, 36, 38, 39, 41, 42], "prohibit": [6, 37, 38], "project": [0, 1, 2, 3, 5, 11, 13, 15, 19, 22, 23, 24, 25, 29, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "project_root_dir": [0, 6, 7, 9, 33, 37, 38], "promin": [12, 39, 40], "promis": 8, "promot": [31, 33], "prompt": 20, "prone": [9, 15, 21, 40], "pronounc": [13, 25, 33, 36], "proof": [0, 11, 12, 13, 33, 35, 37, 38, 40], "prop": [28, 36, 41, 42], "prop_cycl": [], "propag": [2, 3, 13, 21, 22, 28, 36, 44], "proper": [0, 2, 6, 7, 20, 37, 38, 42, 43], "properli": [1, 6, 8, 10, 13, 18, 20, 27, 28, 36, 41], "properti": [0, 1, 3, 12, 13, 16, 26, 33, 37, 39, 41, 42, 43, 44], "propgag": 40, "proport": [0, 1, 5, 9, 11, 13, 30, 33, 34, 41], "propos": [1, 4, 6, 10, 27, 28, 33, 36, 41], "propto": [5, 13, 35, 36], "proton": [0, 33], "prove": [3, 13, 35, 36, 43, 44], "provid": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 20, 21, 22, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "proxi": [1, 13, 36, 41], "prune": 9, "pseudo": [26, 30, 36], "pseudocod": [27, 28], "pseudoinv": 5, "pseudoinvers": [5, 6, 27], "pseudorandom": [6, 30, 37], "psychologi": [0, 33], "pt": 13, "public": [0, 15, 25, 33], 
"publish": [40, 41], "pull": 15, "punish": [0, 1, 33, 41, 42], "pure": [3, 9, 30], "purest": 9, "puriti": 9, "purpos": [0, 3, 10, 12, 14, 21, 33, 39, 40, 43, 44], "push": 15, "put": [1, 20, 27, 28, 36], "putmask": [], "py": 5, "pybtex": [], "pycod": 33, "pydata": 25, "pydevd_extension_api": [], "pydevd_plugin": [], "pydevd_plugin_plugin_nam": [], "pydot": 9, "pygment": [], "pyhton2": 33, "pylab": [7, 33, 38], "pypi": 25, "pyplot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "pythagora": 5, "python": [1, 2, 3, 5, 6, 8, 11, 12, 13, 14, 18, 20, 21, 22, 27, 28, 30, 34, 36, 40, 41, 42, 43], "python2": [0, 27], "python3": [0, 25, 27, 33], "pythonpath": [], "pytorch": [0, 25, 27, 28, 33, 40, 41], "pyzmq": [], "q": [5, 6, 8, 11, 30, 37, 41, 42], "qp": 8, "qqoghlgkig0": 43, "qquad": [2, 11, 13, 23, 26, 36, 42, 43], "qr": [5, 6, 26, 34, 35], "quad": [1, 13, 23, 26, 41], "quadrat": [0, 8, 9, 13, 33], "qualit": [4, 9, 27, 28, 30], "qualiti": [0, 9, 25, 33, 34, 40], "quantifi": [1, 23, 41], "quantil": 10, "quantit": [0, 6, 9, 27, 28, 33, 37, 38], "quantiti": [0, 2, 5, 6, 7, 9, 10, 11, 12, 14, 16, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "quantum": [4, 12, 32, 33, 39, 40], "quartil": [0, 34, 36], "quasi": 40, "quench": 5, "queri": 9, "question": [0, 5, 6, 9, 11, 12, 13, 24, 27, 28, 31, 33, 34, 36, 37, 40, 41], "qugan": 4, "quick": [4, 30], "quicker": 36, "quickli": [1, 3, 9, 11, 13, 35, 36, 41, 42, 43, 44], "quit": [1, 5, 6, 9, 10, 12, 15, 22, 34, 35, 37, 38, 39, 41, 42], "quot": 4, "r": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 25, 26, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "r2": [0, 5, 6, 19, 33, 34, 35], "r2_score": [0, 33], "r2score": [0, 33], "r_": 36, "r_0": 36, "r_1": 9, "r_2": 9, "r_j": 9, "r_m": 9, "r_t": 36, "rad": [], "rade": [], "radial": [8, 12, 39, 40], "radioact": 30, "radiu": [0, 1, 34, 36], "radziej": [], "ragan": [], "rain": 9, "rais": [41, 42], "ram": 36, "ramanujam": [], "ramp": [1, 41], "ran0": 30, "ran1": 30, "ran2": 30, "ran3": 30, "rand": [0, 4, 5, 6, 9, 10, 13, 15, 19, 21, 22, 26, 33, 34, 35, 36, 37, 38, 41, 42], "randint": [6, 9, 13, 36, 37], "randn": [0, 1, 2, 5, 6, 9, 11, 13, 15, 18, 21, 22, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "random": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "random_forest_model": 10, "random_index": [13, 36], "random_indic": [1, 3, 41, 42, 44], "random_st": [7, 8, 9, 10, 11, 23, 28, 38, 39], "randomforestclassifi": 10, "randomli": [1, 6, 9, 13, 14, 18, 35, 36, 37, 38, 41], "randomst": [38, 39], "rang": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 19, 21, 22, 23, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "rangl": [0, 6, 11, 30, 33, 34], "rangle_x": 30, "rank": [5, 34, 35], "rankdir": 4, "raphson": [1, 8, 13, 41], "rapidli": [0, 36], "rare": [1, 13, 23, 36, 41], "rasbt": [43, 44], "raschka": [28, 33, 34, 37, 38, 39], "rasckha": 33, "rashcka": [35, 36, 40, 41, 42, 43], "rashkca": [40, 41], "rate": [1, 2, 3, 4, 8, 9, 10, 12, 13, 18, 23, 24, 28, 35, 37, 38, 39, 40, 43, 44], "rather": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 30, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44], "ratio": [4, 7, 9, 10, 11, 23, 38, 39, 43], "rational": [0, 33], "ravel": [5, 6, 7, 8, 9, 10, 11, 13, 26, 37, 38, 39, 41, 42], "raw": [3, 36, 43, 44], "rbf": [8, 11, 12, 39, 40], "rbf_kernel_svm_clf": 8, "rbf_pca": 11, "rc": 30, "rcond": 
[0, 33, 34], "rcparam": [1, 3, 7, 8, 9, 10, 30, 33, 38, 41, 42, 43, 44], "re": [2, 4, 13, 15, 35, 42, 43], "reach": [1, 4, 5, 6, 9, 10, 12, 13, 14, 23, 35, 36, 37, 38, 40, 41, 42], "react": [], "read": [0, 2, 3, 4, 5, 6, 7, 8, 11, 12, 16, 17, 19, 20, 26, 27, 28, 30, 32, 35, 42, 43, 44], "read_csv": [0, 6, 7, 9, 37, 38], "read_fwf": [0, 33], "reader": [0, 6, 20, 26, 30, 33, 34, 36], "readi": [0, 1, 5, 6, 8, 10, 11, 12, 26, 33, 40, 41, 42, 43, 44], "readili": [1, 41], "readm": [15, 20, 27, 28], "readthedoc": 25, "real": [0, 1, 4, 7, 10, 11, 12, 16, 18, 19, 26, 34, 37, 38, 39, 41, 42], "real_loss": 4, "real_output": 4, "realist": [8, 33], "realiti": 30, "realiz": [1, 12, 39, 41], "realli": [0, 1, 24, 33, 41, 42], "rearrang": 13, "reason": [0, 1, 3, 4, 10, 13, 24, 32, 33, 35, 36, 41, 43, 44], "reassign": 1, "reat": [41, 42], "reber": [41, 42], "recal": [5, 6, 9, 10, 11, 12, 22, 26, 30, 33, 34, 35, 36, 37, 38, 40, 41], "recarrai": [], "recast": [3, 43, 44], "receiv": [1, 3, 10, 12, 23, 30, 39, 40, 41, 43, 44], "recent": [0, 6, 13, 32, 36, 37, 38, 40, 41], "recept": [3, 12, 39, 40, 44], "receptive_field": [3, 44], "recip": [0, 6, 7, 26, 27, 28, 33, 34, 38, 39, 43], "reciproc": 5, "recogn": [0, 4, 5, 10, 33, 37], "recognit": [0, 1, 3, 12, 32, 33, 39, 40, 41, 43, 44], "recommen": 33, "recommend": [0, 2, 3, 4, 5, 6, 8, 13, 15, 19, 20, 21, 22, 25, 26, 27, 28, 32, 35, 36, 37, 38, 39, 40, 42, 43, 44], "reconsid": 9, "reconstruct": [11, 43], "record": [10, 27, 28, 29, 31, 33, 38, 39], "recreat": [15, 21], "rectangl": [9, 13, 35], "rectangular": [5, 34, 35], "rectifi": [1, 3, 12, 39, 41, 44], "recur": [0, 25, 33], "recurr": [0, 1, 25, 33, 41, 44], "recurs": [9, 25, 26, 33], "red": [0, 3, 4, 6, 8, 9, 36, 37, 43, 44], "redefin": [0, 10, 33, 34, 35], "redefinit": 35, "redistribut": [], "reduc": [1, 3, 5, 6, 9, 10, 11, 13, 21, 24, 33, 35, 36, 37, 41, 43, 44], "reduct": [0, 10, 11, 25, 30, 33, 34], "redund": [23, 43, 44], "reegress": 27, "ref": 20, "refer": [0, 1, 2, 3, 5, 6, 11, 12, 13, 14, 20, 24, 26, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "referansestil": 20, "referenc": [2, 40, 41, 42, 43], "refin": [12, 39, 40], "refit": [6, 37, 38], "reflect": [0, 1, 4, 5, 27, 28, 30, 33, 41, 42], "refresh": [25, 33], "refreshprogrammingskil": 33, "reg": [10, 11], "regard": [1, 9, 13, 41], "regardless": [12, 16, 23, 39, 41], "regexp": [], "reggi": [], "regim": 36, "region": [3, 4, 6, 9, 12, 27, 36, 39, 40, 43, 44], "regist": [6, 30], "reglasso": [5, 35], "regr_1": [0, 9], "regr_2": [0, 9], "regr_3": [0, 9], "regress": [1, 8, 11, 12, 16, 20, 23, 24, 25, 26, 40, 41, 42], "regressor": [0, 7, 10, 38, 41, 42], "regret": [], "regridg": [0, 5, 6, 34, 35, 36], "regular": [0, 3, 4, 5, 6, 7, 9, 13, 17, 18, 24, 28, 31, 33, 34, 35, 36, 37, 38, 39, 42], "regularli": 15, "reilli": [0, 32, 33], "reinforc": [0, 8, 25, 33], "reiniti": [41, 42], "reiter": 1, "reitz": [], "reject": 7, "rel": [0, 4, 6, 7, 9, 12, 13, 21, 30, 33, 34, 36, 37, 38, 39, 41], "relat": [0, 1, 3, 4, 5, 11, 13, 14, 19, 23, 26, 30, 33, 34, 35, 37, 40, 41, 43, 44], "relationship": [0, 4, 9, 18, 33], "relativeerror": [0, 33, 34], "releas": [1, 25, 33, 41, 42], "relev": [0, 1, 5, 7, 11, 25, 27, 28, 30, 33, 35, 36, 43, 44], "reli": [0, 6, 8, 36, 43], "reliabilti": [27, 28], "reliabl": [7, 30, 38, 39], "relu": [3, 4, 21, 22, 28, 33, 43, 44], "relu_d": 22, "remain": [1, 2, 4, 6, 12, 23, 26, 30, 34, 36, 37, 38, 39, 40, 41, 42, 43], "remaind": 30, "reman": [2, 42, 43], "remark": [1, 41], "rememb": [0, 8, 13, 20, 21, 22, 26, 27, 28, 33, 36], "remind": [0, 5, 
11, 13, 19, 26, 30, 37, 42], "remot": 15, "remov": [4, 5, 6, 18, 34, 35, 36], "renam": [15, 43, 44], "render": [0, 33, 34], "reorder": [5, 7, 34, 35, 38, 39], "reorgan": [0, 33], "repeat": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 14, 26, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "repeated": 33, "repeatedli": [0, 6, 10, 13, 37, 38], "repet": [3, 43, 44], "repetit": [6, 33, 34, 37, 38], "rephras": [13, 35], "replac": [0, 1, 3, 4, 5, 6, 10, 12, 14, 23, 25, 27, 33, 34, 35, 37, 38, 40, 41, 43, 44], "replica": [6, 37], "repo": [15, 27, 28], "report": [24, 33, 36, 38, 39], "repositori": [4, 20, 24, 27, 28, 33, 43, 44], "reposotori": [], "repres": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "represent": [0, 1, 3, 6, 30, 33, 37, 38, 41, 42, 43, 44], "representd": [3, 43, 44], "reproduc": [0, 5, 6, 9, 12, 15, 16, 18, 20, 25, 27, 28, 30, 33, 34, 40, 41, 42], "repuls": [0, 33], "request": [0, 13, 36], "requir": [0, 1, 3, 4, 5, 6, 8, 9, 11, 12, 13, 15, 17, 18, 19, 20, 24, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 44], "rerun": [41, 42], "res1": [2, 42, 43], "res2": [2, 42, 43], "res3": [2, 42, 43], "res_analyt": [2, 42, 43], "res_analytical1": [2, 42, 43], "res_analytical2": [2, 42, 43], "res_analytical3": [2, 42, 43], "resaml": 6, "resampl": [0, 7, 10, 24, 25, 33, 34, 41, 42], "rescal": [0, 11, 12, 36, 39], "rescu": 5, "reseach": 6, "research": [0, 4, 13, 21, 22, 25, 28, 32, 33, 36, 43, 44], "researchg": 28, "resembl": [6, 30, 37], "reserv": [1, 5, 6, 30, 37, 38, 41], "reset": [41, 42], "reset_weight": [41, 42], "reshap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 26, 33, 34, 37, 38, 41, 42, 43, 44], "resid": 36, "residenti": [], "residu": [0, 5, 13, 33], "resiz": [5, 34, 35], "resnet": 36, "resort": 36, "resourc": [33, 36], "respect": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 21, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "respond": [12, 39, 40], "respons": [0, 7, 9, 12, 33, 34, 38, 39, 40], "rest": [0, 5, 18, 21, 22, 23, 34, 35, 36], "restat": [0, 12, 33], "restor": 4, "restored_discrimin": 4, "restored_gener": 4, "restrict": [0, 3, 9, 12, 33, 39, 40, 41, 42, 43, 44], "result": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 36, 37, 38, 39, 42, 43, 44], "retail": [], "retain": [5, 6, 34, 35, 36, 37, 38, 43, 44], "rethink": 37, "return": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 16, 17, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "return_data": 14, "return_sequ": 4, "return_x_i": 9, "reus": [1, 3, 6, 19, 20, 22, 24, 27, 28, 40, 41, 43, 44], "reveal": [0, 12, 33, 39, 40], "revers": [1, 22, 26, 41, 42], "review": [25, 26], "revis": [], "revisit": 14, "revolut": 33, "reward": [0, 4, 33], "rewrit": [0, 3, 5, 6, 7, 8, 10, 11, 12, 13, 16, 19, 26, 27, 30, 35, 36, 38, 39, 40, 41], "rewritten": [2, 6, 8, 10, 30, 37, 42, 43], "rewrot": [13, 38, 39], "rf": 10, "rgb": [3, 43, 44], "rgoj5yh7evk": 25, "rh": [6, 37], "rho": [0, 10, 13, 36, 41, 42], "rho2": [41, 42], "rho_1": 10, "rho_2": 10, "rho_m": 10, "rich": [0, 33], "rid": [], "ride": 9, "rideclass": 9, "ridedata": 9, "ridg": [7, 11, 13, 20, 24, 25, 28, 33, 37, 38, 39], "ridge_paramet": 17, "ridge_sk": 6, "ridgebeta": 35, "ridgetheta": 5, "right": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 19, 21, 22, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "right_sid": [2, 42, 43], "rightarrow": [0, 1, 5, 6, 8, 11, 12, 13, 30, 33, 34, 35, 36, 37, 39, 40, 41], "rigor": [0, 23, 
33, 34, 35], "ring": 6, "rise": [0, 33], "risk": [0, 13, 33, 35, 36], "rival": 4, "river": [], "rlm": 33, "rm": [28, 30, 36, 41, 42], "rms_prop": [41, 42], "rmse": [], "rmsporp": [13, 36], "rmsprop": [1, 3, 4, 13, 27, 28, 37, 40, 41, 42, 44], "rnd_clf": 10, "rng": [30, 38, 39], "rnn": [4, 12, 39, 40], "rnn1": 4, "rnn2": 4, "rnn_2layer": 4, "rnn_input": 4, "rnn_output": 4, "rnn_train": 4, "rntrick1": 30, "rntrick2": 30, "rntrick3": 30, "rntrick4": 30, "ro": [0, 13, 33, 35, 36], "robert": [19, 27, 32], "robust": [0, 33, 36], "robustscal": [0, 34, 36], "roc": [7, 10], "role": [0, 2, 5, 6, 8, 18, 25, 27, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43], "roll": 6, "ronach": [], "room": [0, 31, 33], "root": [0, 5, 9, 13, 15, 24, 30, 34, 35, 36, 40, 42, 44], "root_directori": [], "rot": 33, "rotat": [1, 8, 9, 10], "rotation_matrix": 9, "roughli": [1, 3, 18, 41, 44], "round": [7, 9, 13, 39, 41, 42, 43], "routin": [13, 26, 33, 35], "row": [0, 1, 2, 5, 6, 9, 11, 16, 21, 26, 33, 34, 35, 37, 41, 42, 43], "rr": [5, 34, 35], "rrr": [5, 34, 35], "rubric": [], "rudg": [], "rug": [13, 35, 36], "rule": [0, 1, 5, 6, 13, 22, 27, 33, 34, 35, 39, 42, 43, 44], "run": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13, 15, 20, 21, 22, 25, 27, 28, 33, 34, 35, 36, 37, 38, 41, 42, 43], "rung": 28, "running_loss": [42, 44], "runtim": [1, 6, 14, 15, 41, 42], "rust": [0, 25, 26, 33], "rvert": [1, 41], "rvert_2": [1, 41], "s41467": 28, "s_": [3, 6], "s_1": 6, "s_i": [6, 7, 38], "s_j": 6, "s_k": 6, "s_phenomenon": 27, "saddl": [13, 35, 36], "safeguard": [18, 36], "saga": 28, "sai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "said": [6, 9, 13, 35], "sake": [0, 5, 7, 11, 33, 34, 35, 38, 39, 40, 41], "sale": [0, 33], "sam": 33, "same": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 18, 20, 21, 22, 24, 26, 27, 28, 30, 33, 34, 35, 39, 40, 41, 42, 43, 44], "samm": 10, "sampl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 18, 19, 23, 25, 26, 27, 30, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44], "sample_vari": 14, "sampleexptvari": 30, "samples_per_class": [38, 39], "samwis": 33, "sandbox": [], "sandboxmod": [21, 22], "sasha": [], "sastri": 11, "satisfactori": [0, 33], "satisfi": [1, 2, 3, 6, 8, 13, 26, 30, 35, 37, 41, 42, 43], "satur": [1, 6, 37, 38, 41], "save": [0, 4, 6, 7, 9, 13, 20, 22, 33, 36, 37, 38], "save_fig": [0, 6, 7, 9, 10, 33, 37, 38], "savefig": [0, 4, 6, 7, 9, 30, 33, 37, 38], "savetxt": 4, "saw": [5, 34], "scalabl": 10, "scalar": [2, 5, 6, 10, 34, 37, 40, 41, 42, 43], "scale": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 22, 25, 26, 27, 28, 31, 33, 35, 38, 39, 41, 42], "scale_mean": 4, "scale_std": 4, "scaler": [0, 7, 8, 9, 10, 11, 17, 27, 34, 41, 42], "scan": [5, 7, 38, 39], "scari": 5, "scatter": [0, 1, 6, 7, 8, 9, 14, 15, 17, 21, 33, 34, 36, 37, 38], "scenario": [6, 13, 35, 36], "schedul": [13, 36], "scheduler_arg": [41, 42], "schedulers_bia": [41, 42], "schedulers_weight": [41, 42], "scheme": [1, 13, 35, 36, 38, 39, 41], "schrage": 30, "sch\u00f8yen": [6, 34, 36], "scienc": [0, 1, 10, 12, 13, 25, 29, 30, 31, 32, 35, 37, 38, 39, 40, 41], "scientif": [0, 20, 25, 27, 28, 33, 38, 39, 42, 43], "scientist": [0, 33], "scikit": [3, 5, 6, 8, 9, 10, 13, 15, 16, 20, 21, 23, 25, 26, 27, 28, 32, 42, 44], "scikit_learn": [0, 39], "scikitlearn": 33, "scikitplot": [7, 10, 23, 39], "scipi": [0, 3, 5, 6, 13, 25, 26, 27, 33, 34, 35, 37, 43], "scl": 6, "scm": 15, "score": [0, 1, 3, 6, 7, 9, 10, 11, 15, 16, 19, 21, 23, 27, 28, 31, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44], "scores_kfold": [6, 
37, 38], "scratch": [1, 13, 16, 39, 40, 41], "script": [], "sdg": [13, 36], "sdv4f4s2sb8": [35, 36], "seaborn": [0, 1, 3, 6, 7, 28, 33, 39, 41, 42, 44], "seamless": [0, 25, 27, 33], "seamlessli": [41, 42], "search": [0, 1, 3, 5, 9, 13, 15, 33, 35, 36, 41, 42, 44], "sebastian": [33, 40, 41], "sebastianraschka": [28, 33], "sec": 6, "second": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 20, 21, 22, 24, 25, 26, 30, 31, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "second_correct": [41, 42], "second_mo": 36, "second_term": 36, "secondari": 36, "secondeigvector": 11, "secondli": [12, 40, 41, 42], "section": [4, 11, 16, 20, 26, 27, 30, 34, 36, 38], "sector": 0, "see": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "seed": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 18, 20, 21, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "seed_imag": 4, "seek": [1, 2, 8, 41, 42, 43], "seem": [1, 3, 4, 36, 41, 42, 43, 44], "seemingli": [0, 33], "seen": [0, 1, 3, 5, 10, 12, 30, 41, 43, 44], "segment": [13, 35, 41, 42], "seismic": 6, "seldomli": [0, 33], "select": [1, 5, 6, 8, 9, 10, 11, 15, 20, 23, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 41], "selevet": 15, "self": [1, 5, 22, 34, 38, 39, 41, 42, 44], "sell": 4, "semest": [7, 29, 39], "semi": [8, 13, 35, 36], "semilogx": 6, "send": [5, 12, 13, 21, 22, 31, 33, 39, 40], "senior": [29, 31], "sens": [0, 4, 6, 8, 21, 33, 37, 43, 44], "sensibl": [3, 21, 43, 44], "sensit": [0, 5, 6, 9, 13, 23, 33, 34, 36, 37, 38], "sent": [2, 21, 40, 41, 42, 43], "sentdex": [39, 40, 41], "sentenc": [4, 12, 39, 40], "separ": [0, 1, 2, 4, 6, 8, 9, 12, 14, 18, 21, 22, 23, 25, 27, 30, 33, 36, 37, 39, 40, 41, 42, 43], "septemb": [18, 27, 33], "sequenc": [3, 4, 7, 9, 10, 12, 13, 25, 26, 30, 33, 35, 38, 39, 40, 43, 44], "sequenti": [1, 3, 4, 10, 12, 30, 39, 40, 41, 42, 44], "seri": [0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 26, 33, 34, 35, 37, 39, 40, 41, 42], "serif": [7, 30, 33, 38], "serv": [0, 1, 2, 3, 5, 7, 13, 28, 32, 33, 34, 35, 36, 38, 39, 41, 42, 43, 44], "servic": [27, 28], "session": [1, 15, 20, 27, 28, 29, 31, 33], "set": [1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 36, 37, 38, 39], "set_cmap": 43, "set_major_formatt": 6, "set_major_loc": 6, "set_tick": [1, 8], "set_ticklabel": 1, "set_titl": [0, 1, 2, 3, 7, 12, 14, 33, 38, 39, 41, 42, 43, 44], "set_xlabel": [0, 1, 2, 3, 7, 12, 33, 38, 39, 41, 42, 43, 44], "set_xlim": [7, 12, 38, 39, 41], "set_xticklabel": 1, "set_ylabel": [0, 1, 2, 3, 7, 33, 39, 41, 42, 43, 44], "set_ylim": [7, 12, 38, 39, 41], "set_ytick": [7, 39], "set_yticklabel": [1, 6], "set_zlim": 6, "seth": 4, "setminu": 6, "setosa": [8, 9], "setosa_or_versicolor": 8, "setp": [6, 37, 38], "setup": [1, 4, 6, 8, 22, 25, 28, 33, 34, 35, 40, 41], "sever": [0, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 43, 44], "sgd": [1, 3, 35, 41, 42, 44], "sgd_clf": 8, "sgdclassifi": 8, "sgdreg": 13, "sgdregressor": 13, "sgn": [5, 34, 35], "shall": [], "shallow": [13, 36], "shape": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 21, 22, 23, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "share": [1, 3, 15, 33, 41, 42, 43, 44], "share_mask": [], "shareabl": 15, "she": [7, 38, 39], "sheppard": [], "shibukawa": [], "shift": [1, 6, 12, 15, 18, 30, 34, 36, 39, 41], "ship": [3, 44], "shire": 33, "short": [4, 5, 20, 24, 27, 28, 41, 42], "shortcom": [13, 35, 36], "shorten": 4, "shorter": 30, 
"shorthand": [33, 37], "shortli": [26, 33], "should": [0, 2, 3, 5, 6, 8, 9, 11, 12, 15, 18, 19, 20, 21, 22, 24, 26, 27, 30, 33, 34, 36, 37, 38, 40, 43, 44], "shouldn": [], "show": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "show_shap": 4, "shown": [0, 4, 5, 8, 12, 13, 23, 26, 34, 35, 36, 39, 40, 41, 42], "shrink": [3, 5, 6, 8, 11, 34, 35, 36, 43, 44], "shrinkag": [5, 6, 34, 35], "shrunk": 11, "shuffl": [0, 1, 4, 6, 13, 34, 36, 37, 38, 41, 42, 44], "sickit": [40, 41], "side": [0, 2, 5, 8, 12, 13, 26, 27, 28, 33, 35, 38, 39, 41, 42, 43], "sigh": [25, 33], "sigma": [0, 1, 5, 6, 7, 10, 11, 12, 13, 19, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "sigma0": 30, "sigma1": 30, "sigma2": 30, "sigma_": [5, 26, 33, 34, 35, 37], "sigma_0": [5, 34, 35], "sigma_1": [5, 34, 35, 40, 41], "sigma_2": [5, 34, 35, 40, 41], "sigma_fn": [7, 12, 38, 39, 41], "sigma_i": [0, 5, 33, 34, 35], "sigma_j": [5, 34, 35], "sigma_m": [6, 30, 37], "sigma_n": [11, 30], "sigma_t": 13, "sigma_x": 30, "sigmoid": [1, 2, 4, 7, 8, 10, 12, 21, 22, 28, 38, 39, 40, 42, 43], "sigmoid_autograd": 22, "sigmoid_d": 22, "sigmundson": [6, 34, 36], "sign": [1, 2, 7, 8, 10, 28, 30, 31, 38, 41, 42, 43], "signa": 43, "signal": [1, 3, 10, 12, 36, 39, 40, 41, 43, 44], "signifi": 4, "signific": [1, 36, 41], "significantli": [1, 13, 18, 30, 35, 36, 41], "sim": [4, 5, 6, 13, 19, 30, 37], "similar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 18, 25, 26, 27, 28, 33, 35, 37, 38, 39, 40, 41, 43, 44], "similarli": [0, 1, 3, 5, 8, 10, 13, 30, 33, 34, 35, 36, 40, 41, 43, 44], "similiar": [41, 42], "simpl": [1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 14, 16, 17, 22, 23, 25, 26, 28, 30, 37, 39, 42], "simple_plot": [], "simplefilt": [41, 42], "simplepredict": 10, "simpler": [0, 1, 5, 6, 7, 13, 16, 25, 27, 28, 33, 35, 36, 41, 42], "simplernn": 4, "simplest": [0, 1, 3, 4, 9, 10, 12, 14, 23, 27, 33, 39, 40, 41, 43, 44], "simpletre": 10, "simpli": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "simplic": [2, 5, 6, 7, 8, 9, 10, 11, 12, 14, 34, 35, 36, 38, 39, 40, 41, 42, 43], "simplicti": [5, 34, 35], "simplif": 40, "simplifi": [0, 6, 9, 18, 22, 25, 27, 33, 34, 36, 37, 38, 40], "simplist": [3, 6, 30, 37, 43, 44], "simul": [6, 18, 36, 37, 38], "simultan": [6, 36, 37, 38], "sin": [0, 1, 2, 3, 4, 9, 12, 13, 26, 33, 39, 41, 42, 43], "sinc": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 16, 18, 21, 22, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "sine": [3, 12, 39, 41], "singl": [0, 1, 2, 3, 5, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "singular": [0, 6, 13, 26, 33, 37, 43], "sinusoid": 3, "site": [0, 27, 28, 29, 34], "situat": [0, 4, 5, 7, 13, 30, 33, 34, 35, 36, 38, 39], "six": [3, 30, 40], "size": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 18, 20, 21, 23, 26, 27, 30, 33, 37, 38, 39, 40, 41, 42, 43, 44], "sizesp": 36, "skeleton": 22, "sketch": 10, "ski": 9, "skill": 0, "skip": 11, "skl": [0, 6, 33, 34, 36], "sklearn": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 20, 21, 22, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "skplt": [7, 10, 23, 39], "skrankefunct": [41, 42], "sl": [6, 34, 36], "slack": 8, "slender": [], "slice": [2, 26, 33, 42, 43], "slide": [0, 3, 16, 27, 28, 30, 33, 34, 35, 40, 41, 42, 43, 44], "slight": [6, 13, 24, 37, 38], "slightli": [1, 2, 3, 5, 6, 7, 10, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "slope": [8, 11, 12, 39], 
"slow": [0, 2, 8, 13, 18, 34, 35, 36, 42, 43], "slower": [5, 26, 33, 34, 35, 36], "slowest": 26, "slowli": [12, 36], "slp": [1, 41], "small": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 18, 21, 22, 25, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "smaller": [0, 1, 2, 5, 6, 8, 9, 11, 13, 21, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43], "smallest": [0, 4, 14, 33], "smallest_row_index": 14, "smodin": [], "smooth": [0, 3, 6, 13, 27, 33, 35, 36, 43, 44], "smoother": 36, "sn": [0, 1, 3, 6, 7, 33, 39, 41, 42, 44], "sne": 11, "so": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "soar": 6, "social": 0, "soft": [1, 7, 10, 12, 38, 39, 40, 41], "soften": 8, "softmax": [3, 7, 21, 22, 28, 38, 39, 42, 43, 44], "softmax_vec": 21, "softwar": [0, 8, 25, 26, 40], "sokogskriv": 20, "sol": 8, "sol1": 21, "sole": [0, 6, 33], "solid": [0, 7, 38, 39], "solut": [0, 1, 2, 3, 5, 6, 8, 10, 11, 13, 18, 21, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 41, 44], "solution_ev": 36, "soluton": [2, 42, 43], "solv": [0, 1, 3, 5, 6, 8, 10, 11, 12, 13, 16, 26, 27, 28, 33, 34, 40, 41, 44], "solve_expdec": [2, 42, 43], "solve_ode_deep_neural_network": [2, 42, 43], "solve_ode_neural_network": [2, 42, 43], "solve_pde_deep_neural_network": [2, 42, 43], "solveod": [2, 42, 43], "solveode_popul": [2, 42, 43], "solver": [2, 7, 8, 9, 10, 23, 26, 28, 33, 39, 42, 43], "some": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 21, 22, 23, 24, 27, 28, 30, 33, 36, 37, 39, 41, 42, 43, 44], "some_model": [6, 34, 36], "somehow": 4, "someon": 16, "someth": [0, 1, 3, 4, 7, 9, 11, 15, 19, 20, 27, 28, 30, 33, 34, 39, 41, 43, 44], "sometim": [0, 1, 11, 12, 13, 14, 19, 34, 36, 39, 40, 41], "somewhat": [28, 39], "soon": [26, 31, 34], "sophist": [0, 33], "sopt": 13, "sort": [5, 6, 9, 11, 23, 30, 37, 38], "sound": [3, 5], "sourc": [0, 1, 3, 6, 24, 25, 26, 27, 28, 30, 33, 36, 37, 38, 41, 42], "source1": 22, "source2": 22, "space": [0, 1, 4, 5, 8, 9, 11, 12, 13, 14, 30, 34, 35, 36, 38, 39, 40, 41, 43, 44], "span": [0, 3, 5, 9, 11, 26, 33, 34, 35, 43, 44], "spare": [1, 41, 42], "spars": [3, 6, 18, 26, 33, 36, 43, 44], "sparse_categorical_crossentropi": 42, "sparse_mtx": [26, 33], "sparsecategoricalcrossentropi": [3, 44], "sparsiti": [10, 18], "spatial": [1, 2, 3, 12, 39, 40, 41, 42, 43, 44], "speak": 30, "special": [6, 7, 10, 12, 13, 23, 26, 30, 33, 34, 35, 36, 38, 39, 40, 41], "specif": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 16, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 44], "specifi": [0, 3, 5, 6, 7, 9, 11, 13, 14, 30, 33, 35, 36, 37, 38, 39, 41, 44], "specifici": [0, 10, 33], "spectacular": [3, 43, 44], "spectral": 1, "speech": [0, 1, 3, 4, 12, 39, 40, 41, 43, 44], "speed": [1, 2, 4, 13, 41, 42], "spend": [16, 30, 36], "spent": [27, 28], "sphere": [0, 34, 36], "sphinx": [], "sphinx_book_them": [], "sphinxcontrib": [], "spike": 36, "spin": 6, "spite": 0, "spitzer": [], "spline": 8, "split": [1, 3, 4, 5, 6, 8, 9, 10, 11, 14, 16, 17, 20, 21, 22, 27, 28, 30, 33, 35, 36, 37, 38, 41, 42, 43, 44], "splite": 0, "splitter": [1, 10], "spoiler": [], "spontan": 30, "spot": [3, 43, 44], "spread": [0, 11, 30, 33, 34, 38, 39], "spring": [41, 42], "springer": [19, 27, 32, 33, 37, 38], "spuriou": [13, 36], "sqquar": 35, "sqrsignal": 3, "sqrt": [3, 4, 5, 6, 8, 10, 11, 13, 30, 34, 35, 36, 37, 40, 41, 42, 43], "squar": [1, 2, 3, 4, 7, 8, 9, 11, 13, 14, 15, 17, 18, 24, 25, 26, 28, 30, 37, 38, 39, 40, 41, 42, 43, 44], "squarederror": 
10, "squaredeuclidean": 14, "squash": [12, 39, 41], "src": [], "srtm": 6, "srtm_data_norway_1": 6, "srv": 43, "sso": 20, "stabil": [5, 27, 28, 36, 38, 39], "stabl": [0, 4, 5, 6, 9, 16, 20, 25, 27, 33, 34, 35, 36], "stack": [3, 4, 43, 44], "stage": [5, 13, 15, 27, 28, 36, 40, 41, 42], "stagnat": 36, "stai": [0, 2, 4, 5, 11, 33, 34, 36, 41, 42, 43], "stand": [0, 5, 9, 12, 33, 34, 35, 39], "standard": [0, 1, 4, 5, 6, 7, 8, 10, 12, 17, 18, 19, 23, 26, 27, 28, 30, 33, 35, 36, 38, 39, 40, 41, 43, 44], "standardscal": [0, 6, 7, 8, 9, 10, 11, 17, 34, 36], "standpoint": 36, "stanford": [13, 35, 42, 43], "stanforduniversityschoolofengin": 43, "start": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 21, 22, 24, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 40, 41, 42, 44], "start_tim": 14, "starter": [], "stat": [6, 37], "state": [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 24, 25, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43], "statement": [0, 7, 26, 33, 39], "static": [], "stationari": [35, 36], "statist": [0, 1, 3, 4, 7, 9, 10, 11, 12, 13, 14, 19, 26, 27, 32, 34, 35, 36, 39, 40, 41, 43, 44], "statu": [0, 7, 15, 33, 38, 39], "stavang": 6, "stb": [], "std": [0, 4, 6, 18, 33, 34, 36, 37, 38, 42], "stdout": [41, 42], "steep": [13, 23, 35, 36], "steepest": 36, "stefan": [], "step": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 22, 26, 27, 33, 35, 39, 40, 41, 42, 43, 44], "step_fn": [7, 12, 38, 39, 41], "step_length": [13, 36], "step_siz": 36, "steps_list": 9, "stereo": [3, 43, 44], "sticki": [], "still": [0, 2, 3, 5, 6, 11, 13, 21, 22, 24, 28, 30, 34, 35, 36, 37, 38, 40, 42, 43, 44], "stimuli": [12, 39, 40], "stk": [32, 33], "stk2100": [32, 33], "stk3155": [15, 27, 28, 29, 31], "stk4021": [32, 33], "stk4051": [32, 33], "stk4155": [29, 31], "stk5000": 32, "stochast": [0, 1, 5, 6, 8, 11, 12, 22, 24, 28, 35, 37, 38, 40, 41], "stock": 4, "stoke": [12, 39, 40], "stone": [0, 7, 38, 39, 40], "stop": [1, 4, 9, 13, 14, 18, 35, 40, 41, 42], "storag": [5, 34, 35], "store": [0, 1, 2, 3, 6, 11, 13, 22, 30, 33, 36, 41, 42, 43, 44], "storehaug": [31, 33], "stori": [], "str": [1, 3, 4, 41, 42, 43, 44], "straight": [0, 6, 8, 13, 33, 35, 37], "straightforward": [0, 2, 3, 5, 6, 8, 9, 10, 13, 26, 33, 34, 35, 37, 42, 43, 44], "strategi": [0, 1, 9, 33, 41], "stratifi": [6, 37, 38], "stream": 36, "strength": [0, 5, 14, 34, 35, 42], "stretch": 11, "strict": [8, 13, 35], "strictli": [8, 13, 35], "stride": [4, 26, 43, 44], "strike": 6, "string": [1, 41], "stroke": [7, 38, 39], "strong": [3, 6, 9, 10, 12, 26, 30, 36, 37, 39, 40], "strongli": [0, 8, 15, 20, 22, 24, 25, 26, 28, 41, 42, 43, 44], "stronli": [], "structur": [0, 1, 2, 3, 6, 9, 10, 12, 22, 25, 33, 37, 38, 39, 41, 42, 43, 44], "stuck": [1, 13, 35, 36, 41, 42], "student": [0, 15, 27, 28, 29, 31, 32, 33, 42, 43], "studi": [0, 3, 4, 5, 6, 7, 8, 11, 12, 13, 23, 25, 27, 28, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 44], "studier": 32, "stuff": [21, 22], "style": [7, 9, 20, 26, 33], "stylesheet": [], "st\u00f8land": 31, "sub": [9, 12, 36, 39, 40], "subarrai": [], "subclass": [], "subdivid": [0, 26, 33], "subfield": 0, "subgradi": 36, "subject": [6, 8, 30], "sublicens": [], "sublinear": 36, "submatric": [43, 44], "submit": 33, "subplot": [0, 1, 3, 4, 6, 7, 8, 9, 10, 14, 21, 33, 37, 38, 39, 41, 42, 44], "subplots_adjust": [8, 30], "subprogram": [26, 33], "subproject": [], "subract": [0, 34], "subregion": [43, 44], "subroutin": [0, 33], "subsampl": [43, 44], "subscript": [1, 41], "subsequ": [1, 4, 5, 6, 12, 26, 30, 34, 35, 37, 39, 40], "subset": [1, 6, 9, 12, 13, 23, 25, 33, 35, 
36, 37, 38, 39, 40, 41], "subspac": [0, 8, 11, 34], "substanti": [9, 10, 36], "substep": 11, "substitut": [3, 6, 12, 16, 26, 37, 38, 39], "subsubset": 9, "subtask": 6, "subtl": [1, 41], "subtract": [0, 4, 5, 6, 11, 13, 18, 19, 26, 27, 30, 34, 36, 37, 38, 41, 42], "subtre": 9, "succeed": [0, 4, 33], "success": [3, 7, 9, 13, 30, 38, 39, 43, 44], "successfulli": [4, 9], "succinctli": 36, "sudo": [0, 25, 27, 33], "suffer": [0, 1, 2, 5, 10, 24, 33, 34, 35, 41, 42, 43], "suffici": [1, 6, 8, 11, 13, 35, 37, 38, 41], "suggest": [1, 13, 23, 27, 28, 32, 35, 36, 41, 43, 44], "suit": [8, 12, 28, 39, 40], "suitabl": [0, 15, 19, 24, 30, 34, 36], "sum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 21, 23, 26, 30, 33, 34, 35, 36, 39, 42, 43, 44], "sum_": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "sum_i": [0, 2, 5, 6, 8, 13, 19, 23, 27, 34, 35, 36, 37, 38, 42, 43], "sum_j": [6, 18, 36], "sum_ja_": 0, "sum_k": [6, 8, 12, 26, 40, 41, 42], "sum_logist": 13, "sum_m": [3, 43, 44], "sum_n": [3, 43, 44], "sum_nx_": 3, "summar": [5, 6, 9, 23, 28, 37, 38], "summari": [1, 3, 4, 10, 24, 29, 35, 36, 41, 42, 43, 44], "summat": [0, 3, 16, 34, 35, 43, 44], "sunni": 9, "super": [5, 34, 35, 36, 41, 42, 44], "superfici": 3, "superscript": [1, 12, 39, 40, 41], "supervis": [0, 5, 6, 7, 9, 12, 25, 33, 34, 35, 37, 38, 39, 40], "supplement": [7, 27, 28, 38, 39], "supplementari": 28, "suppli": [], "support": [0, 1, 9, 10, 11, 13, 20, 21, 23, 25, 33, 34, 36, 38, 39, 40, 41, 42], "suppos": [0, 5, 6, 7, 8, 10, 11, 12, 13, 26, 33, 34, 35, 36, 37, 38, 39, 40], "suppress": [5, 13, 35], "sure": [0, 1, 4, 6, 16, 20, 21, 22, 27, 41, 42], "surf": 6, "surfac": [0, 6, 33, 36], "surpass": 6, "surpris": [0, 33], "surround": [3, 25, 44], "survei": [0, 5, 6, 33, 34], "svc": [8, 9, 10], "svd": [0, 6, 11, 33, 37], "svdinv": 5, "svm": [8, 9, 10, 11], "svm_clf": [8, 10], "svn": [], "swap": 21, "swath": [5, 34, 35], "sweep": 23, "switch": [0, 41, 42], "sy": [13, 35, 36, 41, 42], "symbol": [1, 5, 11, 13, 25, 30, 33, 34, 35, 40, 41, 43, 44], "symmeteri": 1, "symmetr": [0, 5, 8, 11, 12, 13, 26, 33, 34, 39, 40], "symmetri": 6, "sympi": [0, 25, 27, 33, 40], "synonim": 30, "syntax": 13, "system": [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 25, 26, 27, 33, 35, 36, 38, 39, 40, 41, 42], "systemat": [4, 6, 37, 38], "t": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 35, 36, 37, 38, 39, 40, 41, 42], "t0": [3, 6, 13, 36], "t1": [2, 13, 36, 42, 43], "t2": [2, 42, 43], "t3": [2, 42, 43], "t9jjwsmsd1o": 37, "t_": [2, 42, 43], "t_0": [2, 9, 13, 36, 42, 43], "t_1": [13, 36], "t_b": 10, "t_batch": [41, 42], "t_i": [1, 2, 5, 12, 28, 34, 35, 41, 42, 43], "t_j": 12, "t_k": 9, "t_test": [41, 42], "t_train": [41, 42], "t_val": [41, 42], "tabl": [9, 23, 27, 28, 30, 31, 33, 39], "tabul": [0, 23, 33], "tabular": 33, "tackl": 4, "tag": [2, 3, 4, 5, 6, 7, 12, 13, 14, 26, 30, 34, 35, 38, 39, 40, 41, 42, 43], "tagrget": 40, "taht": [0, 33], "tail": 30, "tailor": [2, 8, 11, 33, 40, 42, 43], "taiwan": [0, 33], "take": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 21, 22, 23, 25, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "taken": [0, 1, 3, 6, 10, 13, 21, 26, 37, 41, 43, 44], "tan": 3, "tangent": [1, 4, 12, 13, 35, 39, 41, 42], "tanh": [1, 4, 7, 8, 12, 38, 39, 41, 42], "target": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 19, 21, 22, 23, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44], "target_nam": 
[9, 21], "task": [0, 1, 3, 6, 9, 11, 12, 14, 21, 24, 27, 28, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "tau": [3, 5, 30], "taught": 33, "tax": [], "taylor": [2, 13, 35, 40, 42, 43], "taylornr": [13, 35], "tc": 8, "teach": [15, 29, 33, 37], "team": [1, 41, 42], "teaser": 0, "technic": [0, 5, 6, 13, 27, 28, 35, 36, 37], "techniqu": [0, 1, 8, 10, 13, 25, 30, 32, 33, 34, 36, 37, 38, 41], "technologi": [0, 1, 41], "tell": [0, 4, 6, 10, 11, 13, 16, 30, 36, 37, 38], "temp": 1, "temp1": 1, "temp2": 1, "temperatur": [0, 9, 33], "templat": [18, 20], "temporari": [], "temporarili": [1, 41], "ten": [3, 33, 40, 43, 44], "tend": [3, 5, 6, 8, 9, 10, 12, 13, 14, 34, 36, 37, 38, 43, 44], "tendenc": [0, 33], "tension": [6, 37, 38], "tensor": [3, 44], "tensorflow": [0, 2, 4, 8, 14, 25, 26, 27, 28, 32, 33, 34], "term": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 19, 22, 23, 27, 28, 30, 33, 34, 35, 36, 38, 39, 42, 43, 44], "term1": [5, 6, 11], "term2": [5, 6, 11], "term3": [5, 6, 11], "term4": [5, 6, 11], "termin": [0, 4, 5, 9, 10, 13, 15, 34, 35, 36], "terminarl": 15, "terrain": 6, "terrain1": 6, "test": [3, 4, 5, 6, 7, 8, 9, 10, 13, 16, 19, 20, 21, 23, 24, 26, 27, 30, 33, 35, 36, 37, 38, 39, 44], "test_acc": [3, 42, 44], "test_accuraci": [1, 3, 41, 42, 44], "test_dataset": [42, 44], "test_error": 6, "test_imag": [3, 4, 44], "test_ind": [6, 37, 38], "test_input": 4, "test_label": [3, 4, 44], "test_load": [42, 44], "test_loss": [3, 42, 44], "test_pr": [1, 41], "test_predict": [1, 41], "test_rnn": 4, "test_scor": [7, 10, 23, 39], "test_siz": [0, 1, 3, 5, 6, 10, 15, 17, 28, 34, 35, 36, 37, 38, 41, 42, 44], "test_split": 9, "testerror": [0, 6, 34, 37, 38], "testi": 4, "testpredict": 4, "testx": 4, "tex": [], "text": [0, 1, 2, 4, 5, 8, 9, 11, 13, 15, 18, 20, 23, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 41, 42, 43], "textbf": [], "textbook": [16, 27, 28, 34, 35, 37, 38], "textual": 9, "textur": 1, "tf": [1, 3, 4, 13, 14, 35, 41, 42, 44], "th": [0, 1, 2, 5, 6, 7, 9, 12, 13, 14, 26, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43], "than": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 17, 21, 23, 25, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "thank": [4, 6, 34, 36, 42, 43], "thats": [41, 42], "theano": [1, 25, 33, 41, 42], "thei": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 20, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "them": [0, 1, 3, 4, 6, 8, 9, 10, 11, 12, 13, 18, 21, 24, 26, 27, 28, 33, 34, 39, 40, 41, 42, 43, 44], "theme": [0, 15, 33], "themselv": [0, 27, 28, 30, 33, 36, 43, 44], "thenc": [6, 37, 38], "theorem": [2, 6, 7, 34, 35, 38, 39, 41, 42, 43, 44], "theoret": [0, 4, 10], "theori": [0, 1, 3, 8, 9, 12, 13, 19, 25, 27, 32, 33, 36, 39, 40, 41], "thereaft": [0, 5, 6, 11, 12, 26, 27, 33, 37, 38, 40, 41, 42, 44], "therebi": [0, 5, 7, 11, 27, 33, 34, 35, 38, 39, 40], "therefor": [0, 1, 2, 3, 4, 6, 7, 8, 11, 13, 19, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "therein": 11, "thereof": [0, 6, 13, 33, 36, 37], "theta": [0, 1, 4, 5, 6, 7, 13, 16, 27, 30, 33, 34, 35, 36, 38, 39, 40, 41], "theta1": 36, "theta2": 36, "theta_": [0, 1, 6, 7, 13, 33, 34, 35, 36, 38, 39, 41], "theta_0": [0, 5, 6, 7, 16, 33, 34, 35, 36, 38, 39], "theta_0x_": [0, 33, 34], "theta_1": [0, 5, 6, 7, 33, 34, 35, 36, 38, 39], "theta_1x_": [0, 33, 34], "theta_1x_0": [0, 33], "theta_1x_1": [0, 7, 33, 38, 39], "theta_1x_2": [0, 33], "theta_1x_i": [7, 34, 35, 36, 38, 39], "theta_2": [0, 33, 34], "theta_2x_": [0, 33, 34], "theta_2x_0": [0, 33], "theta_2x_1": [0, 33], "theta_2x_2": [0, 
7, 33, 38, 39], "theta_2x_i": 34, "theta_3x_i": 34, "theta_4x_i": 34, "theta_closed_form": 18, "theta_closed_formol": 18, "theta_closed_formridg": 18, "theta_gdol": 18, "theta_gdridg": 18, "theta_i": [0, 1, 5, 33, 34, 35, 41], "theta_j": [0, 5, 6, 18, 33, 34, 36], "theta_k": [35, 36], "theta_linreg": [13, 35, 36], "theta_ol": 18, "theta_p": [7, 38, 39], "theta_px_p": [7, 38, 39], "theta_ridg": 18, "theta_t": [13, 36], "theta_tru": 18, "thetaand": 39, "thetaith": 36, "thetaor": 39, "thetavalu": 5, "thetaxor": 39, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "thing": [0, 1, 2, 4, 5, 7, 9, 15, 16, 18, 21, 22, 30, 33, 37, 39, 41, 42], "think": [0, 1, 3, 4, 6, 9, 12, 13, 14, 30, 33, 34, 35, 36, 37, 39, 41, 43, 44], "third": [0, 3, 6, 13, 31, 33, 35, 36, 43, 44], "thirti": [7, 39], "thorughout": 33, "those": [0, 3, 5, 6, 8, 9, 10, 11, 23, 26, 27, 28, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44], "though": [1, 2, 3, 4, 13, 16, 17, 19, 21, 22, 26, 30, 36, 41, 42, 43], "thought": [6, 14, 27, 28, 30, 37, 38], "thousand": [0, 1, 27, 34, 36, 41], "three": [0, 1, 3, 5, 6, 8, 9, 12, 21, 23, 24, 26, 27, 28, 29, 30, 31, 33, 34, 35, 37, 38, 39, 43, 44], "threshold": [1, 3, 9, 10, 11, 12, 13, 23, 36, 38, 39, 40, 41, 42, 43, 44], "through": [0, 1, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 21, 22, 23, 25, 26, 27, 30, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "throughout": [0, 4, 5, 14, 15, 25, 26, 30, 33, 41, 42], "throw": [3, 6, 30, 37], "thu": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "thumb": [0, 6, 27, 34], "thursdai": [], "tibshirani": [6, 19, 27, 32, 33, 37, 38], "tick_param": 6, "ticker": [6, 13, 30, 35, 36], "tif": 6, "tight_layout": [1, 7, 39], "tightli": 11, "tild": [0, 5, 6, 7, 11, 19, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "till": [0, 4, 7, 8, 9, 10, 12, 26, 33, 34, 38, 39, 40, 41, 42, 43, 44], "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "timeit": 4, "timer": 4, "times2": 23, "timeseri": [], "tini": [1, 36, 41], "tip": [3, 43, 44], "titl": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 20, 21, 24, 30, 33, 35, 36, 37, 38, 41, 42, 43, 44], "tm": [], "tmp": 13, "tn": [2, 3, 7, 23, 42, 43], "to_categor": [1, 3, 4, 41, 42, 44], "to_categorical_numpi": [1, 41], "to_numer": [0, 6, 33, 37, 38], "todai": 3, "togeth": [0, 3, 6, 8, 11, 13, 22, 25, 33, 42, 43, 44], "toi": 14, "token": [], "told": 13, "toler": [2, 14, 42, 43], "tolist": 4, "tomographi": [12, 39, 40], "too": [0, 2, 4, 5, 6, 9, 11, 13, 17, 18, 24, 30, 32, 34, 35, 36, 37, 38, 42, 43, 44], "took": [8, 33], "tool": [0, 1, 3, 6, 13, 15, 25, 34, 37, 38, 41, 43, 44], "toolbox": 8, "top": [0, 3, 5, 6, 9, 10, 19, 23, 25, 33, 37], "topic": [0, 5, 6, 7, 8, 25, 27, 28, 34, 35, 37, 38, 39, 40, 42], "topolog": [3, 12, 39, 40, 43, 44], "topologi": [1, 12, 41], "torch": [42, 44], "torchvis": [42, 44], "torkjellsdatt": [31, 33], "tort": [], "toss": [10, 30], "total": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 23, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "total_loss": 4, "totalclustervari": 14, "totalscatt": 14, "totensor": [42, 44], "toward": [1, 2, 7, 12, 13, 15, 23, 24, 35, 38, 39, 41, 42, 43], "towardsdatasci": 36, "town": [], "tp": [4, 7, 23], "tpng": 9, "tpr": 23, "tpu": [13, 25, 33], "tqdm": 6, "tr": [], "track": [3, 13, 14, 15, 22, 26, 34, 35, 36, 43, 44], "tract": 
[], "tractabl": [0, 33, 34], "trade": [5, 9, 20, 23, 28, 36, 37], "tradeoff": [0, 5, 19, 27, 33, 34, 35], "tradit": [0, 1, 4, 6, 24, 33, 37, 38, 41], "train": [2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 20, 24, 27, 28, 35, 36, 37, 38, 39, 42], "train_acc": [41, 42], "train_accuraci": [0, 1, 3, 33, 41, 42, 44], "train_dataset": [4, 42, 44], "train_end": [0, 1, 34, 41], "train_error": [6, 41, 42], "train_imag": [3, 4, 44], "train_ind": [6, 37, 38], "train_label": [3, 4, 44], "train_load": [42, 44], "train_network": 21, "train_pr": [1, 41], "train_siz": [0, 1, 3, 34, 41, 42, 44], "train_step": 4, "train_test_split": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "train_test_split_numpi": [0, 1, 34, 41], "trainable_vari": 4, "trained_model": [6, 34, 36], "trainerror": [0, 34], "traini": 4, "training_checkpoint": 4, "training_dataset": 4, "training_gradi": [13, 36], "trainingerror": [6, 37, 38], "trainpredict": 4, "trainscor": 4, "trainx": 4, "trait": [0, 33], "trajectori": [4, 36], "transfer": [9, 33], "transform": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 21, 25, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "transit": [6, 12, 39, 40], "translat": [1, 4, 6, 10, 33, 34, 36, 41, 43, 44], "transpos": [1, 5, 11, 21, 26, 34, 35, 41], "travers": [0, 5], "travi": [], "treat": [0, 1, 3, 6, 12, 13, 18, 21, 23, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "tree": [0, 1, 25, 33, 41], "tree_clf": [9, 10], "tree_clf_": 9, "tree_clf_sr": 9, "tree_reg": 9, "tree_reg1": 9, "tree_reg2": 9, "trend": 30, "treue": 7, "trevor": [19, 27, 32], "tri": [2, 3, 4, 9, 13, 16, 36, 42, 43], "triain": 0, "trial": [0, 2, 4, 6, 13, 30, 33, 35, 36, 37, 38], "triangl": [13, 35], "triangular": 26, "trick": [3, 4, 8, 11, 13, 30, 36, 43, 44], "tricki": 22, "trickier": 30, "tridiagon": 26, "trigonometr": [43, 44], "trillion": 25, "trim": [], "trivial": [0, 1, 5, 11, 30, 33, 35, 41, 42], "troffa": [], "troubl": [0, 8, 12, 15, 21, 22, 34, 36, 40, 41], "truck": [3, 44], "true": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "true_beta": 34, "true_fun": [6, 37, 38], "true_theta": [6, 36], "truelabel": [38, 39], "truli": 33, "truncat": 40, "try": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 18, 21, 22, 25, 26, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "tr\u00f6ger": [], "tucker": 8, "tuesdai": [31, 33, 38, 42], "tumor": [7, 9, 38, 39], "tumour": [7, 39], "tunabl": 1, "tune": [4, 9, 13, 24, 26, 33, 36], "tupl": [21, 41, 42], "turn": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "tutori": [1, 4, 28, 41, 42], "tv": [2, 42], "tveito": [2, 42, 43], "tvw1zdmznwm": 39, "tweak": [1, 4, 10, 30, 41, 42], "twice": [13, 35], "twist": 11, "two": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 17, 21, 23, 24, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 42], "tx": [13, 35, 36, 39], "tx_1": [13, 35], "txt": [4, 15, 20, 27, 28], "ty": [13, 35], "type": [0, 1, 3, 6, 8, 10, 13, 21, 23, 24, 26, 30, 34, 35, 36, 37, 41], "typeset": 20, "typic": [0, 1, 2, 3, 4, 5, 7, 9, 10, 12, 13, 15, 16, 20, 24, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "typo": [27, 28], "u": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "u_": 26, "u_i": [12, 39], "u_m": 10, "ua": [0, 33], "ubuntu": [0, 25, 27, 33], "uci": [27, 28], "ufunc": [], "uio": [15, 20, 21, 27, 28, 31, 32], "uk": [], "un": 14, "unabl": 
[15, 21], "unari": [26, 33], "unbalanc": [6, 9, 37, 38], "unbias": [0, 5, 6, 33, 37], "uncent": [6, 34, 36], "uncertainti": [0, 5, 33], "uncertitud": 30, "unchang": [1, 3, 41, 43, 44], "uncom": [], "uncorrel": [10, 30], "undefin": [5, 34, 35], "under": [0, 1, 5, 6, 10, 13, 23, 25, 27, 33, 34, 35, 36, 37, 41, 42], "underdetermin": [0, 33], "underfit": [1, 6, 24, 37, 38, 41], "underflowproblem": [5, 37], "undergo": [5, 21], "undergradu": [29, 31], "underli": [0, 1, 9, 13, 18, 23, 30, 33, 36, 41], "underlin": [], "underscor": [], "underset": [4, 14], "understand": [0, 1, 3, 5, 6, 10, 13, 14, 15, 19, 20, 21, 25, 33, 34, 35, 36, 40, 41, 42, 43, 44], "understood": [8, 13], "underwai": [], "undesir": 8, "undetermin": [5, 8, 37], "undo": 4, "unexpect": [6, 37], "unexpected": 30, "unexplain": 18, "unfair": [6, 34], "unfortun": [1, 8, 9, 10, 41], "unicode_liter": [8, 9], "uniform": [0, 1, 5, 6, 11, 13, 27, 30, 33, 35, 36, 38, 39, 41], "uniformli": [13, 30, 35, 36], "unifrompdf": 30, "unimport": [13, 35], "union": [5, 6, 37, 38], "uniqu": [0, 2, 6, 13, 14, 26, 33, 37, 38, 39, 42, 43], "unique_class": [38, 39], "unique_cluster_label": 14, "unit": [0, 1, 3, 4, 5, 10, 12, 18, 30, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "unitari": [5, 6, 26, 34, 35], "unitarili": [26, 33], "uniti": 30, "univari": 30, "univers": [0, 1, 2, 13, 25, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "unix": [1, 41, 42], "unknow": [0, 26, 33], "unknown": [0, 1, 3, 4, 5, 6, 8, 10, 13, 19, 26, 27, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "unknowwn": 12, "unlabel": [1, 41], "unless": [0, 3, 6, 11, 13, 27, 28, 33, 35, 37, 40], "unlik": [1, 3, 8, 13, 35, 36, 41, 42, 43, 44], "unnecessarili": 9, "unord": [3, 43, 44], "unpickl": [], "unpleas": [], "unpublish": 36, "unravel": [1, 41], "unrol": [3, 11, 44], "unscal": 19, "unseen": [0, 7, 9, 15, 38, 39], "unstabl": [1, 41], "unsupervis": [0, 1, 4, 12, 25, 33, 39, 40, 41], "unsymmetr": [26, 33], "until": [1, 2, 4, 9, 12, 13, 14, 21, 35, 36, 39, 41, 42, 43], "untouch": 0, "unusu": [12, 39, 40], "unweight": 23, "up": [1, 3, 4, 5, 6, 8, 10, 11, 13, 14, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 30, 31, 36, 39], "updat": [1, 2, 10, 12, 13, 14, 15, 18, 19, 21, 22, 28, 37, 38, 39, 43], "update_chang": [41, 42], "update_matrix": [41, 42], "update_weight": 22, "uploa": 33, "upload": [15, 20, 25, 27, 28, 32], "upon": [0, 1, 6, 7, 11, 26, 40, 41, 42], "upper": [0, 8, 9, 16, 26, 34, 43, 44], "uppercas": [26, 33], "upsampl": 4, "upscal": 4, "uptad": 40, "upward": [], "url": [33, 34, 39], "us": [4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 17, 20, 21, 23, 24, 26, 30, 32, 37], "usag": [0, 8, 25, 33, 34, 40], "usd": [], "usd10000": [], "use_bia": 4, "usecol": [0, 33], "useless": [1, 41], "user": [0, 1, 2, 4, 6, 7, 15, 25, 26, 27, 33, 34, 38, 39, 41, 42, 43], "usernam": [15, 27, 28], "usetex": 30, "usg": 6, "usr": 30, "usual": [0, 3, 4, 7, 12, 13, 14, 23, 33, 36, 38, 39, 40, 43, 44], "ut": 5, "utf": [], "util": [1, 3, 4, 6, 7, 10, 14, 19, 33, 37, 38, 41, 42, 44], "ux": 26, "v": [2, 4, 5, 6, 11, 13, 15, 23, 25, 34, 35, 37, 38, 39, 40, 41, 42], "v0": 30, "v1": 30, "v2": 30, "v5": [], "v8xr": [39, 40, 41], "v_": 36, "v_0": [11, 36], "v_t": 36, "va": 1, "vahid": 33, "val": 13, "val_acc": [41, 42], "val_accuraci": [3, 44], "val_error": [41, 42], "val_loss": 4, "val_set": [41, 42], "vale": [2, 42, 43], "valid": [0, 1, 4, 7, 9, 10, 13, 23, 24, 25, 30, 33, 34, 36, 39, 41, 42], "validation_data": [3, 44], "validation_split": [4, 42], "valu": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 20, 21, 22, 25, 
26, 27, 28, 33, 36, 39, 40, 41, 42, 43, 44], "valuat": 9, "valued_at_a": [41, 42], "valued_at_z": [41, 42], "valueerror": [], "valy": 4, "van": [0, 19, 27, 33, 34, 35, 36], "vandenbergh": [8, 13, 35], "vandermond": [0, 33], "vanilla": [0, 6, 11, 14, 34, 36], "vanish": [1, 4, 13, 24, 30, 35, 40], "var": [5, 6, 10, 11, 19, 27, 30, 34, 37, 38], "var_x": 30, "varabl": 8, "varepsilon": [5, 6, 19, 37], "varepsilon_": [5, 6, 37], "varepsilon_i": [5, 6, 37], "vari": [0, 1, 3, 5, 6, 10, 21, 23, 33, 37, 38, 40, 41, 44], "variabl": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14, 21, 26, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "varianc": [0, 1, 5, 7, 9, 10, 11, 13, 14, 18, 20, 25, 26, 28, 30, 33, 34, 35, 36, 39, 41], "variance_i": [5, 11, 34], "variance_x": [5, 11, 34], "variant": [0, 1, 6, 8, 12, 13, 28, 33, 34, 35, 36, 39, 40, 41, 42], "variat": [3, 4, 11, 33, 43, 44], "varieti": [0, 3, 12, 25, 27, 33, 39, 40, 43, 44], "variou": [1, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 25, 26, 27, 30, 33, 34, 35, 36, 39, 40, 41, 42], "varydimens": 4, "vast": 36, "vastli": [3, 43, 44], "vaue": 1, "vault": 0, "vdot": [2, 13, 35, 36, 42, 43], "ve": [27, 28, 36], "vec": [6, 37], "vector": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 21, 22, 25, 35, 36, 37, 38, 40, 41, 42], "vector_mean": 14, "ventur": [0, 8, 25, 33], "venv": 15, "verbos": [1, 3, 4, 38, 39, 41, 42, 44], "veri": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 22, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "verifi": [3, 11, 26, 33], "versatil": [8, 33], "versicolor": [8, 9], "version": [0, 3, 10, 13, 14, 15, 21, 22, 24, 25, 26, 27, 28, 30, 33, 43, 44], "versu": [1, 23, 36, 41], "vert": [0, 1, 5, 6, 7, 8, 9, 11, 13, 16, 17, 33, 34, 35, 36, 37, 38, 39, 40, 41], "vert_1": [5, 6, 34, 35, 36], "vert_2": [5, 6, 11, 17, 34, 35, 36, 37], "vg5wr4qee1zxk": 43, "vi": [41, 42], "via": [0, 5, 6, 7, 8, 9, 10, 11, 12, 19, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "vidal": 11, "video": [0, 1, 12, 25, 29, 31, 33, 34, 35, 42, 43, 44], "view": [1, 3, 5, 6, 12, 13, 30, 32, 33, 35, 36, 37, 39, 41, 42, 43, 44], "vii": [41, 42], "viii": [41, 42], "violat": 8, "virginica": 9, "viridi": [0, 1, 2, 3, 33, 41, 42, 43, 44], "virtanen": [], "virtual": [1, 36, 41], "viscos": 13, "viscou": 13, "visibl": 15, "visin": [43, 44], "vision": [0, 3, 43, 44], "visit": 36, "visual": [0, 3, 11, 12, 18, 23, 25, 33, 34, 39, 40, 42, 43], "visualis": 1, "visualstudio": [15, 16, 19], "viz": [6, 8, 30], "vmap": 13, "vmax": [1, 6], "vmh0zpt0tli": 36, "vmin": [1, 6], "voic": [3, 43, 44], "volatil": 36, "volum": [0, 3, 33], "volume18": 42, "von": [40, 41], "vote": [10, 33], "voting_clf": 10, "votingclassifi": 10, "votingsimpl": 10, "vscode": [21, 22], "vstack": [5, 11, 26, 30, 33, 34, 38, 39, 41, 42], "vt": [5, 34, 35, 43], "w": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "w1": [8, 21, 22], "w2": [8, 11, 21, 22], "w3": 8, "w_": [1, 12, 39, 40, 41, 42, 43, 44], "w_0": 40, "w_1": [8, 26, 40, 41, 43, 44], "w_1a_0": [40, 41], "w_1x": [40, 41], "w_1x_": 8, "w_1x_1": 8, "w_2": [8, 26, 40, 41, 43, 44], "w_2a_1": [40, 41], "w_2x_": 8, "w_2x_2": 8, "w_3": 26, "w_4": 26, "w_g": [21, 22], "w_hidden": [2, 42, 43], "w_i": [1, 2, 10, 40, 41, 42, 43], "w_ix_i": [12, 39, 40], "w_j": 26, "w_m": 26, "w_output": [2, 42, 43], "w_px_": 8, "w_px_p": 8, "w_t": [], "wa": [1, 3, 4, 5, 6, 7, 10, 11, 12, 14, 17, 19, 21, 26, 33, 34, 36, 37, 38, 39, 40, 41, 42], "wai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 
11, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 26, 30, 33, 34, 35, 36, 39, 41, 42], "walk": 9, "walker": 30, "wall": 36, "walt": [], "wang": [0, 33], "want": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "warn": [4, 41, 42], "warrant": [6, 37, 38], "warranti": [], "wast": [3, 36, 43, 44], "watch": [25, 35, 36, 37, 39, 40, 41, 42, 43], "wave": 3, "wavelet": 8, "wcag": [], "we": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 37, 38, 39, 43, 44], "weak": [9, 10, 14], "weaker": 36, "weather": [1, 12, 39, 40, 41], "web": [25, 29, 31, 33], "weblink": 28, "webpag": 33, "websit": [6, 26, 27, 28, 29, 33], "wedg": [8, 30, 40, 41], "wednesdai": [31, 33, 38, 42], "wee": 11, "week": [0, 5, 6, 7, 27, 28, 29, 31], "week41": [28, 42], "week42": [28, 42], "weekli": [15, 16, 24, 25, 27, 29, 31, 32, 33, 39], "weierstrass": 40, "weight": [1, 2, 3, 6, 7, 9, 10, 12, 13, 18, 21, 22, 23, 28, 30, 36, 38, 39, 40, 42, 43, 44], "weight_arrai": [41, 42], "weight_decai": 42, "weigth": [2, 22, 42, 43], "welchlab": [39, 40, 41], "welcom": [8, 15, 25], "well": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 20, 21, 22, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "went": 8, "were": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 30, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "wessel": [0, 19, 27, 33, 34, 35, 36], "wg_nf1awssi": 40, "what": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 26, 27, 28, 30, 36, 39, 40, 41, 42], "whatev": [3, 21, 43, 44], "when": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "whenev": [13, 15, 30, 36, 40], "where": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "wherea": [6, 23, 30, 36, 37, 38], "wherefrom": [27, 28], "wherein": [1, 12, 39, 40, 41], "whether": [0, 3, 5, 7, 9, 27, 28, 30, 33, 38, 39, 43, 44], "which": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40, 43, 44], "whichev": [1, 3, 41, 42, 44], "while": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 19, 20, 21, 22, 23, 24, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "white": 9, "whiteboad": 36, "whiteboard": [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "who": [0, 15], "whole": [1, 3, 4, 5, 9, 11, 13, 21, 36, 41, 43, 44], "whom": [], "whose": [0, 6, 10, 23, 28, 30, 34, 37, 38, 43, 44], "whow": [11, 34], "why": [0, 1, 3, 6, 13, 15, 16, 17, 19, 21, 24, 27, 34, 35, 41], "wide": [0, 1, 3, 6, 7, 12, 24, 25, 26, 27, 33, 37, 38, 39, 40, 41, 43, 44], "widehat": [6, 37], "width": [0, 3, 8, 9, 21, 33, 43, 44], "wieringen": [0, 19, 27, 33, 34, 35, 36], "wiki": 27, "wikipedia": 27, "win": [10, 36], "wind": 9, "window": [43, 44], "wing": [31, 33], "winther": [2, 42, 43], "wiothout": 6, "wiscons": 7, "wisconsin": [10, 39, 41, 42], "wisdom": [6, 34, 36], "wise": [1, 5, 12, 13, 21, 34, 35, 36, 39, 41], "wish": [0, 2, 5, 7, 8, 11, 13, 14, 18, 26, 27, 28, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "with_std": [0, 34], "wither": 6, "within": [0, 2, 3, 4, 7, 9, 12, 13, 14, 30, 32, 33, 35, 38, 39, 42, 43, 44], "withinclust": 14, "without": [0, 1, 5, 6, 8, 9, 11, 12, 13, 15, 18, 23, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41], 
"wo5dmep_bbi": [39, 40, 41], "won": [0, 15, 33, 40], "wonder": 8, "word": [0, 1, 3, 4, 5, 6, 7, 14, 19, 23, 27, 28, 30, 33, 34, 35, 36, 41, 42, 43, 44], "work": [0, 1, 4, 6, 7, 8, 9, 13, 15, 16, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "workabl": 36, "workaround": [], "workhors": 36, "workload": 36, "workshop": 33, "world": [0, 8, 16, 34], "worldwid": [0, 33], "worri": 15, "wors": [0, 1, 3, 4, 6, 24, 33, 36, 37, 38, 41, 44], "worst": 23, "worth": [9, 19, 21], "worthi": [27, 28], "would": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 18, 20, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "wouldn": [], "wrap": [6, 26, 33], "wrapper": [21, 22], "write": [0, 1, 2, 3, 5, 6, 7, 8, 12, 13, 15, 16, 18, 21, 23, 24, 26, 33, 34, 36, 37, 38, 39, 40, 42, 43], "writer": [38, 39], "writerow": [38, 39], "written": [0, 2, 3, 5, 11, 12, 13, 16, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 40, 41, 42, 43, 44], "wrong": [1, 8, 15, 19, 41], "wrongli": 10, "wrote": [5, 11, 34], "wrt": [10, 13, 21, 22, 36, 40, 41], "wth": [10, 13, 36], "wurstemberg": [40, 41], "www": [20, 25, 26, 27, 28, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "wx_1": 8, "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "x0": [8, 38, 39], "x1": [4, 8, 9, 10, 13, 38, 39], "x1_exampl": 8, "x1d": 8, "x2": [8, 9, 10, 13], "x2d": [8, 11], "x2d_train": 11, "x2dsl": 11, "x3": 8, "x_": [0, 2, 3, 5, 6, 8, 10, 11, 13, 14, 26, 30, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44], "x_0": [0, 5, 11, 18, 26, 33, 34, 37, 40], "x_1": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 18, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "x_2": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 26, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43], "x_3": [8, 26, 30, 40], "x_4": [26, 40], "x_5": 40, "x_6": 18, "x_batch": [41, 42], "x_bin": [38, 39], "x_center": 11, "x_data": [1, 41], "x_data_ful": [1, 41], "x_hidden": [2, 42, 43], "x_i": [0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "x_input": [2, 42, 43], "x_ix_": [0, 33], "x_iy_i": 8, "x_j": [0, 2, 8, 9, 12, 16, 30, 34, 36, 39, 40, 42, 43], "x_jy_j": 8, "x_k": [12, 14, 26, 30, 34, 39], "x_l": [30, 40], "x_m": [6, 12, 26, 30, 37, 39], "x_mean": [18, 36], "x_multi": [38, 39], "x_n": [0, 2, 3, 6, 8, 11, 12, 13, 26, 30, 33, 35, 37, 39, 40, 42, 43], "x_new": [9, 10], "x_norm": [18, 36], "x_offset": [6, 34, 36], "x_output": [2, 42, 43], "x_p": [3, 7, 9, 38, 39], "x_poli": 9, "x_poly10": 9, "x_pred": 4, "x_prev": [2, 42, 43], "x_reduc": 11, "x_sampl": [], "x_scale": 8, "x_small": 13, "x_std": [18, 36], "x_t": 36, "x_test": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 34, 35, 36, 37, 38, 39, 41, 42, 44], "x_test_": 17, "x_test_own": 6, "x_test_scal": [0, 6, 7, 9, 10, 11, 34, 36], "x_tot": 4, "x_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "x_train_": 17, "x_train_mean": [6, 34, 36], "x_train_own": 6, "x_train_r": 19, "x_train_scal": [0, 6, 7, 9, 10, 11, 34, 36], "x_val": [1, 41, 42], "xapprox": 43, "xarrai": [25, 33], "xavier": [1, 41], "xbnew": [13, 35, 36], "xcode": [0, 25, 27, 33], "xdclassiffierconfus": 10, "xdclassiffierroc": 10, "xg_clf": 10, "xgb": 10, "xgbclassifi": 10, "xgboost": 9, "xgboot": 10, "xgbregressor": 10, "xgparam": 10, "xgtree": 10, "xi": [8, 13, 36, 38, 39], "xi_": 8, "xi_1": 8, "xi_i": 8, "xinv": 39, "xk": 8, "xla": [13, 25, 33], "xlabel": [0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 21, 30, 33, 34, 35, 36, 37, 38, 42, 43, 44], "xlim": [6, 10, 37, 38], "xm": 9, "xmesh": 13, "xnew": [0, 13, 33, 35, 36], "xp": 30, "xpanda": [0, 34], "xpd": [5, 11, 34], "xplot": 0, "xscale": [0, 34], "xsr": 9, "xt_x": [13, 35, 36], "xtest": [6, 37, 38], "xtick": [3, 6, 8, 9, 37, 38, 44], "xtrain": [6, 37, 38], "xu": [0, 33], "xx": [0, 26, 33], "xy": [0, 6, 8, 26, 33], "xytext": 8, "xyz": [], "xz": [26, 33], "y": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "y1": 4, "y2": 4, "y3": 4, "y_": [0, 1, 5, 6, 10, 11, 26, 33, 34, 37, 38, 41, 43, 44], "y_0": [0, 5, 11, 26, 33, 34, 37], "y_1": [0, 5, 8, 9, 11, 13, 26, 33, 34, 35, 36, 37], "y_1y_1": 8, "y_1y_1k": 8, "y_1y_2": 8, "y_1y_2k": 8, "y_1y_n": 8, "y_1y_nk": 8, "y_2": [0, 5, 8, 9, 11, 26, 33, 34], "y_2y_1": 8, "y_2y_1k": 8, "y_2y_2": 8, "y_2y_2k": 8, "y_3": [0, 9, 26], "y_4": 26, "y_bin": [38, 39], "y_binari": [38, 39], "y_center": [18, 36], "y_data": [0, 1, 5, 6, 33, 34, 35, 36, 41], "y_data_ful": [1, 41], "y_decis": 8, "y_fit": [0, 34], "y_i": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41], "y_if_": 10, "y_indic": [38, 39], "y_ix_": [0, 33], "y_ix_i": [7, 8, 13, 34, 35, 36, 38, 39], "y_iy_jk": 8, "y_j": [6, 8, 12, 27, 37, 38, 39, 40, 41], "y_k": [12, 39], "y_m": 26, "y_mean": [18, 36], "y_model": [0, 4, 5, 6, 33, 34, 35, 36], "y_multi": [38, 39], "y_n": [8, 13, 35, 36], "y_ny_1": 8, "y_ny_1k": 8, "y_ny_2": 8, "y_ny_2k": 8, "y_ny_n": 8, "y_ny_nk": 8, "y_offset": [6, 17, 34, 36], "y_onehot": [38, 39], "y_plot": 9, "y_pred": [0, 1, 4, 6, 7, 8, 9, 10, 23, 28, 34, 36, 37, 38, 39, 41], "y_pred1": 9, "y_pred2": 9, "y_pred_bin": [38, 39], "y_pred_multi": [38, 39], "y_pred_rf": 10, "y_pred_tre": 10, "y_prob": [38, 39], "y_prob_bin": [38, 39], "y_prob_multi": [38, 39], "y_proba": [7, 10, 23, 39], "y_sampl": [], "y_scaler": [6, 34, 36], "y_test": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 34, 35, 36, 37, 38, 39, 41, 42, 44], "y_test_onehot": [1, 41], "y_test_predict": [], "y_tot": 4, "y_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "y_train_mean": [6, 34, 36], "y_train_onehot": [1, 41], "y_train_predict": [], "y_train_r": 19, "y_train_scal": [6, 34, 36], "y_true": [38, 39], "y_val": 1, "yadav": [42, 43], "yand": 39, "ye": [3, 6, 7, 37, 38, 39, 43, 44], "year": [0, 25, 33, 43], "yet": [0, 1, 6, 8, 11, 13, 20, 21, 33, 38, 40, 41, 42], "yi": [13, 36, 38, 39], "yield": [0, 2, 5, 6, 8, 10, 12, 13, 14, 23, 26, 30, 33, 35, 36, 37, 39, 40, 41, 42, 43], "yk": 8, "ylabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 21, 30, 33, 34, 35, 36, 37, 38, 42, 43, 44], "ylim": [3, 6, 37, 38, 44], "ym": 9, "ymesh": 13, "yn": 0, "yo": [8, 9, 10], "yor": 39, "yoshiki": [], "yoshua": [1, 32, 41], "you": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "young": 0, "your": [1, 2, 4, 5, 6, 8, 11, 13, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43], "your_model_object": 16, "yourself": [11, 13, 23, 33, 35, 43, 44], "youtu": [34, 35, 37, 39, 41, 42, 43, 44], "youtub": [25, 35, 36, 37, 39, 40, 41, 42, 43, 44], "ypred": [6, 37, 38], "ypredict": [0, 13, 33, 34, 35, 36], "ypredict2": [13, 35, 36], "ypredictlasso": [5, 35], "ypredictol": [0, 5, 35], "ypredictown": [6, 34, 36], "ypredictownridg": [6, 34, 35, 36], 
"ypredictridg": [0, 5, 6, 34, 35, 36], "ypredictskl": [6, 34, 36], "ytest": [6, 37, 38], "ytick": [3, 6, 8, 9, 37, 38, 44], "ytild": [0, 6, 33, 34, 37, 38], "ytildelasso": [5, 35], "ytildenp": [0, 33, 34], "ytildeol": [0, 5, 35], "ytildeownridg": [6, 34, 35, 36], "ytilderidg": [5, 6, 34, 35, 36], "ytrain": [6, 37, 38], "yuxi": 33, "yx": [26, 33], "yxor": [39, 41, 42], "yy": [26, 33], "yz": [26, 33], "z": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 21, 22, 26, 30, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "z1": [21, 22], "z2": [21, 22], "z_": [1, 2, 12, 26, 33, 40, 41, 42, 43], "z_0": [26, 33, 40], "z_1": [26, 33, 40, 41], "z_2": [22, 26, 33, 40, 41], "z_c": [1, 41], "z_h": [1, 41], "z_hidden": [2, 42, 43], "z_i": [1, 12, 39, 41], "z_j": [1, 12, 42], "z_k": [12, 34, 40, 41], "z_m": [1, 41], "z_matric": [41, 42], "z_mod": 9, "z_o": [1, 41], "z_output": [2, 42, 43], "za": [], "zalando": 28, "zaman": 30, "zaxi": 6, "zero": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "zero_grad": [42, 44], "zeros_lik": [4, 38, 39], "zeroth": [34, 43, 44], "zfill": 4, "zip": [4, 6, 21, 22, 38, 39], "zm_h": [0, 33], "zn": [], "zone": [], "zoom": 33, "zscout": [], "zx": [26, 33], "zy": [26, 33], "zz": [26, 33], "\u00f8yvind": [6, 34, 36], "\u03b4": [41, 42]}, "titles": ["3. Linear Regression", "14. Building a Feed Forward Neural Network", "15. Solving Differential Equations with Deep Learning", "16. Convolutional Neural Networks", "17. Recurrent neural networks: Overarching view", "4. Ridge and Lasso Regression", "5. Resampling Methods", "6. Logistic Regression", "8. Support Vector Machines, overarching aims", "9. Decision trees, overarching aims", "10. Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods", "11. Basic ideas of the Principal Component Analysis (PCA)", "13. Neural networks", "7. Optimization, the central part of any Machine Learning algortithm", "12. Clustering and Unsupervised Learning", "Exercises week 34", "Exercises week 35", "Exercises week 36", "Exercises week 37", "Exercises week 38", "Exercises week 39", "Exercises week 41", "Exercises week 42", "Exercises week 43", "Exercises week 44", "Applied Data Analysis and Machine Learning", "2. Linear Algebra, Handling of Arrays and more Python Features", "Project 1 on Machine Learning, deadline October 6 (midnight), 2025", "Project 2 on Machine Learning, deadline November 10 (Midnight)", "Course setting", "1. 
Elements of Probability Theory and Statistical Data Analysis", "Teachers and Grading", "Textbooks", "Week 34: Introduction to the course, Logistics and Practicalities", "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression", "Week 36: Linear Regression and Gradient descent", "Week 37: Gradient descent methods", "Week 38: Statistical analysis, bias-variance tradeoff and resampling methods", "Week 39: Resampling methods and logistic regression", "Week 40: Gradient descent methods (continued) and start Neural networks", "Week 41 Neural networks and constructing a neural network code", "Week 42 Constructing a Neural Network code with examples", "Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations", "Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)", "Week 45, Convolutional Neural Networks (CCNs)"], "titleterms": {"": [8, 10, 35, 36, 37, 38, 39], "0": 41, "04": [], "05": [], "06": [], "07": [], "1": [0, 15, 16, 17, 18, 19, 20, 21, 22, 24, 27, 28, 34, 40, 41, 42], "10": [28, 40], "11": [], "13": 41, "15": [19, 37], "19": 19, "1a": 18, "2": [0, 15, 16, 17, 18, 19, 20, 21, 22, 24, 28, 33, 34, 35, 40, 41, 42], "20": 42, "2017": [], "2018": [], "2019": [], "2023": 31, "2025": [27, 38, 39, 40, 41], "22": 38, "26": 38, "27": 43, "29": 39, "2a": [], "2b": [], "3": [0, 15, 16, 17, 18, 19, 20, 21, 22, 34, 40, 41, 42, 44], "34": [15, 33], "35": [16, 34], "36": [17, 35], "37": [18, 36], "38": [19, 37], "39": [20, 38], "3a": 18, "3b": 18, "3d": [43, 44], "4": [0, 15, 16, 17, 18, 19, 20, 21, 22, 34, 41], "40": 39, "41": [21, 40], "42": [22, 41], "43": [23, 42], "44": [24, 43], "45": 44, "4a": 18, "4b": 18, "5": [0, 16, 18, 19, 20, 21, 22], "6": [21, 22, 27, 40], "7": [21, 22], "8": [22, 36], "A": [0, 1, 4, 8, 9, 33, 37, 38, 39, 41, 42, 43, 44], "AND": 39, "And": [33, 34, 36, 42], "But": 36, "In": [31, 40], "Ising": 6, "OR": 39, "The": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 15, 25, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "To": 33, "With": [4, 35], "a11i": [], "about": [28, 33, 34, 35], "abov": [35, 40, 41, 42, 43, 44], "abstract": 20, "accuraci": 36, "across": 36, "activ": [1, 12, 21, 28, 39, 40, 41, 42], "ad": [0, 6, 20, 27, 33, 34, 39, 40, 41], "adaboost": 10, "adagrad": [13, 36], "adam": [13, 36], "adapt": [10, 36], "add": 44, "addit": [], "adjust": [1, 41], "advanc": 27, "adversari": 4, "again": [3, 9, 44], "against": 28, "ai": [27, 28, 33], "aim": [8, 9, 21, 22, 23, 24, 33], "aka": 33, "al": [36, 43, 44], "algebra": [26, 33], "algorithm": [9, 10, 11, 12, 28, 33, 34, 35, 36, 40, 41, 42], "algortithm": [13, 35, 38, 39], "all": [8, 40, 41], "an": [0, 4, 10, 15, 20, 33, 40], "analys": [5, 34, 35], "analysi": [0, 5, 6, 11, 25, 27, 28, 30, 33, 34, 35, 37, 38, 40], "analyt": [0, 16, 18, 28, 42], "analyz": [28, 40, 41], "ani": [13, 22, 35, 38, 39], "anoth": [9, 35, 37, 38], "api": [], "appli": 25, "approach": [0, 8, 14, 33, 36, 37, 38], "approxim": [12, 40], "architectur": [1, 41], "arithmet": [43, 44], "arrai": [26, 33], "artifici": [39, 40], "assist": 31, "assumpt": 37, "august": [], "author": [], "autocorrel": 30, "autograd": [2, 13, 22, 36, 42, 43], "automat": [13, 36, 40, 42, 43], "avail": 20, "avali": [], "averag": 36, "b": [23, 27, 28], "back": [1, 11, 12, 34, 35, 40, 41, 42, 43], "background": [25, 27, 28, 37], "backpropag": 22, "bag": 10, "base": [13, 36, 37], "basic": [0, 5, 7, 9, 10, 11, 26, 34, 35, 38, 39, 40], "batch": [1, 22, 36, 41], "bay": 5, "befor": [11, 43], "bengio": 41, 
"beta": [], "better": [8, 39], "bia": [6, 19, 27, 36, 37, 38], "bias": [40, 41], "binari": [1, 41], "bind": 33, "bird": 10, "blind": [], "block": [], "boldsymbol": [18, 34, 37], "book": [19, 40, 41], "boost": 10, "bootstrap": [6, 10, 37, 38], "boston": [], "breast": 1, "brief": [33, 37, 38, 43, 44], "bring": [12, 40, 41], "browser": [], "bsd": [], "build": [1, 3, 9, 41, 42, 43, 44], "c": [23, 27, 28, 33], "calcul": [18, 34, 35], "can": [33, 36, 37, 38, 40], "cancer": [1, 7, 9, 11], "cart": 9, "case": [8, 10, 30, 34, 35, 36, 38, 39, 43, 44], "ccn": 44, "cdn": [], "cell": [], "central": [13, 25, 30, 35, 37, 38, 39], "chain": [12, 40, 41], "challeng": 36, "chang": 10, "changelog": [], "channel": 33, "chi": [0, 33], "choic": [17, 41], "choos": [1, 36, 41], "cifar01": [3, 44], "citat": [], "class": [38, 39, 40], "classic": 11, "classif": [1, 9, 10, 28, 38, 39, 41, 42], "classifi": [8, 38], "claus": [], "clip": [1, 41], "cluster": 14, "cnn": [3, 43, 44], "code": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 20, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "collect": [1, 3, 41, 42, 44], "color": [], "colorblind": [], "combin": 36, "common": [43, 44], "commun": 33, "commut": [43, 44], "compact": [38, 39, 40, 41], "compar": [2, 10, 16, 42, 43], "comparison": [35, 36], "compet": 36, "compil": 44, "complet": [34, 40, 41], "complex": [0, 6, 27, 34], "complic": [6, 40], "compon": 11, "compress": 43, "comput": [9, 19, 36], "computation": [37, 38], "computerlab": 33, "con": [9, 36], "concept": 30, "condit": 35, "confid": 37, "confus": 23, "conjug": 13, "consider": [40, 41, 43, 44], "constraint": 36, "construct": [40, 41, 42], "contain": [], "content": [], "continu": 39, "contn": 33, "contrast": [], "contributor": [], "converg": 36, "convex": [8, 13, 35, 36], "convolut": [3, 12, 39, 40, 43, 44], "copyright": [], "core": [], "correct": 36, "correl": [11, 34, 39, 43, 44], "correspond": [], "cost": [1, 10, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "count": 40, "cours": [25, 29, 32, 33], "covari": [5, 11, 30, 34], "cover": 33, "creat": [16, 20], "creator": [], "critic": 28, "cross": [6, 27, 37, 38, 39, 43, 44], "ct": [43, 44], "cumul": 23, "curv": 23, "custom": 21, "cython": 33, "d": [27, 28], "dark": [], "data": [0, 1, 3, 6, 7, 9, 11, 15, 17, 18, 21, 25, 30, 33, 34, 38, 39, 40, 41, 42, 44], "dataset": [1, 3, 18, 41, 44], "david": 33, "deadlin": [27, 28, 33], "deadllin": 31, "decai": [2, 36, 42, 43], "decis": [9, 10], "decomposit": [5, 11, 26, 34, 35], "deeep": [], "deep": [1, 2, 33, 36, 38, 39, 40, 41, 42, 43, 44], "defin": [1, 33, 40, 41, 42, 43], "definit": [19, 40, 41], "deflist": [], "degre": [0, 17, 34], "deliver": [15, 16, 19, 20, 24, 27, 28], "deliveri": [27, 28], "delta": 37, "dens": [0, 44], "depend": [], "depth": 28, "deriv": [5, 12, 16, 17, 19, 34, 35, 36, 37, 40, 41], "descent": [2, 10, 13, 18, 27, 35, 36, 39, 42, 43], "design": 34, "detail": [3, 33, 42, 43, 44], "develop": [1, 41], "diagon": 11, "differ": [8, 28, 36, 43, 44], "differenti": [2, 13, 36, 40, 42, 43], "diffus": [2, 42, 43], "dimens": 36, "dimension": [2, 3, 8, 18, 42, 43, 44], "direct": [], "disadvantag": 9, "discret": [30, 43, 44], "discrimin": 33, "discuss": 39, "distribut": [5, 30, 37], "do": [1, 36, 39, 41, 43], "document": 20, "doe": [34, 35, 39], "domain": 30, "don": [43, 44], "dot": [43, 44], "down": [1, 41], "dropout": [1, 41], "e": [27, 28], "each": [21, 38], "economi": [34, 35], "effici": [43, 44], "electron": [27, 28], "element": [0, 30, 33], "elimin": 26, "elu": [41, 42], "empir": 36, "energi": 33, "ensembl": 10, 
"entri": [40, 41], "entropi": [9, 38, 39], "environ": [0, 15], "equat": [0, 2, 12, 34, 35, 38, 39, 40, 41, 42, 43], "era": 43, "error": [0, 10, 33, 34, 35, 37, 38], "essenti": 33, "estim": 37, "et": [36, 43, 44], "etc": [33, 43, 44], "euler": [2, 42, 43], "evalu": [1, 28, 40, 41, 44], "evid": 36, "exampl": [1, 2, 3, 4, 6, 7, 8, 9, 10, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "exercis": [0, 6, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 34, 40, 42], "expect": [19, 30, 37], "expens": [37, 38], "experi": 30, "explicit": [40, 41], "explod": 41, "explor": 0, "exponenti": [2, 36, 42, 43], "express": [16, 17, 19, 34, 38, 39, 40, 41], "extend": [35, 38, 39, 40], "extrapol": 4, "extrem": [10, 33], "ey": 10, "f": [27, 28], "f_1": 23, "fall": 31, "famili": [1, 33, 41, 42], "famou": 26, "fantast": [34, 35], "faq": [], "featur": [9, 16, 26, 34], "februari": [], "feed": [1, 12, 22, 39, 40, 41, 42], "figur": 20, "file": [43, 44], "fill": [], "final": [12, 34, 36, 40, 41, 42, 43, 44], "find": [16, 18, 37, 43, 44], "fine": [1, 41], "first": [4, 12, 33, 35, 40, 41, 42, 43], "fit": [0, 10, 15, 16, 33, 35], "fix": [34, 35, 36], "float": 40, "fold": [37, 38], "forc": 3, "forest": 10, "form": 18, "format": [27, 28, 33], "formula": 18, "forward": [1, 2, 12, 22, 39, 40, 41, 42, 43], "foster": 33, "fourier": [3, 43, 44], "frank": 6, "freedom": [0, 17, 34], "frequent": [34, 36], "frequentist": [0, 33], "from": [5, 10, 12, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "full": [2, 36, 41, 42, 43, 44], "function": [0, 1, 6, 7, 8, 10, 11, 12, 13, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "funtion": 41, "further": [3, 5, 34, 35, 43, 44], "g": [27, 28], "gain": 23, "gan": 4, "gate": [39, 41, 42], "gaussian": 26, "gd": [13, 36], "gener": [4, 9, 33, 38, 39, 40, 43, 44], "geometr": [11, 35], "get": [20, 40], "gini": 9, "github": 15, "glorot": 41, "goal": [15, 16, 17, 18, 19, 20], "good": [0, 20, 33], "goodfellow": 36, "gotthard": [], "grade": [31, 33], "gradient": [1, 2, 10, 13, 18, 22, 27, 28, 35, 36, 39, 40, 41, 42, 43], "greativ": [], "group": 38, "growth": [2, 42, 43], "guid": [], "h": 27, "ha": 25, "hand": [22, 40, 41], "handl": [26, 33], "happen": [37, 38], "hessian": [34, 35, 36], "hidden": [2, 40, 41, 42, 43], "high": [], "histogram": 37, "histori": [], "homogen": 41, "hous": [], "how": [16, 43], "hyperbol": [39, 41], "hyperparamet": [1, 17, 41], "hyperplan": 8, "i": [0, 1, 33, 41, 42, 43, 44], "id3": 9, "idea": [11, 43, 44], "ideal": 35, "ident": 37, "identifi": 37, "ii": [33, 42, 43], "iid": 37, "iii": [42, 43], "illustr": [35, 39, 40], "imag": [43, 44], "implement": [1, 16, 17, 18, 28, 41, 42, 43], "implic": [5, 34, 35], "import": [5, 26, 33, 34, 35, 40, 41, 44], "improv": [1, 36, 41], "includ": [13, 27, 28, 36, 38, 39, 40], "incorpor": [], "increment": 11, "independ": 37, "index": 9, "inform": 31, "ingredi": 40, "init": [41, 42], "input": [2, 21, 22, 40, 41, 42, 43], "insight": 41, "instal": [25, 27, 33], "instructor": 31, "intermedi": 40, "interpret": [5, 11, 19, 33, 34, 35, 37], "interv": 37, "introduc": [11, 13, 34], "introduct": [0, 6, 20, 25, 26, 27, 28, 33, 39, 40], "invers": [5, 26], "invert": [34, 35], "ipython": [], "iter": 10, "its": 34, "iv": [42, 43], "j": [], "jacobian": [34, 42, 43], "januari": [], "jax": 13, "job": 39, "julia": 33, "jungl": 10, "jupyt": [], "k": [37, 38, 40, 41], "kei": [43, 44], "kera": [1, 3, 41, 42, 43, 44], "kernel": [8, 11], "l": [40, 41], "lab": [35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "lagrangian": 8, "lasso": [5, 6, 27, 34, 35], "last": [34, 36, 39, 
40, 41, 44], "later": [5, 34, 35], "layer": [1, 2, 3, 12, 21, 22, 40, 41, 42, 43, 44], "layout": [40, 41], "learn": [0, 1, 2, 11, 13, 14, 15, 16, 17, 18, 19, 20, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "least": [5, 6, 16, 19, 27, 28, 33, 34, 35, 36], "lectur": [33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "level": 10, "librari": [25, 28, 33], "licens": [], "light": [], "likelihood": [7, 37, 38, 39], "limit": [1, 13, 30, 35, 36, 37, 41], "linear": [0, 8, 13, 15, 26, 33, 34, 35, 38], "link": [5, 11, 32, 34, 37], "list": [40, 41], "literatur": [27, 28], "logist": [7, 33, 38, 39, 41], "loss": [34, 35, 36], "lu": 26, "ma": [], "machin": [0, 8, 13, 25, 27, 28, 33, 35, 38, 39], "machineri": 28, "made": 37, "main": [30, 33], "make": [0, 9, 10, 20, 34], "mani": [10, 12], "markdown": [], "mask": [], "maskedarrai": [], "mass": 33, "materi": [27, 28, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "math": [5, 34, 35], "mathemat": [3, 5, 8, 34, 35, 39, 40, 41, 43, 44], "matplotlib": [], "matric": [5, 26, 33, 43, 44], "matrix": [1, 5, 11, 12, 16, 23, 26, 33, 34, 35, 36, 39, 41], "matter": 0, "max": 34, "maximum": [37, 38, 39], "me": [], "mean": [0, 34, 35, 38], "measur": [23, 39], "medic": [43, 44], "meet": [5, 10, 30, 33, 34], "memori": [36, 43, 44], "mercer": 8, "metadata": [], "method": [6, 9, 10, 13, 27, 28, 33, 35, 36, 37, 38, 39, 41, 42], "metric": 19, "midnight": [27, 28], "min": 34, "mini": 36, "minibatch": 36, "minim": [33, 38, 39, 42, 43], "mit": [], "ml": 33, "mle": 37, "mlp": 12, "mnist": [3, 4, 42, 44], "mode": 40, "model": [0, 1, 4, 6, 12, 15, 17, 33, 39, 40, 41, 43, 44], "moment": 36, "momentum": [13, 27, 36], "mondai": [35, 36, 37, 39, 40, 42, 43, 44], "moon": [8, 9], "more": [3, 6, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "motiv": 36, "move": 36, "multi": [39, 40, 41], "multiclass": [41, 42], "multilay": [12, 39, 40], "multipl": [1, 3, 17, 21, 41, 43, 44], "multipli": 8, "multivari": 40, "myst": [], "ncsa": [], "need": [27, 33], "network": [1, 2, 3, 4, 7, 12, 28, 33, 36, 38, 39, 40, 41, 42, 43, 44], "neural": [1, 2, 3, 4, 7, 12, 28, 33, 36, 39, 40, 41, 42, 43, 44], "neuron": [39, 40, 43, 44], "new": [4, 18, 37, 40, 43, 44], "newton": [35, 36, 38, 39], "nn": [40, 41, 42, 43, 44], "node": [40, 41], "noeds": 41, "non": [8, 36], "none": 36, "norm": 28, "normal": [0, 1, 37, 41], "notat": [12, 39], "note": [27, 28, 34, 35], "notebook": [], "novemb": [28, 44], "now": [1, 9, 13, 35, 36, 37, 38], "nuclear": [0, 33], "nueral": 38, "numba": 33, "number": [0, 2, 22, 30, 34, 36, 40, 42, 43, 44], "numer": [2, 27, 28, 30, 42, 43], "numpi": [26, 33], "object": [3, 22, 41, 43, 44], "observ": [40, 41], "obtain": 11, "octob": [27, 40, 41, 42, 43], "od": [2, 42, 43], "off": [6, 19, 27], "ol": [5, 6, 15, 16, 18, 27, 35, 37], "onc": 21, "one": [2, 12, 18, 22, 35, 40, 41, 42, 43, 44], "ones": [39, 41], "open": [], "oper": [26, 40], "optim": [1, 8, 13, 18, 25, 33, 34, 35, 36, 38, 39, 40, 41], "option": [21, 22, 28], "order": [13, 18, 36], "ordinari": [5, 6, 16, 19, 27, 33, 34, 35, 36, 42, 43], "organ": [0, 33], "orient": [22, 41], "oslo": 32, "other": [4, 9, 11, 12, 23, 26, 27, 28, 33, 39, 40, 41, 42], "ouput": [40, 41], "our": [0, 4, 5, 11, 13, 27, 28, 33, 34, 35, 38, 39, 41, 42], "outcom": [25, 33], "output": [2, 40, 41, 42, 43], "over": [40, 41], "overarch": [0, 4, 8, 9, 21, 22, 23, 24, 33, 34, 40], "overview": [10, 33, 36], "own": [0, 10, 11, 27, 28, 33, 34, 42], "packag": [26, 33], "pad": [43, 44], "panda": [33, 34], "paper": 41, "parallel": 40, "paramet": [33, 34, 38, 39, 40, 
41, 43, 44], "paramt": 18, "part": [13, 25, 27, 28, 35, 38, 39, 40, 41, 42, 44], "partial": [2, 42, 43], "pass": [1, 22, 41], "pca": 11, "pdf": 30, "percepetron": [40, 41], "perceptron": [12, 39, 40, 41], "perform": [1, 9, 41, 43, 44], "period": 3, "perspect": [1, 41], "pitaya": [], "plan": [34, 35, 36, 37, 38, 40, 42, 43, 44], "plethora": 33, "plot": [37, 38], "point": [4, 40], "poisson": [2, 42, 43], "polici": [], "polynomi": [3, 16, 18, 35, 43, 44], "pool": [43, 44], "popul": [2, 42, 43], "popular": 33, "possibl": [42, 43], "practic": [13, 31, 33, 36], "pre": [1, 3, 41, 42, 44], "preambl": [27, 28], "precis": 23, "predict": [4, 21], "predictor": [38, 39], "preprocess": [34, 36], "prerequisit": [3, 25, 33, 44], "present": 20, "princip": 11, "principl": 3, "pro": [9, 36], "probabl": [5, 30, 37], "problem": [1, 2, 13, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "procedur": [9, 33], "process": [1, 3, 21, 41, 42, 43, 44], "product": [43, 44], "program": [2, 13, 27, 28, 35, 36, 40, 42, 43], "project": [6, 20, 27, 28, 31, 33], "prop": 13, "propag": [1, 12, 40, 41, 42, 43], "properti": [5, 30, 34, 35, 36, 38], "python": [0, 9, 15, 25, 26, 33], "pytorch": [42, 43, 44], "quick": 8, "quickli": [], "r": 33, "random": [10, 11, 30], "raphson": [35, 38, 39], "raschka": [43, 44], "rate": [27, 36, 41, 42], "read": [9, 33, 34, 36, 37, 38, 39, 40, 41], "real": [6, 21, 33, 40], "recal": 23, "recogn": [43, 44], "recommend": [33, 34, 41], "record": [], "recurr": [4, 12, 39, 40], "reduc": [0, 34, 40], "reduct": [3, 44], "refer": [27, 28], "referenc": 20, "reformul": [2, 42, 43], "regress": [0, 5, 6, 7, 9, 10, 13, 15, 17, 18, 19, 27, 28, 33, 34, 35, 36, 37, 38, 39], "regular": [1, 41, 43, 44], "relat": [], "relev": [32, 34, 39, 41], "relu": [1, 41, 42], "remark": [3, 43, 44], "remind": [6, 8, 28, 33, 34, 35, 36, 40, 41, 44], "replac": [13, 36], "report": [20, 27, 28], "repositori": [15, 37, 38], "requir": [2, 25, 28, 42, 43], "resampl": [6, 19, 27, 37, 38], "rescal": [6, 34], "residu": [34, 35], "resourc": [2, 42, 43], "result": [34, 35, 40, 41], "revers": 40, "revis": [], "revisit": [13, 35, 36, 38, 39], "rewrit": [33, 34, 37, 43, 44], "rewritten": [38, 39], "ridg": [0, 5, 6, 17, 18, 19, 27, 34, 35, 36], "rm": 13, "rmsprop": 36, "roc": 23, "role": [], "root": 41, "rule": [12, 36, 40, 41], "run": 44, "rung": 27, "same": [13, 36, 37, 38], "sampl": 11, "scalabl": 36, "scale": [17, 18, 19, 34, 36, 43, 44], "scan": [43, 44], "schedul": [33, 41, 42], "schemat": 9, "scheme": [2, 42, 43], "scienc": 33, "scikit": [0, 1, 11, 33, 34, 35, 36, 37, 38, 39, 41], "second": [13, 18, 36], "select": 38, "semest": 31, "sensit": 35, "septemb": [19, 35, 36, 37, 38, 39], "seri": [43, 44], "seriou": 40, "session": [35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "set": [0, 2, 3, 9, 12, 15, 29, 33, 34, 35, 40, 41, 42, 43, 44], "setup": [15, 42, 43, 44], "sgd": [13, 36], "should": [1, 28, 41, 42], "show": [], "sigmoid": 41, "similar": [13, 36, 42], "simpl": [0, 4, 9, 13, 18, 33, 34, 35, 36, 38, 40, 41, 43, 44], "simpler": 40, "simplest": 18, "simplif": [43, 44], "singl": [10, 39, 40], "singular": [5, 11, 34, 35], "size": [34, 35, 36], "sklearn": 16, "slightli": 36, "smarter": 40, "smoothi": [], "sneak": 36, "soft": 8, "softmax": [1, 41], "softwar": [27, 28, 33], "solut": [42, 43], "solv": [2, 35, 38, 39, 42, 43], "solver": 13, "some": [13, 26, 34, 35, 38, 40], "sound": [43, 44], "sourc": [], "specif": [42, 43], "specifi": [2, 42, 43], "speed": 36, "sphinx": [], "split": [0, 15, 34], "squar": [0, 5, 6, 10, 16, 19, 27, 33, 34, 35, 36], "stage": 
[43, 44], "standard": [13, 34, 37], "start": [20, 39, 43], "state": 0, "statist": [5, 6, 25, 30, 33, 37, 38], "steepest": [10, 13, 35], "step": [36, 37, 38], "stochast": [13, 27, 30, 36], "stop": 36, "strong": 44, "strongli": [33, 36], "structur": [], "studi": 39, "suggest": [33, 39], "sum": [37, 38, 40, 41], "summar": [43, 44], "summari": [28, 31, 33], "superposit": 3, "supervis": [1, 41], "support": 8, "svd": [5, 34, 35, 43], "synthet": [18, 38, 39], "systemat": [3, 44], "t": [34, 43, 44], "take": 16, "taken": [33, 36], "teach": 31, "teacher": [31, 33], "team": [], "technic": [34, 42, 43], "techniqu": [6, 11, 27], "technologi": 25, "tensorflow": [1, 3, 41, 42, 43, 44], "tent": [31, 33], "term": [37, 40, 41], "test": [0, 1, 15, 17, 28, 34, 41, 42], "texmath": [], "text": 33, "textbook": [32, 33], "than": 35, "thank": [], "theorem": [5, 8, 11, 12, 30, 37, 40], "theoret": 36, "theori": 30, "theta": [18, 37], "thi": [21, 22, 24, 33, 40], "three": [40, 41], "through": 40, "time": 36, "tip": [13, 36], "todo": [], "toeplitz": [43, 44], "togeth": [12, 40, 41], "tool": [27, 28, 33], "top": [1, 41, 44], "topic": 33, "toward": 11, "trade": [6, 19, 27], "tradeoff": [6, 37, 38], "train": [0, 1, 4, 15, 21, 22, 33, 34, 40, 41, 43, 44], "transform": [3, 43, 44], "translat": [], "tree": [9, 10], "trial": [42, 43], "tuesdai": [35, 39, 40, 41, 43], "tune": [1, 41], "two": [3, 8, 22, 25, 28, 38, 39, 40, 41, 43, 44], "type": [2, 4, 12, 33, 39, 40, 42, 43, 44], "uio": 33, "understand": [22, 37, 38], "univers": [12, 32, 40], "unsupervis": 14, "up": [0, 2, 9, 12, 15, 28, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44], "updat": [27, 36, 40, 41, 42], "us": [0, 1, 2, 3, 7, 13, 16, 18, 19, 22, 25, 27, 28, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44], "usag": [36, 41, 42], "v": [3, 36, 43, 44], "valid": [6, 27, 37, 38], "valu": [5, 11, 19, 30, 34, 35, 37, 38], "vanish": 41, "vari": 36, "variabl": [30, 35], "varianc": [6, 19, 27, 37, 38], "variou": [0, 28, 37, 38], "vector": [8, 12, 16, 26, 33, 34, 39, 43, 44], "verifi": 44, "versu": 33, "video": [36, 37, 38, 39, 40, 41], "view": [0, 4, 10, 34, 40], "virtual": 15, "visual": [1, 9, 41, 44], "volum": [43, 44], "wai": [9, 27, 37, 38, 40, 43, 44], "warm": 28, "wave": [2, 42], "we": [33, 36, 40, 41, 42], "wednesdai": [35, 39, 40, 41, 43], "week": [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "weekli": [], "weight": 41, "welcom": [], "well": [43, 44], "what": [0, 33, 34, 35, 37, 38, 43, 44], "when": 36, "which": [1, 36, 41, 42], "why": [33, 36, 37, 38, 39, 40, 42, 43, 44], "wisconsin": 7, "word": 40, "workflow": [], "wrap": 37, "write": [4, 11, 20, 22, 27, 28, 35, 41], "x": 34, "xgboost": 10, "xor": [39, 41, 42], "yaml": [], "yet": 35, "you": 28, "your": [0, 10, 16, 18, 27, 28, 34], "z_j": [40, 41]}}) \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/statistics.html b/doc/LectureNotes/_build/html/statistics.html index 97a898137..d973b08be 100644 --- a/doc/LectureNotes/_build/html/statistics.html +++ b/doc/LectureNotes/_build/html/statistics.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • Week 37: Gradient descent methods
  • Exercises week 38
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • Exercises week 39
  • Week 39: Resampling methods and logistic regression
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • Week 41 Neural networks and constructing a neural network code
  • Exercises week 41
  • Week 42 Constructing a Neural Network code with examples
  • Exercises week 42
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • Exercises week 43
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • Exercises week 44
  • Week 45, Convolutional Neural Networks (CCNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/teachers.html b/doc/LectureNotes/_build/html/teachers.html index 1f89a2222..85acc656f 100644 --- a/doc/LectureNotes/_build/html/teachers.html +++ b/doc/LectureNotes/_build/html/teachers.html @@ -228,10 +228,45 @@

    diff --git a/doc/LectureNotes/_build/html/textbooks.html b/doc/LectureNotes/_build/html/textbooks.html index b169538f8..565ff7ef8 100644 --- a/doc/LectureNotes/_build/html/textbooks.html +++ b/doc/LectureNotes/_build/html/textbooks.html @@ -228,10 +228,45 @@

    diff --git a/doc/LectureNotes/_build/html/week34.html b/doc/LectureNotes/_build/html/week34.html index 4b2a537ec..d218bca00 100644 --- a/doc/LectureNotes/_build/html/week34.html +++ b/doc/LectureNotes/_build/html/week34.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week35.html b/doc/LectureNotes/_build/html/week35.html index 4b070b2bc..6a2b2b830 100644 --- a/doc/LectureNotes/_build/html/week35.html +++ b/doc/LectureNotes/_build/html/week35.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week36.html b/doc/LectureNotes/_build/html/week36.html index ac976bbb6..0fe2ee2e3 100644 --- a/doc/LectureNotes/_build/html/week36.html +++ b/doc/LectureNotes/_build/html/week36.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week37.html b/doc/LectureNotes/_build/html/week37.html new file mode 100644 index 000000000..345a65c8a --- /dev/null +++ b/doc/LectureNotes/_build/html/week37.html @@ -0,0 +1,2927 @@

    Week 37: Gradient descent methods#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: September 8-12, 2025

    +
    +

    Plans for week 37, lecture Monday#

    +

    Plans and material for the lecture on Monday September 8.

    +

    The family of gradient descent methods

    +
      +
1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge
2. Improving gradient descent with momentum
3. Introducing stochastic gradient descent
4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM
5. Video of Lecture
6. Whiteboard notes
    +
    +

    Readings and Videos:#

    +
      +
1. Recommended: Goodfellow et al., Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and chapters 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
2. Raschka et al., pages 37-44 and pages 278-283, with focus on linear regression
3. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
4. Video on stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    +
    +

    Material for lecture Monday September 8#

    +
    +
    +

    Gradient descent and revisiting Ordinary Least Squares from last week#

    +

Last week we started with linear regression as a case study for the gradient descent methods. Linear regression is a great test case for the gradient descent methods discussed in the lectures since it has several desirable properties such as:

1. An analytical solution (recall the homework sets for week 35).
2. The gradient can be computed analytically.
3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    +
    +
    +
import numpy as np

m = 100   # number of data points
x = 2*np.random.rand(m,1)
y = 4+3*x+np.random.randn(m,1)
    +
    +
    +
    +
    +

with \(x_i \in [0,2]\) chosen randomly using a uniform distribution (the code above draws \(x\) as \(2u\) with \(u\) uniform on \([0,1]\)). Additionally we have a stochastic noise chosen according to the normal distribution \(\cal{N}(0,1)\). The linear regression model is given by

\[
h_\theta(x) = \tilde{y} = \theta_0 + \theta_1 x,
\]

such that

\[
\tilde{y}_i = \theta_0 + \theta_1 x_i.
\]
    +
    +
    +

    Gradient descent example#

    +

Let \(\mathbf{y} = (y_1,\cdots,y_n)^T\), \(\tilde{\mathbf{y}} = (\tilde{y}_1,\cdots,\tilde{y}_n)^T\) and \(\theta = (\theta_0, \theta_1)^T\), where \(\tilde{y}_i\) denotes the model prediction for data point \(i\).

It is convenient to write \(\tilde{\mathbf{y}} = X\theta\), where \(X \in \mathbb{R}^{100 \times 2}\) is the design matrix given by (we keep the intercept here)

\[\begin{split}
X \equiv \begin{bmatrix}
1 & x_1 \\
\vdots & \vdots \\
1 & x_{100} \\
\end{bmatrix}.
\end{split}\]
    +

    The cost/loss/risk function is given by

    +
\[
C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right]
\]
    +

    and we want to find \(\theta\) such that \(C(\theta)\) is minimized.

    +
    +
    +

    The derivative of the cost/loss function#

    +

    Computing \(\partial C(\theta) / \partial \theta_0\) and \(\partial C(\theta) / \partial \theta_1\) we can show that the gradient can be written as

    +
\[\begin{split}
\nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\
\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\
\end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}),
\end{split}\]
    +

    where \(X\) is the design matrix defined above.
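As a quick sanity check (not part of the lecture code), the analytical gradient above can be compared against a central finite-difference approximation. The sketch below assumes the same synthetic data as in the example above.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]
theta = np.random.randn(2,1)

def C(theta):
    # cost function C(theta) = (1/n)||X theta - y||^2
    return (1.0/n)*np.sum((X @ theta - y)**2)

# analytical gradient from the expression above
grad_analytic = (2.0/n)*X.T @ (X @ theta - y)

# central finite-difference approximation as an independent check
eps = 1.0e-6
grad_numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    e = np.zeros_like(theta)
    e[j] = eps
    grad_numeric[j] = (C(theta + e) - C(theta - e))/(2*eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be close to zero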

    +
    +
    +

    The Hessian matrix#

    +

    The Hessian matrix of \(C(\theta)\) is given by

    +
\[\begin{split}
\boldsymbol{H} \equiv \begin{bmatrix}
\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\
\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\
\end{bmatrix} = \frac{2}{n}X^T X.
\end{split}\]

This result implies that \(C(\theta)\) is a convex function, since the matrix \(X^T X\) is always positive semi-definite.

    +
    +
    +

    Simple program#

    +

    We can now write a program that minimizes \(C(\theta)\) using the gradient descent method with a constant learning rate \(\eta\) according to

    +
\[
\theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \ k=0,1,\cdots
\]

We can use the expression we computed for the gradient, let the initial guess for \(\theta\) be chosen randomly, and set for example \(\eta = 0.001\). We stop iterating when \(||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8}\). Note that the code below instead runs a fixed number of iterations and sets the learning rate from the largest eigenvalue of the Hessian, \(\eta = 1/\lambda_{\mathrm{max}}\); a variant with the gradient-norm stopping criterion is sketched after the code.

And finally we can compare our solution for \(\theta\) with the analytic result given by \(\theta = (X^TX)^{-1} X^T \mathbf{y}\).

    +
    +
    +

    Gradient Descent Example#

    +

    Here our simple example

    +
    +
    +
    %matplotlib inline
    +
    +
    +# Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
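The code above runs for a fixed number of iterations. A minimal sketch of how the gradient-norm stopping criterion mentioned earlier could be added (the maximum number of iterations is an assumed safeguard, not specified in the text) is:

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

theta = np.random.randn(2,1)
eta = 0.001              # constant learning rate, as in the text
epsilon = 1.0e-8         # tolerance on the gradient norm
max_iterations = 1000000 # assumed safeguard against non-convergence

for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= epsilon:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

print(theta)
print(np.linalg.inv(X.T @ X) @ X.T @ y)   # analytical solution for comparison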

    Gradient descent and Ridge#

    +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \(L_2\) norm of \(\theta\),

    +
\[
C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \ \lambda \geq 0.
\]

In order to minimize \(C_{\text{ridge}}(\theta)\) using GD we adjust the gradient as follows

\[\begin{split}
\nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\
\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\
\end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2\left(\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta\right).
\end{split}\]

We can easily extend our program to minimize \(C_{\text{ridge}}(\theta)\) using gradient descent and compare with the analytical solution given by

\[
\theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}.
\]
    +
    +
    +

    The Hessian matrix for Ridge Regression#

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

    +
\[\begin{split}
\boldsymbol{H} \equiv \begin{bmatrix}
\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\
\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\
\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}.
\end{split}\]

This implies that for \(\lambda > 0\) the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

    +
    +
    +

    Program example for gradient descent with Ridge Regression#

    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using gradient descent methods, limitations#

    +
      +
• Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minimum. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
• Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \(E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2\); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \(n\) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this is to calculate the gradients using small subsets of the data called “mini batches”. This has the added benefit of introducing stochasticity into our algorithm.
• GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
• GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton’s method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but also the second derivatives. The ideal scenario would be to calculate the Hessian, but this proves to be too computationally expensive.
• GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial conditions since they determine the particular local minimum GD would eventually reach. However, even with a good initialization scheme and the introduction of randomness, GD can still take exponential time to escape saddle points.
    +
    +
    +

    Momentum based GD#

    +

We discuss here some simple examples where we introduce what is called ‘memory’ about previous steps, or what is normally called momentum gradient descent. For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025.
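As a compact reference (a standard formulation, one common convention among several), momentum gradient descent introduces a velocity-like variable \(v\) with a momentum parameter \(0 \le \gamma < 1\),

\[\begin{split}
v_{k+1} = \gamma v_k + \eta \nabla_\theta C(\theta_k), \\
\theta_{k+1} = \theta_k - v_{k+1},
\end{split}\]

so that \(\gamma = 0\) recovers plain gradient descent. In the second code example below, the variable change plays the role of \(v\) and momentum the role of \(\gamma\).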

    +
    +
    +

    Improving gradient descent with momentum#

    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +

    Overview video on Stochastic Gradient Descent (SGD)#

    +

What is stochastic gradient descent? There are several reasons for using stochastic gradient descent. Some of these are:

    +
      +
1. Efficiency: updates the weights more frequently, using a single sample or a small batch of samples, which speeds up convergence.
2. Local minima: the stochasticity of the updates can hopefully help us avoid getting stuck in poor local minima.
3. Memory usage: requires less memory compared to computing gradients for the entire dataset.
    +
    +
    +

    Batches and mini-batches#

    +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    +

In large-scale applications such as the ILSVRC challenge, the training data can have on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    +
    +
    +

    Pros and cons#

    +
      +
1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.

2. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.

3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    +
    +
    +

    Convergence rates#

    +
      +
1. Stochastic gradient descent is much cheaper per iteration, since each iteration uses only a single training example (or a small mini-batch); this often translates into faster progress in wall-clock time.

2. Gradient descent is slower per iteration, as it uses the entire dataset for each iteration, although it typically needs fewer iterations to reach a given accuracy.
    +
    +
    +

    Accuracy#

    +

In general, stochastic gradient descent is less accurate than gradient descent, as it calculates the gradient on single examples, which may not accurately represent the overall dataset. Gradient descent is more accurate because it uses the average gradient calculated over the entire dataset.

    +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence.

    +

It is however the method of choice in deep learning algorithms, where SGD is often used in combination with other optimization techniques, such as momentum or adaptive learning rates.

    +
    +
    +

    Stochastic Gradient Descent (SGD)#

    +

In stochastic gradient descent, the extreme case is the one where each mini-batch contains only a single data point.

    +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2.

    +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    +
    +
    +

    Stochastic Gradient Descent#

    +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above.

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \(n\) data points \(\{\mathbf{x}_i\}_{i=1}^n\),

    +
    +\[ +C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\theta}). +\]
    +
    +
    +

    Computation of gradients#

    +

    This in turn means that the gradient can be +computed as a sum over \(i\)-gradients

    +
    +\[ +\nabla_\theta C(\mathbf{\theta}) = \sum_i^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}). +\]
    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \(n\) +data points and the size of each minibatch is \(M\), there will be \(n/M\) +minibatches. We denote these minibatches by \(B_k\) where +\(k=1,\cdots,n/M\).

    +
    +
    +

    SGD example#

    +

As an example, suppose we have \(10\) data points \((\mathbf{x}_1,\cdots, \mathbf{x}_{10})\) and we choose mini-batches of size \(M=2\). There are then \(n/M=5\) mini-batches, each containing two data points. In particular we have \(B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10})\). Note that if you choose \(M=n\) you have only a single batch with all data points, and on the other extreme, you may choose \(M=1\), resulting in a minibatch for each datapoint, i.e. \(B_k = \mathbf{x}_k\).
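To make the splitting concrete, here is a minimal sketch in NumPy (the variable names are only illustrative): we shuffle the data indices and split them into \(n/M\) mini-batches.

import numpy as np

n, M = 10, 2                    # 10 data points, mini-batches of size M = 2
x = np.arange(n)                # stand-in for the data points x_1, ..., x_10

indices = np.random.permutation(n)        # shuffle the data indices
minibatches = np.split(indices, n // M)   # n/M = 5 mini-batches B_1, ..., B_5

for k, batch in enumerate(minibatches, start=1):
    print(f"B_{k}: data points {x[batch]}")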

    +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step

    +
+\[ \nabla_{\theta} C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \rightarrow \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). \]
    +
    +
    +

    The gradient step#

    +

    Thus a gradient descent step now looks like

    +
+\[ \theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \]
    +

where \(k\) is picked at random with equal probability from \([1,n/M]\). An iteration over the number of mini-batches (\(n/M\)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of mini-batches, as exemplified in the code below.

    +
    +
    +

    Simple example code#

    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\(M < n\)), the computation of the gradient is much cheaper since we sum over the datapoints in the \(k\)-th minibatch and not all \(n\) datapoints.

    +
    +
    +

    When do we stop?#

    +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if the norm of the gradient is smaller than some threshold, and stop if it is. However, the condition that the gradient is zero is valid also for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we could also evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage we can compare the values of the cost function and keep the \(\theta\) that gave the lowest value.
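A minimal sketch of such a stopping test, on a toy quadratic cost (the cost and gradient functions here are only stand-ins for your actual problem):

import numpy as np

# Toy cost C(theta) = theta^2 with gradient 2*theta (stand-ins for the real problem)
def cost(theta):
    return theta**2

def full_gradient(theta):
    return 2.0*theta

theta, eta = 1.0, 0.1
n_epochs, check_every, tol = 200, 10, 1e-6
best_theta, best_cost = theta, cost(theta)

for epoch in range(1, n_epochs+1):
    theta -= eta*full_gradient(theta)        # stand-in for the mini-batch updates of one epoch
    if epoch % check_every == 0:
        if cost(theta) < best_cost:          # store the best theta seen so far
            best_theta, best_cost = theta, cost(theta)
        if abs(full_gradient(theta)) < tol:  # gradient norm below threshold: stop
            print(f"Stopping at epoch {epoch}, best theta = {best_theta:.2e}")
            break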

    +
    +
    +

    Slightly different approach#

    +

Another approach is to let the step length \(\eta_j\) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we barely move at all. Such approaches are also called scaling or learning rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    +
    +
    +

    Time decay rate#

    +

As an example, let \(e = 0,1,2,3,\cdots\) denote the current epoch and let \(t_0, t_1 > 0\) be two fixed numbers. Furthermore, let \(t = e \cdot m + i\) where \(m\) is the number of minibatches and \(i=0,\cdots,m-1\). Then the function \(\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1}\) goes to zero as the number of epochs gets large. That is, we start with a step length \(\eta_j(0; t_0, t_1) = t_0/t_1\) which decays in time \(t\).

    +

    In this way we can fix the number of epochs, compute \(\theta\) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \(\theta\) that gives the lowest value of the cost +function.

    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j))
    +
    +
    +
    +
    +
    +
    +

    Code with a Number of Minibatches which varies#

    +

In the code here we use mini-batches together with a learning rate that varies with the iteration number; the size of each mini-batch \(M\), and thereby the number of mini-batches \(m=n/M\), can easily be changed.

    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Replace or not#

    +

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.
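As one possible answer to the question posed in the code comment above, here is a minimal sketch of mini-batches without replacement: the indices are reshuffled once per epoch, so that every data point is used exactly once per epoch. The setup mirrors the code above.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

n_epochs = 50
M = 5                 # size of each minibatch
m = int(n/M)          # number of minibatches
t0, t1 = 5, 50
def learning_schedule(t):
    return t0/(t+t1)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    indices = np.random.permutation(n)      # new shuffle every epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]        # each data point appears exactly once per epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        eta = learning_schedule(epoch*m+i)
        theta = theta - eta*gradients
print("theta from own sgd without replacement")
print(theta)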

    +
    +
    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison#

    +
    +

    Theoretical Convergence Speed and convex optimization#

    +

    Consider minimizing an empirical cost function

    +
    +\[ +C(\theta) =\frac{1}{N}\sum_{i=1}^N l_i(\theta), +\]
    +

    where each \(l_i(\theta)\) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \(\nabla C(\theta)\), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \(\nabla +l_i(\theta)\) selected at random. In equation form, one GD step is:

    +
    +\[ +\theta_{t+1} = \theta_t-\eta \nabla C(\theta_t) =\theta_t -\eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t), +\]
    +

    whereas one SGD step is:

    +
    +\[ +\theta_{t+1} = \theta_t -\eta \nabla l_{i_t}(\theta_t), +\]
    +

with \(i_t\) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \(O(1/t)\) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation – typically \(O(1/\sqrt{t})\) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD’s trajectory is smoother and more predictable, while SGD’s path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +
    +
    +

    Strongly Convex Case#

    +

    If \(C(\theta)\) is strongly convex and \(L\)-smooth (so GD enjoys linear +convergence), the gap \(C(\theta_t)-C(\theta^*)\) for GD shrinks as

    +
    +\[ +C(\theta_t) - C(\theta^* ) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)], +\]
    +

a geometric (linear) convergence per iteration. Achieving an \(\epsilon\)-accurate solution thus takes on the order of \(\log(1/\epsilon)\) iterations for GD. However, each GD iteration costs \(O(N)\) gradient evaluations. SGD cannot exploit strong convexity to obtain a linear rate – instead, with a properly decaying step size (e.g. \(\eta_t = \frac{1}{\mu t}\)) or iterate averaging, SGD attains an \(O(1/t)\) convergence rate in expectation. For example, one result of Moulines and Bach (2011), see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html, shows that with \(\eta_t = \Theta(1/t)\),

    +
    +\[ +\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t), +\]
    +

for strongly convex, smooth \(C\). This \(1/t\) rate is slower per iteration than GD’s exponential decay, but each SGD iteration is \(N\) times cheaper. In fact, to reach error \(\epsilon\), plain SGD needs on the order of \(T=O(1/\epsilon)\) iterations (sub-linear convergence), while GD needs \(O(\log(1/\epsilon))\) iterations. When accounting for cost-per-iteration, GD requires \(O(N \log(1/\epsilon))\) total gradient computations versus SGD’s \(O(1/\epsilon)\) single-sample computations. In large-scale regimes (huge \(N\)), SGD can be faster in wall-clock time because \(N \log(1/\epsilon)\) may far exceed \(1/\epsilon\) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \(N\) cheap updates in the time GD makes one – often yielding a good solution faster in practice, even though SGD’s asymptotic error decays more slowly. As one lecture succinctly puts it: “SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can’t adapt to strong convexity”. Thus, the break-even point depends on \(N\) and the desired accuracy: for moderate accuracy on very large \(N\), SGD’s cheaper updates win; for extremely high precision (very small \(\epsilon\)) on a modest \(N\), GD’s fast convergence per step can be advantageous.

    +
    +
    +

    Non-Convex Problems#

    +

In non-convex optimization (e.g. deep neural networks), neither GD nor SGD guarantees global minima, but SGD often displays faster progress in finding useful minima. Theoretical results here are weaker, usually showing convergence to a stationary point \(\theta\) (\(|\nabla C|\) is small) in expectation. For example, GD might require \(O(1/\epsilon^2)\) iterations to ensure \(|\nabla C(\theta)| < \epsilon\), and SGD typically has similar polynomial complexity (often worse due to gradient noise). However, a noteworthy difference is that SGD’s stochasticity can help escape saddle points or poor local minima. Random gradient fluctuations act like implicit noise, helping the iterate “jump” out of flat saddle regions where full-batch GD could stagnate. In fact, research has shown that adding noise to GD can guarantee escaping saddle points in polynomial time, and the inherent noise in SGD often serves this role. Empirically, this means SGD can sometimes find a lower loss basin faster, whereas full-batch GD might get “stuck” near saddle points or need a very small learning rate to navigate complex error surfaces. Overall, in modern high-dimensional machine learning, SGD (or mini-batch SGD) is the workhorse for large non-convex problems because it converges to good solutions much faster in practice, despite the lack of a linear convergence guarantee. Full-batch GD is rarely used on large neural networks, as it would require tiny steps to avoid divergence and is extremely slow per iteration.

    +
    +
    +
    +

    Memory Usage and Scalability#

    +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \(\nabla C(\theta)\) . This results +in memory usage that scales linearly with the dataset size \(N\). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \(N\) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible.

    +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \(N\)) forward/backward pass +could not even be executed if \(N\) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \(N\), while for SGD it remains \(O(1)\) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods.

    +
    +
    +

    Empirical Evidence: Convergence Time and Memory in Practice#

    +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments:

    +
    +

    Deep Neural Networks#

    +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive.

    +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration.

    +
    +
    +

    Memory constraints#

    +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable .

    +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \(N\) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent .

    +
    +
    +
    +

    Second moment of the gradient#

    +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \(\eta_t\) +as a function of time. As discussed in the context of Newton’s +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians.

    +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM.

    +
    +
    +

    Challenge: Choosing a Fixed Learning Rate#

    +

    A fixed \(\eta\) is hard to get right:

    +
      +
1. If \(\eta\) is too large, the updates can overshoot the minimum, causing oscillations or divergence.

2. If \(\eta\) is too small, convergence is very slow (many iterations are needed to make progress).
    +

In practice, one often uses trial-and-error or schedules (decaying \(\eta\) over time) to find a workable balance; a small numerical illustration is given after the list below. For a function with steep directions and flat directions, a single global \(\eta\) may be inappropriate:

    +
      +
1. Steep coordinates require a smaller step size to avoid oscillation.

2. Flat/shallow coordinates could use a larger step to speed up progress.

3. This issue is pronounced in high-dimensional problems with sparse or varying-scale features – we need a method to adjust step sizes per feature.
    +
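To illustrate the first two points, a small sketch on the toy quadratic \(C(\theta)=\theta^2\): a too large \(\eta\) makes the iterates diverge, while a too small \(\eta\) barely moves them.

def gd(eta, n_iter=20, theta0=1.0):
    # plain gradient descent on C(theta) = theta^2, with gradient 2*theta
    theta = theta0
    for _ in range(n_iter):
        theta -= eta*2.0*theta
    return theta

print("eta = 1.1  :", gd(1.1))    # |1 - 2*eta| > 1: the iterates oscillate and diverge
print("eta = 0.5  :", gd(0.5))    # converges immediately for this particular quadratic
print("eta = 0.001:", gd(0.001))  # still far from the minimum after 20 iterations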
    +
    +

    Motivation for Adaptive Step Sizes#

    +
      +
1. Instead of a fixed global \(\eta\), use an adaptive learning rate for each parameter that depends on the history of gradients.

2. Parameters that have large accumulated gradient magnitude should get smaller steps (they’ve been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.

3. This is especially useful for sparse features: rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected.

4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates.

5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive AdaGrad, one of the first adaptive methods.
    +
    +
    +

    AdaGrad algorithm, taken from Goodfellow et al#

(Figure: the AdaGrad algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Derivation of the AdaGrad Algorithm#

    +

    Accumulating Gradient History.

    +
      +
1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate).

2. Let \(g_t = \nabla C_{i_t}(\theta_t)\) be the gradient at step \(t\) (or a subgradient for nondifferentiable cases).

3. Initialize \(r_0 = 0\) (an all-zero vector in \(\mathbb{R}^d\)).

4. At each iteration \(t\), update the accumulation:
    +
    +\[ +r_t = r_{t-1} + g_t \circ g_t, +\]
    +
      +
1. Here \(g_t \circ g_t\) denotes the element-wise square of the gradient vector, i.e. \(r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2\) for each parameter \(j\).

2. We can view \(H_t = \mathrm{diag}(r_t)\) as a diagonal matrix of past squared gradients. Initially \(H_0 = 0\).
    +
    +
    +

    AdaGrad Update Rule Derivation#

    +

    We scale the gradient by the inverse square root of the accumulated matrix \(H_t\). The AdaGrad update at step \(t\) is:

    +
    +\[ +\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t, +\]
    +

where \(H_t^{-1/2}\) is the diagonal matrix with entries \((r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2}\). In coordinates, this means each parameter \(j\) has an individual step size:

    +
    +\[ +\theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}. +\]
    +

    In practice we add a small constant \(\epsilon\) in the denominator for numerical stability to avoid division by zero:

    +
    +\[ +\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}. +\]
    +

    Equivalently, the effective learning rate for parameter \(j\) at time \(t\) is \(\displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}\). This decreases over time as \(r_{t,j}\) grows.
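A minimal NumPy sketch of this per-coordinate update on a simple quadratic cost (a full AdaGrad example with automatic differentiation is given later in these notes):

import numpy as np

def gradient(theta):
    # gradient of the toy cost C(theta) = sum(theta**2)
    return 2.0*theta

eta, eps = 0.1, 1e-8
theta = np.array([1.0, -2.0])
r = np.zeros_like(theta)              # accumulated squared gradients r_t

for t in range(500):
    g = gradient(theta)
    r += g*g                          # r_t = r_{t-1} + g_t*g_t (element-wise)
    theta -= eta*g/np.sqrt(eps + r)   # per-coordinate step eta/sqrt(eps + r_t)
print(theta)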

    +
    +
    +

    AdaGrad Properties#

    +
      +
1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps.

2. No manual schedule needed: the accumulation \(r_t\) keeps increasing (or stays the same if the gradient is zero), so step sizes \(\eta/\sqrt{r_t}\) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.

3. Sparse data benefit: for very sparse features, \(r_{t,j}\) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal.

4. Convergence: in convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem.
    +

    It effectively reduces the need to tune \(\eta\) by hand.

    +
      +
Limitations: Because \(r_t\) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta and Adam address this by modifying the accumulation rule.)
    +
    +
    +

    RMSProp: Adaptive Learning Rates#

    +

RMSProp addresses AdaGrad’s diminishing learning rate issue. It uses a decaying average of squared gradients (instead of a cumulative sum):

    +
    +\[ +v_t = \rho v_{t-1} + (1-\rho)(\nabla C(\theta_t))^2, +\]
    +

    with \(\rho\) typically \(0.9\) (or \(0.99\)).

    +
      +
1. Update: \(\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t)\).

2. Recent gradients have more weight, so \(v_t\) adapts to the current landscape.

3. Avoids AdaGrad’s “infinite memory” problem – the learning rate does not continuously decay to zero.
    +

RMSProp was first proposed in lecture notes by Geoff Hinton (2012, unpublished).
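A compact sketch of the RMSProp update on the same toy quadratic cost as in the AdaGrad sketch above (the full stochastic version with Autograd appears further below):

import numpy as np

def gradient(theta):
    # gradient of the toy cost C(theta) = sum(theta**2)
    return 2.0*theta

eta, rho, eps = 0.01, 0.9, 1e-8
theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)              # decaying average of squared gradients

for t in range(1000):
    g = gradient(theta)
    v = rho*v + (1 - rho)*g*g         # v_t = rho*v_{t-1} + (1-rho)*g_t**2
    theta -= eta*g/np.sqrt(v + eps)   # adaptive per-coordinate step
print(theta)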

    +
    +
    +

    RMSProp algorithm, taken from Goodfellow et al#

(Figure: the RMSProp algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Adam Optimizer#

    +

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    +
      +
1. Momentum: Fast convergence by smoothing gradients (accelerates in the long-term gradient direction).

2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).

3. Adam uses both: it maintains moving averages of both the first moment (gradients) and the second moment (squared gradients).

4. Additionally, it includes a mechanism to correct the bias in these moving averages (crucial in early iterations).
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.

    +
    +
    +

    ADAM optimizer#

    +

In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +
    +
    +

    Why Combine Momentum and RMSProp?#

    +
      +
1. Momentum: Fast convergence by smoothing gradients (accelerates in the long-term gradient direction).

2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).

3. Adam uses both: it maintains moving averages of both the first moment (gradients) and the second moment (squared gradients).

4. Additionally, it includes a mechanism to correct the bias in these moving averages (crucial in early iterations).
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +
    +
    +

    Adam: Exponential Moving Averages (Moments)#

    +

Adam maintains two moving averages at each time step \(t\) for each parameter \(\theta\): the first moment (mean) \(m_t\) and the second moment (uncentered variance) \(v_t\).

    +

    The Momentum term

    +
    +\[ +m_t = \beta_1m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t), +\]
    +

    Second moment (uncentered variance) \(v_t\).

    +

    The RMS term

    +
    +\[ +v_t = \beta_2v_{t-1} + (1-\beta_2)(\nabla C(\theta_t))^2, +\]
    +

    with typical \(\beta_1 = 0.9\), \(\beta_2 = 0.999\). Initialize \(m_0 = 0\), \(v_0 = 0\).

    +

    These are biased estimators of the true first and second moment of the gradients, especially at the start (since \(m_0,v_0\) are zero)

    +
    +
    +

    Adam: Bias Correction#

    +

    To counteract initialization bias in \(m_t, v_t\), Adam computes bias-corrected estimates

    +
    +\[ +\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}. +\]
    +
      +
• When \(t\) is small, \(1-\beta_i^t \approx 0\), so \(\hat{m}_t, \hat{v}_t\) are significantly larger than the raw \(m_t, v_t\), compensating for the initial zero bias.

• As \(t\) increases, \(1-\beta_i^t \to 1\), and \(\hat{m}_t, \hat{v}_t\) converge to \(m_t, v_t\).

• Bias correction is important for Adam’s stability in early iterations.
    +
    +
    +

    Adam: Update Rule Derivation#

    +

    Finally, Adam updates parameters using the bias-corrected moments:

    +
    +\[ +\theta_{t+1} =\theta_t -\frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\hat{m}_t, +\]
    +

where \(\epsilon\) is a small constant (e.g. \(10^{-8}\)) to prevent division by zero. Breaking it down:

    +
      +
1. Compute gradient \(\nabla C(\theta_t)\).

2. Update first moment \(m_t\) and second moment \(v_t\) (exponential moving averages).

3. Bias-correct: \(\hat{m}_t = m_t/(1-\beta_1^t)\), \(\; \hat{v}_t = v_t/(1-\beta_2^t)\).

4. Compute step: \(\Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}\).

5. Update parameters: \(\theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t\).
    +

    This is the Adam update rule as given in the original paper.
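Collecting the five steps in a single function, a minimal sketch of the Adam parameter update on a toy quadratic cost (the default values for \(\alpha\), \(\beta_1\), \(\beta_2\) and \(\epsilon\) follow the original paper):

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # one Adam update; t counts iterations starting at 1
    m = beta1*m + (1 - beta1)*grad                  # first moment
    v = beta2*v + (1 - beta2)*grad*grad             # second moment
    m_hat = m/(1 - beta1**t)                        # bias-corrected first moment
    v_hat = v/(1 - beta2**t)                        # bias-corrected second moment
    theta = theta - alpha*m_hat/(np.sqrt(v_hat) + eps)
    return theta, m, v

# usage on the toy quadratic C(theta) = sum(theta**2), gradient 2*theta
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0*theta, m, v, t)
print(theta)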

    +
    +
    +

    Adam vs. AdaGrad and RMSProp#

    +
      +
1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting).

2. RMSProp: Uses a moving average of squared gradients (like Adam’s \(v_t\)) to maintain adaptive learning rates, but does not include momentum or bias-correction.

3. Adam: Effectively RMSProp + Momentum + Bias-correction.
    +
      +
• Momentum (\(m_t\)) provides acceleration and smoother convergence.

• Adaptive \(v_t\) scaling moderates the step size per dimension.

• Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone

    +
    +
    +

    Adaptivity Across Dimensions#

    +
      +
1. Adam adapts the step size per coordinate: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.

2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.

3. Meanwhile, momentum (the first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging the accumulated direction.
    +
    +
    +

    ADAM algorithm, taken from Goodfellow et al#

(Figure: the ADAM algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Algorithms and codes for Adagrad, RMSprop and Adam#

    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    +

The codes which implement these algorithms are discussed below.

    +
    +
    +

    Practical tips#

    +
      +
• Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings (a minimal sketch is given after this list).

• Adaptive optimization methods don’t always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    +
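A minimal sketch of the early-stopping idea from the third point above, using plain gradient descent for OLS on noisy linear data (with only two parameters the model hardly overfits, so the loop may simply run to completion, but the mechanics are the same for larger models):

import numpy as np

rng = np.random.default_rng(2024)
n = 200
x = 2*rng.random((n,1))
y = 4 + 3*x + rng.normal(size=(n,1))
X = np.c_[np.ones((n,1)), x]

# hold out 20 percent of the data as a validation set
n_val = n//5
X_train, y_train = X[:-n_val], y[:-n_val]
X_val, y_val = X[-n_val:], y[-n_val:]

theta = rng.normal(size=(2,1))
eta, patience = 0.05, 10
best_val, wait = np.inf, 0
for epoch in range(1000):
    gradients = (2.0/len(y_train))*X_train.T @ (X_train @ theta - y_train)
    theta -= eta*gradients
    val_mse = np.mean((X_val @ theta - y_val)**2)
    if val_mse < best_val:                 # validation error still improving
        best_val, best_theta, wait = val_mse, theta.copy(), 0
    else:
        wait += 1
        if wait >= patience:               # no improvement for `patience` epochs: stop
            print(f"Early stopping at epoch {epoch}")
            break
print("best validation MSE:", best_val)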
    +
    +

    Sneaking in automatic differentiation using Autograd#

    +

    In the examples here we take the liberty of sneaking in automatic +differentiation (without having discussed the mathematics). In +project 1 you will write the gradients as discussed above, that is +hard-coding the gradients. By introducing automatic differentiation +via the library autograd, which is now replaced by JAX, we have +more flexibility in setting up alternative cost functions.

    +

The first example shows results with ordinary least squares.

    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Including Stochastic Gradient Descent with Autograd#

    +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    But none of these can compete with Newton’s method#

    +

    Note that we here have introduced automatic differentiation

    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Similar (second order function now) problem but now with AdaGrad#

    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    +
    +
    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent#

    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
+        # Accumulated squared gradient: scaling with rho the new and the previous results
+        Giter = (rho*Giter+(1-rho)*gradients*gradients)
+        # Element-wise (Hadamard) scaling of the gradient by eta/sqrt(Giter)
+        update = gradients*eta/(delta+np.sqrt(Giter))
+        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    And finally ADAM#

    +
    +
    +
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
+# Values for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
+beta1 = 0.9
+beta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
+        # Computing moments first
+        first_moment = beta1*first_moment + (1-beta1)*gradients
+        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
+        # Bias-corrected moments
+        first_term = first_moment/(1.0-beta1**iter)
+        second_term = second_moment/(1.0-beta2**iter)
+        # Adam update
+        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Material for the lab sessions#

    +
1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)

2. Work on project 1

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen’s article is highly recommended.

    +
    +
    +

    Reminder on different scaling methods#

    +

    Before fitting a regression model, it is good practice to normalize or +standardize the features. This ensures all features are on a +comparable scale, which is especially important when using +regularization. In the exercises this week we will perform standardization, scaling each +feature to have mean 0 and standard deviation 1.

    +

    Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \(\boldsymbol{X}\). +Then we subtract the mean and divide by the standard deviation for each feature.

    +

In the example here we will also center the target \(\boldsymbol{y}\) to mean \(0\). Centering \(\boldsymbol{y}\) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \(0\). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \(n=100\) data points and set up \(\boldsymbol{x}\), \(\boldsymbol{y}\) and the design matrix \(\boldsymbol{X}\).

    +
    +
    +
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
+y_mean = np.mean(y)          # one possible completion: the mean of the targets
+y_centered = y - y_mean      # subtract the mean so y has zero mean
    +
    +
    +
    +
    +

    Do we need to center the values of \(y\)?

    +

    After this preprocessing, each column of \(\boldsymbol{X}_{\mathrm{norm}}\) has mean zero and standard deviation \(1\) +and \(\boldsymbol{y}_{\mathrm{centered}}\) has mean 0. This can make the optimization landscape +nicer and ensures the regularization penalty \(\lambda \sum_j +\theta_j^2\) in Ridge regression treats each coefficient fairly (since features are on the +same scale).

    +
    +
    +

    Functionality in Scikit-Learn#

    +

Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features lie between \(0\) and \(1\).
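To make the difference concrete, here is a minimal sketch (the random design matrix, seed and sizes below are assumptions made purely for illustration) that applies both scalers and prints the resulting column statistics.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Small random feature matrix, only for illustration
rng = np.random.default_rng(2025)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 3))

# StandardScaler: zero mean and unit variance for each column
X_standard = StandardScaler().fit_transform(X)
print(X_standard.mean(axis=0), X_standard.std(axis=0))

# MinMaxScaler: each column mapped to the interval [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax.min(axis=0), X_minmax.max(axis=0))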

    +
    +
    +

    More preprocessing#

    +

The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point onto the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    +

    The RobustScaler works similarly to the StandardScaler in that it +ensures statistical properties for each feature that guarantee that +they are on the same scale. However, the RobustScaler uses the median +and quartiles, instead of mean and variance. This makes the +RobustScaler ignore data points that are very different from the rest +(like measurement errors). These odd data points are also called +outliers, and might often lead to trouble for other scaling +techniques.
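As a small sketch of these two transformations (the synthetic data and the artificial outlier below are assumptions for illustration only):

import numpy as np
from sklearn.preprocessing import Normalizer, RobustScaler

rng = np.random.default_rng(2025)
X = rng.normal(size=(100, 2))
X[0] = [50.0, -50.0]   # artificial outlier

# Normalizer: rescale each row (data point) to unit Euclidean length
X_unit = Normalizer().fit_transform(X)
print(np.linalg.norm(X_unit, axis=1)[:5])   # all close to 1

# RobustScaler: subtract the median and divide by the interquartile range
X_robust = RobustScaler().fit_transform(X)
print(np.median(X_robust, axis=0))          # close to zero, the outlier has little influence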

    +
    +
    +

    Frequently used scaling functions#

    +

Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function as discussed above. It is however easy to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    +
    +\[ +x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)}, +\]
    +

    where \(\overline{x}_j\) and \(\sigma(x_j)\) are the mean and standard deviation, respectively, of the feature \(x_j\). +This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don’t wish to calculate it, it is then common to simply set it to one.

    +

Keep in mind that when you transform your data set before training a model, the same transformation needs to be applied to any new data set before making a prediction. If we translate this into Python code, it could be implemented as

    +
    +
    +
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
+# Then we fit our model with the (centered) training data
+trained_model = some_model.fit(X_train,y_train)
+
+
+#Model prediction: we need to apply the same transformation to the data used for prediction.
+X_test = X_test - X_train_mean #Use mean from training data
+y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
    +
    +
    +
    +
    +
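As a concrete, runnable version of the sketch above (with synthetic data and Scikit-Learn's LinearRegression standing in for the generic some_model; both choices are assumptions made here for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2025)
x = rng.uniform(size=(200, 1))
y = 2.0 + 3.0*x[:, 0] + 0.1*rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Center using the training data only
X_train_mean = np.mean(X_train, axis=0)
y_train_mean = np.mean(y_train)
X_train_centered = X_train - X_train_mean
y_train_centered = y_train - y_train_mean

# Fit without an intercept since the data are centered
trained_model = LinearRegression(fit_intercept=False).fit(X_train_centered, y_train_centered)

# Apply the same shifts when predicting on new data
X_test_centered = X_test - X_train_mean
y_pred = trained_model.predict(X_test_centered) + y_train_mean
print("MSE on test data:", np.mean((y_test - y_pred)**2))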

    Let us try to understand what this may imply mathematically when we +subtract the mean values, also known as zero centering. For +simplicity, we will focus on ordinary regression, as done in the above example.

    +

    The cost/loss function for regression is

    +
+\[ +C(\theta_0, \theta_1, \dots , \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2. +\]
    +

Recall also that we use the squared error; larger differences between predicted and output/target values are thus penalized more strongly.

    +

    What we have done is to single out the \(\theta_0\) term in the +definition of the mean squared error (MSE). The design matrix \(X\) +does in this case not contain any intercept column. When we take the +derivative with respect to \(\theta_0\), we want the derivative to obey

    +
    +\[ +\frac{\partial C}{\partial \theta_j} = 0, +\]
    +

    for all \(j\). For \(\theta_0\) we have

    +
    +\[ +\frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right). +\]
    +

    Multiplying away the constant \(2/n\), we obtain

    +
    +\[ +\sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j. +\]
    +

    Let us specialize first to the case where we have only two parameters \(\theta_0\) and \(\theta_1\). +Our result for \(\theta_0\) simplifies then to

    +
    +\[ +n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1. +\]
    +

    We obtain then

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}. +\]
    +

    If we define

    +
    +\[ +\mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}, +\]
    +

    and the mean value of the outputs as

    +
    +\[ +\mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i, +\]
    +

    we have

    +
    +\[ +\theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}. +\]
    +

    In the general case with more parameters than \(\theta_0\) and \(\theta_1\), we have

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j. +\]
    +

    We can rewrite the latter equation as

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j, +\]
    +

    where we have defined

    +
    +\[ +\mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij}, +\]
    +

    the mean value for all elements of the column vector \(\boldsymbol{x}_j\).

    +

Replacing \(y_i\) with \(y_i - \overline{\boldsymbol{y}}\) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    +
    +\[ +C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}). +\]
    +

    If we minimize with respect to \(\boldsymbol{\theta}\) we have then

    +
    +\[ +\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}, +\]
    +

    where \(\boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}}\) +and \(\tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj}\).

    +

    For Ridge regression we need to add \(\lambda \boldsymbol{\theta}^T\boldsymbol{\theta}\) to the cost function and get then

    +
    +\[ +\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}. +\]
    +

    What does this mean? And why do we insist on all this? Let us look at some examples.

    +

This code shows a simple polynomial fit to a data set using the above transformed data, where we consider the role of the intercept by either including it in or excluding it from the design matrix (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test data.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +
    +from sklearn.linear_model import LinearRegression
    +
    +
    +np.random.seed(2021)
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
    +
    +
    +true_theta = [2, 0.5, 3.7]
    +
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
    +
    +degree = 3
    +X = np.zeros((len(x), degree))
    +
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
    +
    +theta = fit_theta(X, y)
    +
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
    +
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
    +
    +
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
    +
    +
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
    +
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
    +
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
    +
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
    +
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
    +
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
    +
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +

The intercept is the value of our output/target variable when all features are zero, that is, where the fitted function crosses the \(y\)-axis (in the one-dimensional case).

    +

Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to, for example, Ridge regression, the way we treat the intercept matters: if the intercept column is kept in the design matrix, \(\theta_0\) is penalized together with the other parameters. Not including the intercept in the fit means that the regularization term does not include \(\theta_0\). For different values of \(\lambda\), this may lead to different MSE values.

    +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2, +\]
    +

    but when we take out the intercept, this equation becomes

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2. +\]
    +

    For Lasso regression we have

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert. +\]
    +

It means that, when we scale the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which does not penalize the intercept. The resulting MSE can then be smaller since the regularization acts only on the remaining parameters. If we instead keep the intercept in the design matrix, it enters the penalty term and thereby also affects the MSE.

    +

Armed with this wisdom, we first simply set fit_intercept=False in Scikit-Learn's Ridge function and compare with our own matrix-inversion implementation for our well-known vanilla data set.

    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
+#We include explicitly the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

The results here agree very well when we force Scikit-Learn's Ridge function (with fit_intercept=False) to treat the first column of our design matrix as an ordinary feature. Here we have thus explicitly included the intercept column in the design matrix. What happens if we do not include the intercept column in our fit? Let us see how we can change this code by zero centering the data.

    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

We see here, compared to the code which explicitly includes the intercept column, that our MSE value is actually smaller. This is because the regularization term no longer includes the intercept value \(\theta_0\) in the fit. This applies to Lasso regularization as well. Our optimization is now done only with the centered matrix and vector that enter the fitting procedure.

    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week38.html b/doc/LectureNotes/_build/html/week38.html new file mode 100644 index 000000000..b62f3b3f0 --- /dev/null +++ b/doc/LectureNotes/_build/html/week38.html @@ -0,0 +1,1860 @@


    Week 38: Statistical analysis, bias-variance tradeoff and resampling methods#

    +

    Morten Hjorth-Jensen, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway

    +

    Date: September 15-19, 2025

    +
    +

    Plans for week 38, lecture Monday September 15#

    +

    Material for the lecture on Monday September 15.

    +
1. Statistical interpretation of OLS and various expectation values

2. Resampling techniques, Bootstrap and cross-validation, and the bias-variance tradeoff

3. The material we did not cover last week, that is the more advanced methods for updating the learning rate, is covered in its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See the video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at https://youtu.be/J_41Hld6tTU

4. Video of Lecture

5. Whiteboard notes
    +
    +
    +

    Readings and Videos#

    +
1. Raschka et al., pages 175-192

2. Hastie et al., Chapter 7; here we recommend 7.1-7.5, 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.

3. Video on bias-variance tradeoff

4. Video on Bootstrapping

5. Video on cross validation
    +

    For the lab session, the following video on cross validation (from 2024), could be helpful, see https://www.youtube.com/watch?v=T9jjWsmsd1o

    +
    +
    +

    Linking the regression analysis with a statistical interpretation#

    +

    We will now couple the discussions of ordinary least squares, Ridge +and Lasso regression with a statistical interpretation, that is we +move from a linear algebra analysis to a statistical analysis. In +particular, we will focus on what the regularization terms can result +in. We will amongst other things show that the regularization +parameter can reduce considerably the variance of the parameters +\(\theta\).

    +

One of the advantages of doing linear regression is that we actually end up with analytical expressions for several statistical quantities. Standard least squares and Ridge regression allow us to derive quantities like the variance and other expectation values in a rather straightforward way.

    +

    It is assumed that \(\varepsilon_i +\sim \mathcal{N}(0, \sigma^2)\) and the \(\varepsilon_{i}\) are +independent, i.e.:

    +
    +\[\begin{split} +\begin{align*} +\mbox{Cov}(\varepsilon_{i_1}, +\varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} +& i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right. +\end{align*} +\end{split}\]
    +

    The randomness of \(\varepsilon_i\) implies that +\(\mathbf{y}_i\) is also a random variable. In particular, +\(\mathbf{y}_i\) is normally distributed, because \(\varepsilon_i \sim +\mathcal{N}(0, \sigma^2)\) and \(\mathbf{X}_{i,\ast} \, \boldsymbol{\theta}\) is a +non-random scalar. To specify the parameters of the distribution of +\(\mathbf{y}_i\) we need to calculate its first two moments.

    +

Recall that \(\boldsymbol{X}\) is a matrix of dimensionality \(n\times p\). The notation \(\mathbf{X}_{i,\ast}\) refers to row number \(i\); in the product \(\mathbf{X}_{i,\ast}\,\boldsymbol{\theta}\) we sum over all \(p\) columns.

    +
    +
    +

    Assumptions made#

    +

    The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off) +that there exists a function \(f(\boldsymbol{x})\) and a normal distributed error \(\boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2)\) +which describe our data

    +
    +\[ +\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon} +\]
    +

    We approximate this function with our model from the solution of the linear regression equations, that is our +function \(f\) is approximated by \(\boldsymbol{\tilde{y}}\) where we want to minimize \((\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\), our MSE, with

    +
    +\[ +\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\theta}. +\]
    +
    +
    +

    Expectation value and variance#

    +

    We can calculate the expectation value of \(\boldsymbol{y}\) for a given element \(i\)

    +
    +\[ +\begin{align*} +\mathbb{E}(y_i) & = +\mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\theta}) + \mathbb{E}(\varepsilon_i) +\, \, \, = \, \, \, \mathbf{X}_{i, \ast} \, \theta, +\end{align*} +\]
    +

    while +its variance is

    +
    +\[\begin{split} +\begin{align*} \mbox{Var}(y_i) & = \mathbb{E} \{ [y_i +- \mathbb{E}(y_i)]^2 \} \, \, \, = \, \, \, \mathbb{E} ( y_i^2 ) - +[\mathbb{E}(y_i)]^2 \\ & = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, +\theta + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 \\ & += \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 + 2 \varepsilon_i +\mathbf{X}_{i, \ast} \, \boldsymbol{\theta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, +\ast} \, \theta)^2 \\ & = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 + 2 +\mathbb{E}(\varepsilon_i) \mathbf{X}_{i, \ast} \, \boldsymbol{\theta} + +\mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 +\\ & = \mathbb{E}(\varepsilon_i^2 ) \, \, \, = \, \, \, +\mbox{Var}(\varepsilon_i) \, \, \, = \, \, \, \sigma^2. +\end{align*} +\end{split}\]
    +

    Hence, \(y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta}, \sigma^2)\), that is \(\boldsymbol{y}\) follows a normal distribution with +mean value \(\boldsymbol{X}\boldsymbol{\theta}\) and variance \(\sigma^2\) (not be confused with the singular values of the SVD).

    +
    +
    +

    Expectation value and variance for \(\boldsymbol{\theta}\)#

    +

    With the OLS expressions for the optimal parameters \(\boldsymbol{\hat{\theta}}\) we can evaluate the expectation value

    +
    +\[ +\mathbb{E}(\boldsymbol{\hat{\theta}}) = \mathbb{E}[ (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\theta}=\boldsymbol{\theta}. +\]
    +

    This means that the estimator of the regression parameters is unbiased.

    +

    We can also calculate the variance

    +

    The variance of the optimal value \(\boldsymbol{\hat{\theta}}\) is

    +
    +\[\begin{split} +\begin{eqnarray*} +\mbox{Var}(\boldsymbol{\hat{\theta}}) & = & \mathbb{E} \{ [\boldsymbol{\theta} - \mathbb{E}(\boldsymbol{\theta})] [\boldsymbol{\theta} - \mathbb{E}(\boldsymbol{\theta})]^{T} \} +\\ +& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} - \boldsymbol{\theta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} - \boldsymbol{\theta}]^{T} \} +\\ +% & = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y}]^{T} \} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +% & = & \mathbb{E} \{ (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} \, \mathbf{Y}^{T} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} \} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{Y} \, \mathbf{Y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +\\ +& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} \, \mathbf{X}^{T} + \sigma^2 \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +% & = & (\mathbf{X}^T \mathbf{X})^{-1} \, \mathbf{X}^T \, \mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^T \, \mathbf{X}^T \, \mathbf{X} \, (\mathbf{X}^T % \mathbf{X})^{-1} +% \\ +% & & + \, \, \sigma^2 \, (\mathbf{X}^T \mathbf{X})^{-1} \, \mathbf{X}^T \, \mathbf{X} \, (\mathbf{X}^T \mathbf{X})^{-1} - \boldsymbol{\theta} \boldsymbol{\theta}^T +\\ +& = & \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +\, \, \, = \, \, \, \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1}, +\end{eqnarray*} +\end{split}\]
    +

    where we have used that \(\mathbb{E} (\mathbf{Y} \mathbf{Y}^{T}) = +\mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} \, \mathbf{X}^{T} + +\sigma^2 \, \mathbf{I}_{nn}\). From \(\mbox{Var}(\boldsymbol{\theta}) = \sigma^2 +\, (\mathbf{X}^{T} \mathbf{X})^{-1}\), one obtains an estimate of the +variance of the estimate of the \(j\)-th regression coefficient: +\(\boldsymbol{\sigma}^2 (\boldsymbol{\theta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to +construct a confidence interval for the estimates.

    +

    In a similar way, we can obtain analytical expressions for say the +expectation values of the parameters \(\boldsymbol{\theta}\) and their variance +when we employ Ridge regression, allowing us again to define a confidence interval.

    +

    It is rather straightforward to show that

    +
    +\[ +\mathbb{E} \big[ \boldsymbol{\theta}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\theta}^{\mathrm{OLS}}. +\]
    +

    We see clearly that +\(\mathbb{E} \big[ \boldsymbol{\theta}^{\mathrm{Ridge}} \big] \not= \boldsymbol{\theta}^{\mathrm{OLS}}\) for any \(\lambda > 0\). We say then that the ridge estimator is biased.

    +

    We can also compute the variance as

    +
    +\[ +\mbox{Var}[\boldsymbol{\theta}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}, +\]
    +

    and it is easy to see that if the parameter \(\lambda\) goes to infinity then the variance of Ridge parameters \(\boldsymbol{\theta}\) goes to zero.

    +

    With this, we can compute the difference

    +
    +\[ +\mbox{Var}[\boldsymbol{\theta}^{\mathrm{OLS}}]-\mbox{Var}(\boldsymbol{\theta}^{\mathrm{Ridge}})=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}. +\]
    +

    The difference is non-negative definite since each component of the +matrix product is non-negative definite. +This means the variance we obtain with the standard OLS will always for \(\lambda > 0\) be larger than the variance of \(\boldsymbol{\theta}\) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below.

    +
    +
    +
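To make this statement concrete, the following small sketch (the random design matrix, the noise variance \(\sigma^2=1\) and \(\lambda=0.5\) are assumptions chosen only for illustration) computes both covariance matrices and checks that the eigenvalues of their difference are non-negative.

import numpy as np

rng = np.random.default_rng(2025)
n, p = 100, 5
X = rng.normal(size=(n, p))
sigma2 = 1.0   # assumed noise variance
lmb = 0.5      # assumed Ridge parameter

XtX = X.T @ X
I = np.eye(p)

# Var(theta_OLS) = sigma^2 (X^T X)^{-1}
var_ols = sigma2 * np.linalg.inv(XtX)
# Var(theta_Ridge) = sigma^2 (X^T X + lambda I)^{-1} X^T X [(X^T X + lambda I)^{-1}]^T
A = np.linalg.inv(XtX + lmb*I)
var_ridge = sigma2 * A @ XtX @ A.T

# The difference should be positive semi-definite (all eigenvalues non-negative)
eigvals = np.linalg.eigvalsh(var_ols - var_ridge)
print(eigvals.min() >= -1e-10)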

    Deriving OLS from a probability distribution#

    +

    Our basic assumption when we derived the OLS equations was to assume +that our output is determined by a given continuous function +\(f(\boldsymbol{x})\) and a random noise \(\boldsymbol{\epsilon}\) given by the normal +distribution with zero mean value and an undetermined variance +\(\sigma^2\).

    +

    We found above that the outputs \(\boldsymbol{y}\) have a mean value given by +\(\boldsymbol{X}\hat{\boldsymbol{\theta}}\) and variance \(\sigma^2\). Since the entries to +the design matrix are not stochastic variables, we can assume that the +probability distribution of our targets is also a normal distribution +but now with mean value \(\boldsymbol{X}\hat{\boldsymbol{\theta}}\). This means that a +single output \(y_i\) is given by the Gaussian distribution

    +
    +\[ +y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\theta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}. +\]
    +
    +
    +

    Independent and Identically Distributed (iid)#

    +

    We assume now that the various \(y_i\) values are stochastically distributed according to the above Gaussian distribution. +We define this distribution as

    +
    +\[ +p(y_i, \boldsymbol{X}\vert\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}, +\]
    +

    which reads as finding the likelihood of an event \(y_i\) with the input variables \(\boldsymbol{X}\) given the parameters (to be determined) \(\boldsymbol{\theta}\).

    +

Since these events are assumed to be independent and identically distributed, we can build the probability distribution function (PDF) for all possible events \(\boldsymbol{y}\) as the product of the single-event distributions, that is we have

    +
    +\[ +p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta}). +\]
    +

We will write this in a more compact form reserving \(\boldsymbol{D}\) for the domain of events, including the outputs (targets) and the inputs. In the simple case of one-dimensional inputs and outputs we have

    +
    +\[ +\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. +\]
    +

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \(\boldsymbol{X}\). +We can now rewrite the above probability as

    +
    +\[ +p(\boldsymbol{D}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}. +\]
    +

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \(\boldsymbol{D}\) given a set of parameters \(\boldsymbol{\theta}\).

    +
    +
    +

    Maximum Likelihood Estimation (MLE)#

    +

    In statistics, maximum likelihood estimation (MLE) is a method of +estimating the parameters of an assumed probability distribution, +given some observed data. This is achieved by maximizing a likelihood +function so that, under the assumed statistical model, the observed +data is the most probable.

    +

We will assume here that our events are given by the above Gaussian distribution and we will determine the optimal parameters \(\theta\) by maximizing the above PDF. However, computing the derivatives of a product function is cumbersome and can easily lead to overflow and/or underflow problems, with potential loss of numerical precision.

    +

    In practice, it is more convenient to maximize the logarithm of the +PDF because it is a monotonically increasing function of the argument. +Alternatively, and this will be our option, we will minimize the +negative of the logarithm since this is a monotonically decreasing +function.

    +

    Note also that maximization/minimization of the logarithm of the PDF +is equivalent to the maximization/minimization of the function itself.

    +
    +
    +

    A new Cost Function#

    +

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +
    +\[ +C(\boldsymbol{\theta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}, +\]
    +

    which becomes

    +
    +\[ +C(\boldsymbol{\theta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta})\vert\vert_2^2}{2\sigma^2}. +\]
    +

    Taking the derivative of the new cost function with respect to the parameters \(\theta\) we recognize our familiar OLS equation, namely

    +
    +\[ +\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right) =0, +\]
    +

which leads to the well-known OLS equation for the optimal parameters \(\theta\)

    +
+\[ +\hat{\boldsymbol{\theta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. +\]
    +
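As a quick numerical sketch (the synthetic data, the assumed known noise variance and the use of scipy.optimize.minimize are illustrative choices, not part of the derivation), we can verify that minimizing the negative log-likelihood gives the same parameters as the analytical OLS expression.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=n)
X = np.c_[np.ones(n), x, x**2]
y = 2.0 + 3.0*x + 4.0*x**2 + 0.1*rng.normal(size=n)
sigma2 = 0.1**2   # assumed known noise variance

# Negative log-likelihood, including the constant term
def neg_log_likelihood(theta):
    residual = y - X @ theta
    return 0.5*n*np.log(2*np.pi*sigma2) + residual @ residual/(2*sigma2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
theta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta_mle)
print(theta_ols)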

    Next week we will make a similar analysis for Ridge and Lasso regression

    +
    +
    +

    Why resampling methods#

    +

Before we proceed, we need to rethink what we have been doing. In our eagerness to fit the data, we have omitted several important elements in our regression analysis. In what follows we will

    +
1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff

2. introduce resampling techniques like cross-validation, bootstrapping, the jackknife and more
    +

    and discuss how to select a given model (one of the difficult parts in machine learning).

    +
    +
    +

    Resampling methods#

    +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample.

    +

    Two resampling methods are often used in Machine Learning analyses,

    +
1. The bootstrap method

2. Cross-Validation
    +

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular +cross-validation and the bootstrap method.

    +
    +
    +

    Resampling approaches can be computationally expensive#

    +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used.

    +
    +
    +

    Why resampling methods ?#

    +

    Statistical analysis.

    +
• Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods, which are widely used in statistical analyses.

• The results can be analysed with the same statistical tools as we would use when analysing experimental data.

• As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources of errors.
    +
    +
    +

    Statistical analysis#

    +
• As in other experiments, many numerical experiments have two classes of errors:

  • Statistical errors

  • Systematical errors

• Statistical errors can be estimated using standard tools from statistics

• Systematical errors are method specific and must be treated differently from case to case.
    +
    +
    +

    Resampling methods#

    +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods.

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error.

    +

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, +we discuss the

    +
1. prediction error or simply the test error \(\mathrm{Err_{Test}}\), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the

2. training error \(\mathrm{Err_{Train}}\), which is the average loss over the training data.
    +

As our model becomes more and more complex, it adapts to more and more of the structure in the training data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance of the test error. For a certain level of complexity the test error reaches a minimum, before starting to increase again, while the training error saturates.
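A minimal sketch of this behaviour (using synthetic data, Scikit-Learn, and polynomial degrees chosen here only as illustrative assumptions) fits polynomials of increasing degree and prints the training and test MSE:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=(n, 1))
y = np.exp(-x[:, 0]**2) + 1.5*np.exp(-(x[:, 0]-2)**2) + 0.05*rng.normal(size=n)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

for degree in range(1, 12):
    X_train = PolynomialFeatures(degree).fit_transform(x_train)
    X_test = PolynomialFeatures(degree).fit_transform(x_test)
    model = LinearRegression().fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))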

    +
    +
    +

    Resampling methods: Bootstrap#

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages:

    +
1. The bootstrap is quite general, although there are some cases in which it fails.

2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.

3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.

4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    +
    +
    +

    The Central Limit Theorem#

    +

Suppose we have a PDF \(p(x)\) from which we generate a series of \(N\) averages \(\mathbb{E}[x_i]\). Each mean value \(\mathbb{E}[x_i]\) is viewed as the average of a specific measurement, e.g., throwing dice 100 times and then taking the average value, or producing a certain number of random numbers. For notational ease, we set \(\mathbb{E}[x_i]=x_i\) in the discussion which follows. We do the same for \(\mathbb{E}[z]=z\).

    +

    If we compute the mean \(z\) of \(m\) such mean values \(x_i\)

    +
    +\[ +z=\frac{x_1+x_2+\dots+x_m}{m}, +\]
    +

    the question we pose is which is the PDF of the new variable \(z\).

    +
    +
    +

    Finding the Limit#

    +

    The probability of obtaining an average value \(z\) is the product of the +probabilities of obtaining arbitrary individual mean values \(x_i\), +but with the constraint that the average is \(z\). We can express this through +the following expression

    +
    +\[ +\tilde{p}(z)=\int dx_1p(x_1)\int dx_2p(x_2)\dots\int dx_mp(x_m) + \delta(z-\frac{x_1+x_2+\dots+x_m}{m}), +\]
    +

where the \(\delta\)-function embodies the constraint that the mean is \(z\). All measurements that lead to each individual \(x_i\) are expected to be independent, which in turn means that we can express \(\tilde{p}\) as the product of individual \(p(x_i)\). The independence assumption is important in the derivation of the central limit theorem.

    +
    +
    +

    Rewriting the \(\delta\)-function#

    +

    If we use the integral expression for the \(\delta\)-function

    +
    +\[ +\delta(z-\frac{x_1+x_2+\dots+x_m}{m})=\frac{1}{2\pi}\int_{-\infty}^{\infty} + dq\exp{\left(iq(z-\frac{x_1+x_2+\dots+x_m}{m})\right)}, +\]
    +

    and inserting \(e^{i\mu q-i\mu q}\) where \(\mu\) is the mean value +we arrive at

    +
    +\[ +\tilde{p}(z)=\frac{1}{2\pi}\int_{-\infty}^{\infty} + dq\exp{\left(iq(z-\mu)\right)}\left[\int_{-\infty}^{\infty} + dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m, +\]
    +

    with the integral over \(x\) resulting in

    +
    +\[ +\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}= + \int_{-\infty}^{\infty}dxp(x) + \left[1+\frac{iq(\mu-x)}{m}-\frac{q^2(\mu-x)^2}{2m^2}+\dots\right]. +\]
    +
    +
    +

    Identifying Terms#

    +

    The second term on the rhs disappears since this is just the mean and +employing the definition of \(\sigma^2\) we have

    +
    +\[ +\int_{-\infty}^{\infty}dxp(x)e^{\left(iq(\mu-x)/m\right)}= + 1-\frac{q^2\sigma^2}{2m^2}+\dots, +\]
    +

    resulting in

    +
    +\[ +\left[\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m\approx + \left[1-\frac{q^2\sigma^2}{2m^2}+\dots \right]^m, +\]
    +

    and in the limit \(m\rightarrow \infty\) we obtain

    +
    +\[ +\tilde{p}(z)=\frac{1}{\sqrt{2\pi}(\sigma/\sqrt{m})} + \exp{\left(-\frac{(z-\mu)^2}{2(\sigma/\sqrt{m})^2}\right)}, +\]
    +

which is the normal distribution with variance \(\sigma^2_m=\sigma^2/m\), where \(\sigma^2\) is the variance of the PDF \(p(x)\) and \(\mu\) is the mean of the PDF \(p(x)\).

    +
    +
    +

    Wrapping it up#

    +

    Thus, the central limit theorem states that the PDF \(\tilde{p}(z)\) of +the average of \(m\) random values corresponding to a PDF \(p(x)\) +is a normal distribution whose mean is the +mean value of the PDF \(p(x)\) and whose variance is the variance +of the PDF \(p(x)\) divided by \(m\), the number of values used to compute \(z\).

    +

    The central limit theorem leads to the well-known expression for the +standard deviation, given by

    +
    +\[ +\sigma_m= +\frac{\sigma}{\sqrt{m}}. +\]
    +

    The latter is true only if the average value is known exactly. This is obtained in the limit +\(m\rightarrow \infty\) only. Because the mean and the variance are measured quantities we obtain +the familiar expression in statistics (the so-called Bessel correction)

    +
    +\[ +\sigma_m\approx +\frac{\sigma}{\sqrt{m-1}}. +\]
    +

    In many cases however the above estimate for the standard deviation, +in particular if correlations are strong, may be too simplistic. Keep +in mind that we have assumed that the variables \(x\) are independent +and identically distributed. This is obviously not always the +case. For example, the random numbers (or better pseudorandom numbers) +we generate in various calculations do always exhibit some +correlations.

    +

The theorem is satisfied by a large class of PDFs. Note however that for a finite \(m\), it is not always possible to find a closed-form analytic expression for \(\tilde{p}(z)\).
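As a small numerical sketch of the theorem (the uniform distribution and the sample sizes below are arbitrary choices for illustration), we can compare the spread of sample means with \(\sigma/\sqrt{m}\):

import numpy as np

rng = np.random.default_rng(2025)
m = 100          # number of values entering each average
n_means = 10000  # number of averages we compute

# Uniform distribution on [0,1): mean 1/2, standard deviation 1/sqrt(12)
samples = rng.uniform(size=(n_means, m))
z = samples.mean(axis=1)

sigma = 1.0/np.sqrt(12.0)
print("std of the computed means   :", z.std())
print("CLT prediction sigma/sqrt(m):", sigma/np.sqrt(m))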

    +
    +
    +

    Confidence Intervals#

    +

    Confidence intervals are used in statistics and represent a type of estimate +computed from the observed data. This gives a range of values for an +unknown parameter such as the parameters \(\boldsymbol{\theta}\) from linear regression.

    +

    With the OLS expressions for the parameters \(\boldsymbol{\theta}\) we found +\(\mathbb{E}(\boldsymbol{\theta}) = \boldsymbol{\theta}\), which means that the estimator of the regression parameters is unbiased.

    +

    In the exercises this week we show that the variance of the estimate of the \(j\)-th regression coefficient is +\(\boldsymbol{\sigma}^2 (\boldsymbol{\theta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \).

    +

    This quantity can be used to +construct a confidence interval for the estimates.

    +
    +
    +

    Standard Approach based on the Normal Distribution#

    +

    We will assume that the parameters \(\theta\) follow a normal +distribution. We can then define the confidence interval. Here we will be using as +shorthands \(\mu_{\theta}\) for the above mean value and \(\sigma_{\theta}\) +for the standard deviation. We have then a confidence interval

    +
    +\[ +\left(\mu_{\theta}\pm \frac{z\sigma_{\theta}}{\sqrt{n}}\right), +\]
    +

    where \(z\) defines the level of certainty (or confidence). For a normal +distribution typical parameters are \(z=2.576\) which corresponds to a +confidence of \(99\%\) while \(z=1.96\) corresponds to a confidence of +\(95\%\). A confidence level of \(95\%\) is commonly used and it is +normally referred to as a two-sigmas confidence level, that is we +approximate \(z\approx 2\).
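A small sketch of how such an interval could be computed for the OLS parameters, using the variance expression \(\sigma^2[(\mathbf{X}^{T}\mathbf{X})^{-1}]_{jj}\) derived above (the synthetic data, the assumed known \(\sigma\) and the choice \(z=1.96\) are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=n)
X = np.c_[np.ones(n), x, x**2]
sigma = 0.1
y = 2.0 + 3.0*x + 4.0*x**2 + sigma*rng.normal(size=n)

# OLS estimate and the standard deviation of each coefficient
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y
std_theta = sigma*np.sqrt(np.diag(XtX_inv))

# 95% confidence intervals, z = 1.96
z = 1.96
for j in range(len(theta_hat)):
    print(f"theta_{j}: {theta_hat[j]:.3f} +/- {z*std_theta[j]:.3f}")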

    +

    For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by Davison on the Bootstrap Methods and their Applications

    +

    In this text you will also find an in-depth discussion of the +Bootstrap method, why it works and various theorems related to it.

    +
    +
    +

    Resampling methods: Bootstrap background#

    +

Since \(\widehat{\theta} = \widehat{\theta}(\boldsymbol{X})\) is a function of random variables, \(\widehat{\theta}\) itself must be a random variable. Thus it has a pdf, call this function \(p(\boldsymbol{t})\). The aim of the bootstrap is to estimate \(p(\boldsymbol{t})\) by the relative frequency of \(\widehat{\theta}\). You can think of this as using a histogram in the place of \(p(\boldsymbol{t})\). If the relative frequency closely resembles \(p(\boldsymbol{t})\), then using numerics, it is straightforward to estimate all the interesting parameters of \(p(\boldsymbol{t})\) using point estimators.

    +
    +
    +

    Resampling methods: More Bootstrap background#

    +

    In the case that \(\widehat{\theta}\) has +more than one component, and the components are independent, we use the +same estimator on each component separately. If the probability +density function of \(X_i\), \(p(x)\), had been known, then it would have +been straightforward to do this by:

    +
      +
    1. Drawing lots of numbers from \(p(x)\), suppose we call one such set of numbers \((X_1^*, X_2^*, \cdots, X_n^*)\).

    2. +
    3. Then using these numbers, we could compute a replica of \(\widehat{\theta}\) called \(\widehat{\theta}^*\).

    4. +
    +

    By repeated use of the above two points, many +estimates of \(\widehat{\theta}\) can be obtained. The +idea is to use the relative frequency of \(\widehat{\theta}^*\) +(think of a histogram) as an estimate of \(p(\boldsymbol{t})\).

    +
    +
    +

    Resampling methods: Bootstrap approach#

    +

    But +unless there is enough information available about the process that +generated \(X_1,X_2,\cdots,X_n\), \(p(x)\) is in general +unknown. Therefore, Efron in 1979 asked the +question: What if we replace \(p(x)\) by the relative frequency +of the observation \(X_i\)?

    +

    If we draw observations in accordance with +the relative frequency of the observations, will we obtain the same +result in some asymptotic sense? The answer is yes.

    +
    +
    +

    Resampling methods: Bootstrap steps#

    +

    The independent bootstrap works like this:

    +
1. Draw with replacement \(n\) numbers for the observed variables \(\boldsymbol{x} = (x_1,x_2,\cdots,x_n)\).

2. Define a vector \(\boldsymbol{x}^*\) containing the values which were drawn from \(\boldsymbol{x}\).

3. Using the vector \(\boldsymbol{x}^*\), compute \(\widehat{\theta}^*\) by evaluating \(\widehat \theta\) under the observations \(\boldsymbol{x}^*\).

4. Repeat this process \(k\) times.
    +

When you are done, you can draw a histogram of the relative frequency of \(\widehat \theta^*\). This is your estimate of the probability distribution \(p(t)\). Using this probability distribution you can estimate any statistics thereof. In principle you never draw the histogram of the relative frequency of \(\widehat{\theta}^*\). Instead you use the estimators corresponding to the statistic of interest. For example, if you are interested in estimating the variance of \(\widehat \theta\), apply the estimator \(\widehat \sigma^2\) to the values \(\widehat \theta^*\).

    +
    +
    +

    Code example for the Bootstrap method#

    +

The following code starts with a Gaussian distribution with mean value \(\mu =100\) and standard deviation \(\sigma=15\). We use this to generate the data used in the bootstrap analysis. The bootstrap analysis returns a data set after a given number of bootstrap operations (as many as we have data points). This data set consists of the estimated mean value for each bootstrap operation. The histogram generated by the bootstrap method shows that the distribution of these mean values is also a Gaussian, centered around the mean value \(\mu=100\) but with standard deviation \(\sigma/\sqrt{n}\), where \(n\) is the size of each bootstrap sample (in this case the same as the number of original data points). The value of the standard deviation is what we expect from the central limit theorem.

    +
    +
    +
    %matplotlib inline
    +
    +import numpy as np
    +from time import time
    +from scipy.stats import norm
    +import matplotlib.pyplot as plt
    +
    +# Returns mean of bootstrap samples 
    +# Bootstrap algorithm
    +def bootstrap(data, datapoints):
    +    t = np.zeros(datapoints)
    +    n = len(data)
    +    # non-parametric bootstrap         
    +    for i in range(datapoints):
    +        t[i] = np.mean(data[np.random.randint(0,n,n)])
    +    # analysis    
    +    print("Bootstrap Statistics :")
    +    print("original           bias      std. error")
    +    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    +    return t
    +
    +# We set the mean value to 100 and the standard deviation to 15
    +mu, sigma = 100, 15
    +datapoints = 10000
    +# We generate random numbers according to the normal distribution
    +x = mu + sigma*np.random.randn(datapoints)
    +# bootstrap returns the data sample                                    
    +t = bootstrap(x, datapoints)
    +
    +
    +
    +
    +

We see that the resulting variance, and from that the standard deviation, agrees with the central limit theorem.

    +
    +
    +

    Plotting the Histogram#

    +
    +
    +
    # the histogram of the bootstrapped data (normalized data if density = True)
    +n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    +# add a 'best fit' line  
    +y = norm.pdf(binsboot, np.mean(t), np.std(t))
    +lt = plt.plot(binsboot, y, 'b', linewidth=1)
    +plt.xlabel('x')
    +plt.ylabel('Probability')
    +plt.grid(True)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    The bias-variance tradeoff#

    +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \(\mathcal{D}\) consisting of the data +\(\mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\}\).

    +

    Let us assume that the true data is generated from a noisy model

    +
    +\[ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +\]
    +

where \(\epsilon\) is normally distributed with mean zero and variance \(\sigma^2\).

    +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \(f\) in terms of the parameters +\(\boldsymbol{\theta}\) and the design matrix \(\boldsymbol{X}\) which embody our model, +that is \(\boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta}\).

    +

Thereafter we found the parameters \(\boldsymbol{\theta}\) by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +\]
    +

    We can rewrite this as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +\]
    +

The first of the three terms represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \(\boldsymbol{\epsilon}\).

    +

To derive this equation, we need to recall that the variances of \(\boldsymbol{y}\) and \(\boldsymbol{\epsilon}\) are both equal to \(\sigma^2\). The mean value of \(\boldsymbol{\epsilon}\) is by definition equal to zero. Furthermore, the function \(f\) is not a stochastic variable, idem for \(\boldsymbol{\tilde{y}}\). We use a more compact notation in terms of the expectation value

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +\]
    +

    and adding and subtracting \(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) we get

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +\]
    +

    which, using the abovementioned expectation values can be rewritten as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +\]
    +

    that is the rewriting in terms of the so-called bias, the variance of the model \(\boldsymbol{\tilde{y}}\) and the variance of \(\boldsymbol{\epsilon}\).
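Written out term by term with the same notation as above, the decomposition reads

\[
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]
=\underbrace{\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2}_{\mathrm{Bias}^2}
+\underbrace{\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2}_{\mathrm{Variance}}
+\underbrace{\sigma^2}_{\mathrm{irreducible\; error}},
\]

which is the quantity estimated numerically in the code examples below.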

    +
    +
    +

    A way to Read the Bias-Variance Tradeoff#

Figure 1:

    Example code for Bias-Variance tradeoff#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 500
    +n_boostraps = 100
    +degree = 18  # A quite high value, just to show.
    +noise = 0.1
    +
    +# Make data set.
    +x = np.linspace(-1, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    +
    +# Hold out some test data that is never used in training.
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +# Combine x transformation and model into one operation.
+# Not necessary, but convenient.
    +model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +
    +# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    +# for each bootstrap iteration.
    +y_pred = np.empty((y_test.shape[0], n_boostraps))
    +for i in range(n_boostraps):
    +    x_, y_ = resample(x_train, y_train)
    +
    +    # Evaluate the new model on the same test data each time.
    +    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +# Note: Expectations and variances taken w.r.t. different training
    +# data sets, hence the axis=1. Subsequent means are taken across the test data
    +# set in order to obtain a total value, but before this we have error/bias/variance
    +# calculated per data point in the test set.
    +# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    +# maintains the column vector form. Dropping this yields very unexpected results.
    +error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +print('Error:', error)
    +print('Bias^2:', bias)
    +print('Var:', variance)
    +print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    +
    +plt.plot(x[::5, :], y[::5, :], label='f(x)')
    +plt.scatter(x_test, y_test, label='Data points')
    +plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Understanding what happens#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Summing up#

    +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \(Var(\epsilon)\), the irreducible error.

    +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    +

    You may also find this recent article of interest.

    +
    +
    +

    Another Example from Scikit-Learn’s Repository#

    +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    +
    +
    +
    
    +
    +#print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Various steps in cross-validation#

    +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either training or test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \(k\)-fold cross-validation structures the data splitting. The samples are divided into \(k\) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \(k\) subsets involves a degree of randomness. This may be fully excluded when choosing \(k=n\). This particular case is referred to as leave-one-out cross-validation (LOOCV).
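A minimal sketch of LOOCV on a small hypothetical data set; Scikit-Learn's LeaveOneOut splitter is equivalent to KFold with \(k=n\):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical small data set, just for illustration
n = 20
x = np.random.randn(n, 1)
y = 3*x[:, 0]**2 + np.random.randn(n)

# Leave-one-out cross-validation: every sample is the test set exactly once
scores = cross_val_score(LinearRegression(), x, y,
                         scoring='neg_mean_squared_error', cv=LeaveOneOut())
print("LOOCV estimate of the MSE:", np.mean(-scores))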

    +
    +
    +

    Cross-validation in brief#

    +

    For the various values of \(k\)

    +
      +
1. Shuffle the dataset randomly.

2. Split the dataset into \(k\) groups.

3. For each unique group:

   a. Decide which group to use as the test data set

   b. Take the remaining groups as the training data set

   c. Fit a model on the training set and evaluate it on the test set

   d. Retain the evaluation score and discard the model

4. Summarize the model using the sample of model evaluation scores
    +
    +
    +

    Code Example for Cross-validation and \(k\)-fold Cross-validation#

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \(k\)-fold CV in order to fit a specific polynomial.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We take the mean over the folds in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    More examples on bootstrap and cross-validation and errors#

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \(X\) without the first column of ones.
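As a minimal sketch of that alternative (with a small hypothetical data set), the two choices should give the same intercept estimate:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical design matrix with an explicit first column of ones
n = 50
x = np.random.randn(n)
X = np.column_stack([np.ones(n), x, x**2])
y = 1.0 + 2.0*x + 0.5*x**2 + 0.1*np.random.randn(n)

# Option 1: keep the column of ones, switch off Scikit-Learn's own intercept
fit1 = LinearRegression(fit_intercept=False).fit(X, y)

# Option 2: drop the column of ones, let Scikit-Learn fit the intercept itself
fit2 = LinearRegression(fit_intercept=True).fit(X[:, 1:], y)

print(fit1.coef_[0], fit2.intercept_)   # the two intercept estimates agree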

    +
    +
    +

    The same example but now with cross-validation#

    +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Material for the lab sessions#

    +

This week we will discuss during the first hour of each lab session some technicalities related to the project and methods for updating the learning rate like ADAgrad, RMSprop and ADAM. As teaching material, see the jupyter-notebook from week 37 (September 12-16).

    +

For the lab session, the following video on cross validation (from 2024) could be helpful, see https://www.youtube.com/watch?v=T9jjWsmsd1o

    +

    See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at https://youtu.be/J_41Hld6tTU

    +
    +
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/week39.html b/doc/LectureNotes/_build/html/week39.html
new file mode 100644
index 000000000..a2746005a
--- /dev/null
+++ b/doc/LectureNotes/_build/html/week39.html
@@ -0,0 +1,2073 @@
Week 39: Resampling methods and logistic regression — Applied Data Analysis and Machine Learning
    Week 39: Resampling methods and logistic regression#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo

    +

    Date: Week 39

    +
    +

    Plan for week 39, September 22-26, 2025#

    +

    Material for the lecture on Monday September 22.

    +
      +
1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff

2. Logistic regression, our first classification encounter and a stepping stone towards neural networks

3. Video of lecture

4. Whiteboard notes
    +
    +
    +

    Readings and Videos, resampling methods#

    +
      +
1. Raschka et al, pages 175-192

2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.

3. Video on bias-variance tradeoff

4. Video on Bootstrapping

5. Video on cross validation
    +
    +
    +

    Readings and Videos, logistic regression#

    +
      +
1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression

2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization

3. Video on Logistic regression

4. Yet another video on logistic regression
    +
    +
    +

    Lab sessions week 39#

    +

    Material for the lab sessions on Tuesday and Wednesday.

    +
      +
1. Discussions on how to structure your report for the first project

2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.

3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.

4. Video on how to write scientific reports recorded during one of the lab sessions

5. A general guideline can be found at CompPhysics/MachineLearning.
    +
    +
    +

    Lecture material#

    +
    +
    +

    Resampling methods#

    +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample.

    +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
1. The bootstrap method

2. Cross-Validation
    +

    In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    +
    +

    Resampling approaches can be computationally expensive#

    +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used.

    +
    +
    +

    Why resampling methods ?#

    +

    Statistical analysis.

    +
      +
• Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.

• The results can be analysed with the same statistical tools as we would use when analysing experimental data.

• As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    +
    +
    +

    Statistical analysis#

    +
      +
• As in other experiments, many numerical experiments have two classes of errors:

  • Statistical errors

  • Systematical errors

• Statistical errors can be estimated using standard tools from statistics

• Systematical errors are method specific and must be treated differently from case to case.
    +
    +
    +

    Resampling methods#

    +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods.

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error.
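For reference, with targets \(y_i\) and predictions \(\tilde{y}_i\), these two error measures are the mean squared error and the (mean) absolute error,

\[
\mathrm{MSE}=\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2,
\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=0}^{n-1}\vert y_i-\tilde{y}_i\vert.
\]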

    +

In the discussions below we will focus on the MSE. In particular, since we will split the data into test and training data, we discuss the

    +
      +
1. prediction error or simply the test error \(\mathrm{Err_{Test}}\), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the

2. training error \(\mathrm{Err_{Train}}\), which is the average loss over the training data.
    +

As our model becomes more and more complex, more of the training data tends to be used. The training may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.
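Written out with the MSE as the loss, the two quantities defined above are simply the average squared errors over the respective splits,

\[
\mathrm{Err_{Train}}=\frac{1}{n_{\mathrm{train}}}\sum_{i\in\mathrm{train}}(y_i-\tilde{y}_i)^2,
\qquad
\mathrm{Err_{Test}}=\frac{1}{n_{\mathrm{test}}}\sum_{i\in\mathrm{test}}(y_i-\tilde{y}_i)^2.
\]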

    +
    +
    +

    Resampling methods: Bootstrap#

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages:

    +
      +
1. The bootstrap is quite general, although there are some cases in which it fails.

2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.

3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.

4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +
    +
    +

    The bias-variance tradeoff#

    +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \(\mathcal{D}\) consisting of the data +\(\mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\}\).

    +

    Let us assume that the true data is generated from a noisy model

    +
    +\[ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +\]
    +

where \(\epsilon\) is normally distributed with mean zero and variance \(\sigma^2\).

    +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \(f\) in terms of the parameters +\(\boldsymbol{\theta}\) and the design matrix \(\boldsymbol{X}\) which embody our model, +that is \(\boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta}\).

    +

Thereafter we found the parameters \(\boldsymbol{\theta}\) by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +\]
    +

    We can rewrite this as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +\]
    +

The first of the three terms represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \(\boldsymbol{\epsilon}\).

    +

To derive this equation, we need to recall that the variances of \(\boldsymbol{y}\) and \(\boldsymbol{\epsilon}\) are both equal to \(\sigma^2\). The mean value of \(\boldsymbol{\epsilon}\) is by definition equal to zero. Furthermore, the function \(f\) is not a stochastic variable, idem for \(\boldsymbol{\tilde{y}}\). We use a more compact notation in terms of the expectation value

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +\]
    +

    and adding and subtracting \(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) we get

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +\]
    +

    which, using the abovementioned expectation values can be rewritten as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +\]
    +

    that is the rewriting in terms of the so-called bias, the variance of the model \(\boldsymbol{\tilde{y}}\) and the variance of \(\boldsymbol{\epsilon}\).

    +

    Note that in order to derive these equations we have assumed we can replace the unknown function \(\boldsymbol{f}\) with the target/output data \(\boldsymbol{y}\).

    +
    +
    +

    A way to Read the Bias-Variance Tradeoff#

Figure 1:

    Understanding what happens#

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Summing up#

    +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \(Var(\epsilon)\), the irreducible error.

    +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    +

    You may also find this recent article of interest.

    +
    +
    +

    Another Example from Scikit-Learn’s Repository#

    +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    +
    +
    +
    
    +
    +#print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Various steps in cross-validation#

    +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either training or test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \(k\)-fold cross-validation structures the data splitting. The samples are divided into \(k\) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \(k\) subsets involves a degree of randomness. This may be fully excluded when choosing \(k=n\). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    +
    +
    +

    Cross-validation in brief#

    +

    For the various values of \(k\)

    +
      +
1. Shuffle the dataset randomly.

2. Split the dataset into \(k\) groups.

3. For each unique group:

   a. Decide which group to use as the test data set

   b. Take the remaining groups as the training data set

   c. Fit a model on the training set and evaluate it on the test set

   d. Retain the evaluation score and discard the model

4. Summarize the model using the sample of model evaluation scores
    +
    +
    +

    Code Example for Cross-validation and \(k\)-fold Cross-validation#

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \(k\)-fold CV in order to fit a specific polynomial.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We take the mean over the folds in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    More examples on bootstrap and cross-validation and errors#

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \(X\) without the first column of ones.

    +
    +
    +

    The same example but now with cross-validation#

    +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Logistic Regression#

    +

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \(y_i\) is based on some independent variables \(\boldsymbol{x}_i\). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance and thereby the confidence intervals of the parameters \(\boldsymbol{\theta}\) to the mean squared error. If we can invert the matrix \(\boldsymbol{X}^T\boldsymbol{X}\), linear regression then gives a simple recipe for fitting our data.
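For reference, the analytical expressions alluded to here are the familiar closed-form solutions (assuming the relevant matrix inverses exist),

\[
\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},
\qquad
\hat{\boldsymbol{\theta}}_{\mathrm{Ridge}}=\left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.
\]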

    +
    +
    +

    Classification problems#

    +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients’ brains, figure out if there is a tumor or +not; or given a specific physical system, we’d like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations.

    +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc.

    +
    +
    +

    Optimization and Deep learning#

    +

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \(\boldsymbol{\theta}\). The optimization of the problem therefore calls for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

    +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later.

    +
    +
    +

    Basics#

    +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \(y_i\) are discrete and only take values +from \(k=0,\dots,K-1\) (i.e. \(K\) classes).

    +

    The goal is to predict the +output classes from the design matrix \(\boldsymbol{X}\in\mathbb{R}^{n\times p}\) +made of \(n\) samples, each of which carries \(p\) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong.

    +

    Let us specialize to the case of two classes only, with outputs +\(y_i=0\) and \(y_i=1\). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is

    +
    +\[\begin{split} +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Linear classifier#

    +

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could, for example, fit a linear model and assign an outcome to the default case if the predicted value \(y_i > 0.5\) and to the no-default case if \(y_i \leq 0.5\).

    +

    We would then have our +weighted linear combination, namely

    + +
    +
+\[ +\begin{equation} +\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, +\label{_auto1} \tag{1} +\end{equation} +\]
    +

    where \(\boldsymbol{y}\) is a vector representing the possible outcomes, \(\boldsymbol{X}\) is our +\(n\times p\) design matrix and \(\boldsymbol{\theta}\) represents our estimators/predictors.

    +
    +
    +

    Some selected properties#

    +

The main problem with our function is that it takes values over the entire real axis. In the case of logistic regression, however, the labels \(y_i\) are discrete variables. A typical example is the credit card data discussed below, where for each person in the data set we can set \(y_i=1\) if she/he defaults on the debt and \(y_i=0\) otherwise (see the full example below).

    +

One simple way to get a discrete output is to use a sign function that maps the output of a linear regressor to the values \(\{0,1\}\), that is \(f(s_i)=\mathrm{sign}(s_i)=1\) if \(s_i\ge 0\) and \(0\) otherwise. We will encounter this model in our first demonstration of neural networks.

    +

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a “soft” classifier that outputs the probability of a given category. This leads us to the logistic function.

    +
    +
    +

    Simple example#

    +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is then plotted against the person’s age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
+# Read the chd data as a csv file and organize the data into arrays with age group, age, and chd
+chd = pd.read_csv(data_path("chddata.csv"), names=('ID', 'Age', 'Agegroup', 'CHD'))
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Plotting the mean value for each group#

    +

    What we could attempt however is to plot the mean value for each group.

    +
    +
    +
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
    +
    +
    +
    +
    +

    We are now trying to find a function \(f(y\vert x)\), that is a function which gives us an expected value for the output \(y\) with a given input \(x\). +In standard linear regression with a linear dependence on \(x\), we would write this in terms of our model

    +
    +\[ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +\]
    +

This expression implies, however, that \(f(y_i\vert x_i)\) could take any value from minus infinity to plus infinity. If we instead let \(f(y\vert x)\) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \(0 \le f(y_i\vert x_i) \le 1\). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \(f\), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability of finding a value of \(y_i\) for a given \(x_i\).

    +
    +
    +

    The logistic function#

    +

Another widely studied model is the so-called perceptron model, which is an example of a “hard classification” model. We will encounter this model when we discuss neural networks as well. Each data point is deterministically assigned to a category (i.e., \(y_i=0\) or \(y_i=1\)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a “soft” classifier that outputs the probability of a given category rather than a single value. For example, given \(x_i\), the classifier outputs the probability of being in a category \(k\). Logistic regression is the most common example of such a soft classifier. In logistic regression, the probability that a data point \(x_i\) belongs to a category \(y_i\in\{0,1\}\) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood of a given event,

    +
+\[ +p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}. +\]
    +

    Note that \(1-p(t)= p(-t)\).
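To verify this identity, a one-line check using only the definition above:

+\[ +1-p(t) = 1-\frac{1}{1+\exp{(-t)}} = \frac{\exp{(-t)}}{1+\exp{(-t)}} = \frac{1}{1+\exp{(t)}} = p(-t). +\]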

    +
    +
    +

Examples of likelihood functions used in logistic regression and neural networks#

    +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    +
    +
    +
    """The sigmoid function (or the logistic curve) is a
+function that takes any real number, z, and outputs a number in the interval (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Two parameters#

    +

    We assume now that we have two classes with \(y_i\) either \(0\) or \(1\). Furthermore we assume also that we have only two parameters \(\theta\) in our fitting of the Sigmoid function, that is we define probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +

    Note that we used

    +
    +\[ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +\]
    +
    +
    +

    Maximum likelihood#

    +

    In order to define the total likelihood for all possible outcomes from a
    +dataset \(\mathcal{D}=\{(y_i,x_i)\}\), with the binary labels +\(y_i\in\{0,1\}\) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \(y_i\), that is

    +
+\[\begin{split} +\begin{align*} +P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\ +\end{align*} +\end{split}\]
    +

    from which we obtain the log-likelihood and our cost/loss function

    +
+\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). +\]
    +
    +
    +

    The cost function rewritten#

    +

    Reordering the logarithms, we can rewrite the cost/loss function as

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
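To make the reordering explicit, write \(z_i=\theta_0+\theta_1x_i\), so that \(p(y_i=1|x_i,\boldsymbol{\theta})=\exp{(z_i)}/(1+\exp{(z_i)})\). Then

+\[ +\log{p(y_i=1|x_i,\boldsymbol{\theta})} = z_i-\log{(1+\exp{(z_i)})},\qquad \log{\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]} = -\log{(1+\exp{(z_i)})}, +\]

and inserting these into the expression for \(\mathcal{C}(\boldsymbol{\theta})\) term by term gives the sum above.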
    +

The maximum likelihood estimator is defined as the set of parameters \(\boldsymbol{\theta}\) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \(L_1\) and \(L_2\) regularization as we did for Ridge and Lasso regression.
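As a minimal sketch of how this cost function can be evaluated numerically (the function name cross_entropy and the small data set below are illustrative only, not part of the original notes), one may write

+import numpy as np
+
+def cross_entropy(theta0, theta1, x, y):
+    """Negative log-likelihood for the two-parameter logistic model."""
+    z = theta0 + theta1*x
+    # -sum_i [ y_i z_i - log(1+exp(z_i)) ], with logaddexp for numerical stability
+    return -np.sum(y*z - np.logaddexp(0.0, z))
+
+# tiny illustrative data set
+x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
+y = np.array([0, 0, 1, 1, 1])
+print(cross_entropy(0.0, 1.0, x, y))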

    +
    +
    +

    Minimizing the cross entropy#

    +

    The cross entropy is a convex function of the weights \(\boldsymbol{\theta}\) and, +therefore, any local minimizer is a global minimizer.

    +

    Minimizing this +cost function with respect to the two parameters \(\theta_0\) and \(\theta_1\) we obtain

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +\]
    +

    and

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +\]
    +
    +
    +

    A more compact expression#

    +

    Let us now define a vector \(\boldsymbol{y}\) with \(n\) elements \(y_i\), an +\(n\times p\) matrix \(\boldsymbol{X}\) which contains the \(x_i\) values and a +vector \(\boldsymbol{p}\) of fitted probabilities \(p(y_i\vert x_i,\boldsymbol{\theta})\). We can rewrite in a more compact form the first +derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
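A compact sketch of these two expressions in code (assuming X, y and theta are NumPy arrays of matching shapes; the function name is an illustrative choice):

+import numpy as np
+
+def gradient_and_hessian(X, y, theta):
+    """First and second derivatives of the logistic cross entropy in matrix form."""
+    p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+    grad = -X.T @ (y - p)                # -X^T (y - p)
+    W = np.diag(p*(1.0 - p))             # diagonal matrix W
+    hessian = X.T @ W @ X                # X^T W X
+    return grad, hessian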
    +
    +
    +

    Extending to more predictors#

    +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The logarithm of the ratio between the two likelihoods (the log-odds) is then, with \(p\) predictors,

    +
    +\[ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +\]
    +

    Here we defined \(\boldsymbol{x}=[1,x_1,x_2,\dots,x_p]\) and \(\boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p]\) leading to

    +
    +\[ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +\]
    +
    +
    +

    Including more classes#

    +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \(K\) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    +
+\[ +\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1, +\]
    +

    and

    +
+\[ +\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1, +\]
    +

and so on up to the class \(C=K-1\),

    +
+\[ +\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, +\]
    +

and the model is specified in terms of \(K-1\) so-called log-odds or logit transformations.

    +
    +
    +

    More classes#

    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function.

    +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \(K\) distinct linear functions, +and the predicted probability for the \(k\)-th class given a sample +vector \(\boldsymbol{x}\) and a weighting vector \(\boldsymbol{\theta}\) is (with two +predictors):

    +
    +\[ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +\]
    +

    It is easy to extend to more predictors. The final class is

    +
    +\[ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +\]
    +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations.
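A small numerical check (with made-up parameter values, purely illustrative) that these \(K\) class probabilities indeed sum to one:

+import numpy as np
+
+# hypothetical parameters (theta_{l0}, theta_{l1}) for the first K-1 classes, here K=4
+theta = np.array([[0.5, -1.0],
+                  [0.1,  0.3],
+                  [-0.7, 2.0]])
+x1 = 1.5
+scores = theta[:, 0] + theta[:, 1]*x1        # theta_{l0} + theta_{l1} x_1
+denominator = 1.0 + np.sum(np.exp(scores))
+p_first = np.exp(scores)/denominator         # p(C=k|x) for k=1,...,K-1
+p_last = 1.0/denominator                     # p(C=K|x)
+print(p_first, p_last, p_first.sum() + p_last)   # the last number is 1.0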

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton’s method and gradient descent methods are +discussed in the material on optimization +methods.

    +
    +
    +

Optimization, the central part of any Machine Learning algorithm#

    +

Almost every problem in machine learning and data science starts with a dataset \(X\), a model \(g(\theta)\), which is a function of the parameters \(\theta\), and a cost function \(C(X, g(\theta))\) that allows us to judge how well the model \(g(\theta)\) explains the observations \(X\). The model is fit by finding the values of \(\theta\) that minimize the cost function. Ideally we would be able to solve for \(\theta\) analytically; however, this is not possible in general, and we must use some approximate/numerical method to compute the minimum.

    +
    +
    +

    Revisiting our Logistic Regression case#

    +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \(y_i\) either +\(0\) or \(1\). Furthermore we assumed also that we have only two +parameters \(\theta\) in our fitting, that is we +defined probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +
    +
    +

    The equations to solve#

    +

    Our compact equations used a definition of a vector \(\boldsymbol{y}\) with \(n\) +elements \(y_i\), an \(n\times p\) matrix \(\boldsymbol{X}\) which contains the +\(x_i\) values and a vector \(\boldsymbol{p}\) of fitted probabilities +\(p(y_i\vert x_i,\boldsymbol{\theta})\). We rewrote in a more compact form +the first derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +

    This defines what is called the Hessian matrix.

    +
    +
    +

    Solving using Newton-Raphson’s method#

    +

If we can set up these equations, Newton-Raphson’s iterative method is normally the method of choice. It requires, however, that we can compute the matrices that define the first and second derivatives in an efficient way.

    +

    Our iterative scheme is then given by

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +\]
    +

    or in matrix form as

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +\]
    +

    The right-hand side is computed with the old values of \(\theta\).

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
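A minimal sketch of this scheme (assuming a design matrix X whose first column is ones and binary targets y; for linearly separable data the iteration may diverge, so this is illustrative only):

+import numpy as np
+
+def newton_raphson_logreg(X, y, n_iter=20):
+    """Newton-Raphson iterations, theta^new = theta^old - H^{-1} * gradient."""
+    theta = np.zeros(X.shape[1])
+    for _ in range(n_iter):
+        p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+        grad = -X.T @ (y - p)
+        W = np.diag(p*(1.0 - p))
+        hessian = X.T @ W @ X
+        theta = theta - np.linalg.solve(hessian, grad)
+    return theta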

    +
    +
    +

    Example code for Logistic Regression#

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True.

    +
    +
    +
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For the loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +

    Synthetic data generation#

    +

    Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2]. +Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week40.html b/doc/LectureNotes/_build/html/week40.html new file mode 100644 index 000000000..3a6944e4d --- /dev/null +++ b/doc/LectureNotes/_build/html/week40.html @@ -0,0 +1,2002 @@ +Week 40: Gradient descent methods (continued) and start Neural networks — Applied Data Analysis and Machine Learning


    Week 40: Gradient descent methods (continued) and start Neural networks#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: September 29-October 3, 2025

    +
    +

    Lecture Monday September 29, 2025#

    +
1. Logistic regression and gradient descent, examples on how to code

2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model

3. Video of lecture at https://youtu.be/MS3Tv8FVArs

4. Whiteboard notes at CompPhysics/MachineLearning
    +
    +
    +

    Suggested readings and videos#

    +

    Readings and Videos:

    +
1. The lecture notes for week 40 (these notes)

2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)

3. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

4. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex
    +
    +
    +

    Lab sessions Tuesday and Wednesday#

    +

    Material for the active learning sessions on Tuesday and Wednesday.

    +
• Work on project 1 and discussions on how to structure your report

• No weekly exercises for week 40, project work only

• Video on how to write scientific reports recorded during one of the lab sessions at https://youtu.be/tVW1ZDmZnwM

• A general guideline can be found at CompPhysics/MachineLearning.
    +
    +
    +

    Logistic Regression, from last week#

    +

    In linear regression our main interest was centered on learning the +coefficients of a functional fit (say a polynomial) in order to be +able to predict the response of a continuous variable on some unseen +data. The fit to the continuous variable \(y_i\) is based on some +independent variables \(\boldsymbol{x}_i\). Linear regression resulted in +analytical expressions for standard ordinary Least Squares or Ridge +regression (in terms of matrices to invert) for several quantities, +ranging from the variance and thereby the confidence intervals of the +parameters \(\boldsymbol{\theta}\) to the mean squared error. If we can invert +the product of the design matrices, linear regression gives then a +simple recipe for fitting our data.

    +
    +
    +

    Classification problems#

    +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients’ brains, figure out if there is a tumor or +not; or given a specific physical system, we’d like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations.

    +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc.

    +
    +
    +

    Optimization and Deep learning#

    +

    Logistic regression will also serve as our stepping stone towards +neural network algorithms and supervised deep learning. For logistic +learning, the minimization of the cost function leads to a non-linear +equation in the parameters \(\boldsymbol{\theta}\). The optimization of the +problem calls therefore for minimization algorithms.

    +

    As we have discussed earlier, this forms the +bottle neck of all machine learning algorithms, namely how to find +reliable minima of a multi-variable function. This leads us to the +family of gradient descent methods. The latter are the working horses +of basically all modern machine learning algorithms.

    +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later.

    +
    +
    +

    Basics#

    +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \(y_i\) are discrete and only take values +from \(k=0,\dots,K-1\) (i.e. \(K\) classes).

    +

    The goal is to predict the +output classes from the design matrix \(\boldsymbol{X}\in\mathbb{R}^{n\times p}\) +made of \(n\) samples, each of which carries \(p\) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong.

    +

    Last week we specialized to the case of two classes only, with outputs +\(y_i=0\) and \(y_i=1\). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is

    +
    +\[\begin{split} +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Two parameters#

    +

    We assume now that we have two classes with \(y_i\) either \(0\) or \(1\). Furthermore we assume also that we have only two parameters \(\theta\) in our fitting of the Sigmoid function, that is we define probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +

    Note that we used

    +
    +\[ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +\]
    +
    +
    +

    Maximum likelihood#

    +

    In order to define the total likelihood for all possible outcomes from a
    +dataset \(\mathcal{D}=\{(y_i,x_i)\}\), with the binary labels +\(y_i\in\{0,1\}\) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \(y_i\), that is

    +
+\[\begin{split} +\begin{align*} +P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\ +\end{align*} +\end{split}\]
    +

    from which we obtain the log-likelihood and our cost/loss function

    +
+\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). +\]
    +
    +
    +

    The cost function rewritten#

    +

    Reordering the logarithms, we can rewrite the cost/loss function as

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

The maximum likelihood estimator is defined as the set of parameters \(\boldsymbol{\theta}\) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \(L_1\) and \(L_2\) regularization as we did for Ridge and Lasso regression.
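Since the notes mention \(L_2\) regularization here, a brief sketch (with a hypothetical regularization strength lam; function names are illustrative) of how a ridge-like penalty enters the cross entropy and its gradient:

+import numpy as np
+
+def cross_entropy_l2(theta, X, y, lam=0.01):
+    """Cross entropy with an added L2 penalty on the parameters."""
+    z = X @ theta
+    return -np.sum(y*z - np.logaddexp(0.0, z)) + lam*np.sum(theta**2)
+
+def gradient_l2(theta, X, y, lam=0.01):
+    """Gradient of the penalized cross entropy."""
+    p = 1.0/(1.0 + np.exp(-X @ theta))
+    return -X.T @ (y - p) + 2.0*lam*theta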

    +
    +
    +

    Minimizing the cross entropy#

    +

    The cross entropy is a convex function of the weights \(\boldsymbol{\theta}\) and, +therefore, any local minimizer is a global minimizer.

    +

    Minimizing this +cost function with respect to the two parameters \(\theta_0\) and \(\theta_1\) we obtain

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +\]
    +

    and

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +\]
    +
    +
    +

    A more compact expression#

    +

    Let us now define a vector \(\boldsymbol{y}\) with \(n\) elements \(y_i\), an +\(n\times p\) matrix \(\boldsymbol{X}\) which contains the \(x_i\) values and a +vector \(\boldsymbol{p}\) of fitted probabilities \(p(y_i\vert x_i,\boldsymbol{\theta})\). We can rewrite in a more compact form the first +derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +
    +
    +

    Extending to more predictors#

    +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The logarithm of the ratio between the two likelihoods (the log-odds) is then, with \(p\) predictors,

    +
    +\[ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +\]
    +

    Here we defined \(\boldsymbol{x}=[1,x_1,x_2,\dots,x_p]\) and \(\boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p]\) leading to

    +
    +\[ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +\]
    +
    +
    +

    Including more classes#

    +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \(K\) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    +
+\[ +\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1, +\]
    +

    and

    +
+\[ +\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1, +\]
    +

and so on up to the class \(C=K-1\),

    +
+\[ +\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, +\]
    +

and the model is specified in terms of \(K-1\) so-called log-odds or logit transformations.

    +
    +
    +

    More classes#

    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function.

    +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \(K\) distinct linear functions, +and the predicted probability for the \(k\)-th class given a sample +vector \(\boldsymbol{x}\) and a weighting vector \(\boldsymbol{\theta}\) is (with two +predictors):

    +
    +\[ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +\]
    +

    It is easy to extend to more predictors. The final class is

    +
    +\[ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +\]
    +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations.

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton’s method and gradient descent methods are +discussed in the material on optimization +methods.
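As a minimal sketch of such a plain gradient descent update for the binary case (assuming a design matrix X with an intercept column; the learning rate eta and the number of iterations are arbitrary illustrative choices):

+import numpy as np
+
+def gradient_descent_logreg(X, y, eta=0.01, n_iter=1000):
+    """Plain gradient descent on the logistic cross entropy: theta <- theta - eta*gradient."""
+    theta = np.zeros(X.shape[1])
+    for _ in range(n_iter):
+        p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+        theta = theta - eta * (-X.T @ (y - p))
+    return theta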

    +
    +
    +

Optimization, the central part of any Machine Learning algorithm#

    +

Almost every problem in machine learning and data science starts with a dataset \(X\), a model \(g(\theta)\), which is a function of the parameters \(\theta\), and a cost function \(C(X, g(\theta))\) that allows us to judge how well the model \(g(\theta)\) explains the observations \(X\). The model is fit by finding the values of \(\theta\) that minimize the cost function. Ideally we would be able to solve for \(\theta\) analytically; however, this is not possible in general, and we must use some approximate/numerical method to compute the minimum.

    +
    +
    +

    Revisiting our Logistic Regression case#

    +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \(y_i\) either +\(0\) or \(1\). Furthermore we assumed also that we have only two +parameters \(\theta\) in our fitting, that is we +defined probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +
    +
    +

    The equations to solve#

    +

    Our compact equations used a definition of a vector \(\boldsymbol{y}\) with \(n\) +elements \(y_i\), an \(n\times p\) matrix \(\boldsymbol{X}\) which contains the +\(x_i\) values and a vector \(\boldsymbol{p}\) of fitted probabilities +\(p(y_i\vert x_i,\boldsymbol{\theta})\). We rewrote in a more compact form +the first derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +

    This defines what is called the Hessian matrix.

    +
    +
    +

    Solving using Newton-Raphson’s method#

    +

If we can set up these equations, Newton-Raphson’s iterative method is normally the method of choice. It requires, however, that we can compute the matrices that define the first and second derivatives in an efficient way.

    +

    Our iterative scheme is then given by

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +\]
    +

    or in matrix form as

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +\]
    +

    The right-hand side is computed with the old values of \(\theta\).

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.

    +
    +
    +

    Example code for Logistic Regression#

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True.

    +
    +
    +
    # Evaluation Metrics
    +#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +

    Synthetic data generation#

    +

    Binary classification data: Create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2].

    Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
    +

    Using Scikit-learn#

    +

    We show here how we can use logistic regression on a data set included in scikit-learn, the so-called Wisconsin breast cancer data. This is a widely studied data set and can easily be included in demonstrations of classification problems.

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data
    +cancer = load_breast_cancer()
    +
    +X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +
    +
    +
    +
    +
    +

    Using the correlation matrix#

    +

    In addition to the above scores, we could also study the covariance (and the correlation matrix). +We use Pandas to compute the correlation matrix.

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +cancer = load_breast_cancer()
    +import pandas as pd
    +# Making a data frame
    +cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    +
    +fig, axes = plt.subplots(15,2,figsize=(10,20))
    +malignant = cancer.data[cancer.target == 0]
    +benign = cancer.data[cancer.target == 1]
    +ax = axes.ravel()
    +
    +for i in range(30):
    +    _, bins = np.histogram(cancer.data[:,i], bins =50)
    +    ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)
    +    ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)
    +    ax[i].set_title(cancer.feature_names[i])
    +    ax[i].set_yticks(())
    +ax[0].set_xlabel("Feature magnitude")
    +ax[0].set_ylabel("Frequency")
    +ax[0].legend(["Malignant", "Benign"], loc ="best")
    +fig.tight_layout()
    +plt.show()
    +
    +import seaborn as sns
    +correlation_matrix = cancerpd.corr().round(1)
    +# use the heatmap function from seaborn to plot the correlation matrix
    +# annot = True to print the values inside the square
    +plt.figure(figsize=(15,8))
    +sns.heatmap(data=correlation_matrix, annot=True)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Discussing the correlation data#

    +

    In the above example we note two things. In the first plot we display the overlap of benign and malignant tumors as functions of the various features in the Wisconsin data set. We see that for some of the features we can clearly distinguish the benign and malignant cases, while for other features we cannot. This indicates which features may be of greater interest when we wish to classify a tumour as benign or malignant.

    +

    In the second figure we have computed the so-called correlation +matrix, which in our case with thirty features becomes a \(30\times 30\) +matrix.

    +

    We constructed this matrix using pandas via the statements

    +
    +
    +
    cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    +
    +
    +
    +
    +

    and then

    +
    +
    +
    correlation_matrix = cancerpd.corr().round(1)
    +
    +
    +
    +
    +

    Diagonalizing this matrix we can in turn say something about which features are of relevance and which are not. This leads us to the classical Principal Component Analysis (PCA) theorem with applications, which will be discussed later this semester (a small sketch follows below).
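    The sketch below assumes only NumPy, Pandas and the scikit-learn data loader used above; it computes the eigenvalues of the correlation matrix and checks how much of the total variance the leading components carry (variable names are illustrative, not taken from the notes).

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer

    cancer = load_breast_cancer()
    cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    correlation_matrix = cancerpd.corr().to_numpy()

    # The correlation matrix is symmetric, so we can use eigh
    eigenvalues, eigenvectors = np.linalg.eigh(correlation_matrix)
    eigenvalues = eigenvalues[::-1]            # sort in descending order

    # Fraction of the total variance carried by each component
    explained = eigenvalues / eigenvalues.sum()
    print("Leading five components:", explained[:5])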

    +
    +
    +

    Other measures in classification studies#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data
    +cancer = load_breast_cancer()
    +
    +X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +
    +from sklearn.preprocessing import LabelEncoder
    +from sklearn.model_selection import cross_validate
    +#Cross validation
    +accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']
    +print(accuracy)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +import scikitplot as skplt
    +y_pred = logreg.predict(X_test)
    +skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
    +plt.show()
    +y_probas = logreg.predict_proba(X_test)
    +skplt.metrics.plot_roc(y_test, y_probas)
    +plt.show()
    +skplt.metrics.plot_cumulative_gain(y_test, y_probas)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Introduction to Neural networks#

    +

    Artificial neural networks are computational systems that can learn to +perform tasks by considering examples, generally without being +programmed with any task-specific rules. It is supposed to mimic a +biological system, wherein neurons interact by sending signals in the +form of mathematical functions between layers. All layers can contain +an arbitrary number of neurons, and each connection is represented by +a weight variable.

    +
    +
    +

    Artificial neurons#

    +

    The field of artificial neural networks has a long history of +development, and is closely connected with the advancement of computer +science and computers in general. A model of artificial neurons was +first developed by McCulloch and Pitts in 1943 to study signal +processing in the brain and has later been refined by others. The +general idea is to mimic neural networks in the human brain, which is +composed of billions of neurons that communicate with each other by +sending electrical signals. Each neuron accumulates its incoming +signals, which must exceed an activation threshold to yield an +output. If the threshold is not overcome, the neuron remains inactive, +i.e. has zero output.

    +

    This behaviour has inspired a simple mathematical model for an artificial neuron.

    + +
    +
    +\[ +\begin{equation} + y = f\left(\sum_{i=1}^n w_ix_i\right) = f(u) +\label{artificialNeuron} \tag{1} +\end{equation} +\]
    +

    Here, the output \(y\) of the neuron is the value of its activation function, which has as input a weighted sum of the signals \(x_1, \dots ,x_n\) received from \(n\) other neurons.
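    As a tiny illustration (my own sketch, with made-up numbers), the model in Eq. (1) is just a weighted sum passed through an activation function, here taken to be a step function that fires once the threshold is exceeded.

    import numpy as np

    x = np.array([0.5, -1.2, 3.0])   # signals received from n = 3 other neurons
    w = np.array([0.8, 0.1, 0.4])    # corresponding weights

    u = np.sum(w * x)                # weighted sum of the incoming signals
    y = 1.0 if u >= 0 else 0.0       # step activation: inactive (0) below threshold
    print(u, y)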

    +

    Conceptually, it is helpful to divide neural networks into four +categories:

    +
      +
    1. general purpose neural networks for supervised learning,

    2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),

    3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and

    4. neural networks for unsupervised learning such as Deep Boltzmann Machines.
    +

    In natural science, DNNs and CNNs have already found numerous applications. In statistical physics, they have been applied to detect phase transitions in 2D Ising and Potts models, lattice gauge theories, and different phases of polymers, or to solve the Navier-Stokes equation in weather forecasting. Deep learning has also found interesting applications in quantum physics. Various quantum phase transitions can be detected and studied using DNNs and CNNs, including topological phases and even non-equilibrium many-body localization. Representing quantum states with DNNs and performing quantum state tomography are among the impressive achievements that reveal the potential of DNNs to facilitate the study of quantum systems.

    +

    In quantum information theory, it has been shown that one can perform gate decompositions with the help of neural networks.

    +

    The applications are not limited to the natural sciences. There is a +plethora of applications in essentially all disciplines, from the +humanities to life science and medicine.

    +
    +
    +

    Neural network types#

    +

    An artificial neural network (ANN), is a computational model that +consists of layers of connected neurons, or nodes or units. We will +refer to these interchangeably as units or nodes, and sometimes as +neurons.

    +

    It is supposed to mimic a biological nervous system by letting each +neuron interact with other neurons by sending signals in the form of +mathematical functions between layers. A wide variety of different +ANNs have been developed, but most of them consist of an input layer, +an output layer and eventual layers in-between, called hidden +layers. All layers can contain an arbitrary number of nodes, and each +connection between two nodes is associated with a weight variable.

    +

    Neural networks (also called neural nets) are neural-inspired +nonlinear models for supervised learning. As we will see, neural nets +can be viewed as natural, more powerful extensions of supervised +learning methods such as linear and logistic regression and soft-max +methods we discussed earlier.

    +
    +
    +

    Feed-forward neural networks#

    +

    The feed-forward neural network (FFNN) was the first and simplest type +of ANNs that were devised. In this network, the information moves in +only one direction: forward through the layers.

    +

    Nodes are represented by circles, while the arrows display the +connections between the nodes, including the direction of information +flow. Additionally, each arrow corresponds to a weight variable +(figure to come). We observe that each node in a layer is connected +to all nodes in the subsequent layer, making this a so-called +fully-connected FFNN.

    +
    +
    +

    Convolutional Neural Network#

    +

    A different variant of FFNNs are convolutional neural networks +(CNNs), which have a connectivity pattern inspired by the animal +visual cortex. Individual neurons in the visual cortex only respond to +stimuli from small sub-regions of the visual field, called a receptive +field. This makes the neurons well-suited to exploit the strong +spatially local correlation present in natural images. The response of +each neuron can be approximated mathematically as a convolution +operation. (figure to come)

    +

    Convolutional neural networks emulate the behaviour of neurons in the +visual cortex by enforcing a local connectivity pattern between +nodes of adjacent layers: Each node in a convolutional layer is +connected only to a subset of the nodes in the previous layer, in +contrast to the fully-connected FFNN. Often, CNNs consist of several +convolutional layers that learn local features of the input, with a +fully-connected layer at the end, which gathers all the local data and +produces the outputs. They have wide applications in image and video +recognition.

    +
    +
    +

    Recurrent neural networks#

    +

    So far we have only mentioned ANNs where information flows in one direction: forward. Recurrent neural networks on the other hand have connections between nodes that form directed cycles. This creates a form of internal memory which is able to capture information on what has been calculated before; the output is dependent on the previous computations. Recurrent NNs make use of sequential information by performing the same task for every element in a sequence, where each element depends on previous elements. An example of such information is sentences, making recurrent NNs especially well-suited for handwriting and speech recognition.

    +
    +
    +

    Other types of networks#

    +

    There are many other kinds of ANNs that have been developed. One type that is specifically designed for interpolation in multidimensional space is the radial basis function (RBF) network. RBFs are typically made up of three layers: an input layer, a hidden layer with non-linear radial symmetric activation functions and a linear output layer (‘’linear’’ here means that each node in the output layer has a linear activation function). The layers are normally fully-connected and there are no cycles, thus RBFs can be viewed as a type of fully-connected FFNN. They are however usually treated as a separate type of NN due to the unusual activation functions.

    +
    +
    +

    Multilayer perceptrons#

    +

    One often uses so-called fully-connected feed-forward neural networks with three or more layers (an input layer, one or more hidden layers and an output layer) consisting of neurons that have non-linear activation functions.

    +

    Such networks are often called multilayer perceptrons (MLPs).

    +
    +
    +

    Why multilayer perceptrons?#

    +

    According to the Universal approximation theorem, a feed-forward +neural network with just a single hidden layer containing a finite +number of neurons can approximate a continuous multidimensional +function to arbitrary accuracy, assuming the activation function for +the hidden layer is a non-constant, bounded and +monotonically-increasing continuous function.

    +

    Note that the requirements on the activation function only apply to the hidden layer; the output nodes are always assumed to be linear, so as to not restrict the range of output values.

    +
    +
    +

    Illustration of a single perceptron model and a multi-perceptron model#

    + + +

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    +
    +
    +

    Examples of XOR, OR and AND gates#

    +

    Let us first try to fit various gates using standard linear +regression. The gates we are thinking of are the classical XOR, OR and +AND gates, well-known elements in computer science. The tables here +show how we can set up the inputs \(x_1\) and \(x_2\) in order to yield a +specific target \(y_i\).

    +
    +
    +
    """
    +Simple code that tests XOR, OR and AND gates with linear regression
    +"""
    +
    +import numpy as np
    +# Design matrix
    +X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)
    +print(f"The X.TX  matrix:{X.T @ X}")
    +Xinv = np.linalg.pinv(X.T @ X)
    +print(f"The invers of X.TX  matrix:{Xinv}")
    +
    +# The XOR gate 
    +yXOR = np.array( [ 0, 1 ,1, 0])
    +ThetaXOR  = Xinv @ X.T @ yXOR
    +print(f"The values of theta for the XOR gate:{ThetaXOR}")
    +print(f"The linear regression prediction  for the XOR gate:{X @ ThetaXOR}")
    +
    +
    +# The OR gate 
    +yOR = np.array( [ 0, 1 ,1, 1])
    +ThetaOR  = Xinv @ X.T @ yOR
    +print(f"The values of theta for the OR gate:{ThetaOR}")
    +print(f"The linear regression prediction  for the OR gate:{X @ ThetaOR}")
    +
    +
    +# The AND gate
    +yAND = np.array( [ 0, 0 ,0, 1])
    +ThetaAND  = Xinv @ X.T @ yAND
    +print(f"The values of theta for the AND gate:{ThetaAND}")
    +print(f"The linear regression prediction  for the AND gate:{X @ ThetaAND}")
    +
    +
    +
    +
    +

    What is happening here?

    +
    +
    +

    Does Logistic Regression do a better Job?#

    +
    +
    +
    """
    +Simple code that tests XOR and OR gates with linear regression
    +and logistic regression
    +"""
    +
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LogisticRegression
    +import numpy as np
    +
    +# Design matrix
    +X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)
    +print(f"The X.TX  matrix:{X.T @ X}")
    +Xinv = np.linalg.pinv(X.T @ X)
    +print(f"The invers of X.TX  matrix:{Xinv}")
    +
    +# The XOR gate 
    +yXOR = np.array( [ 0, 1 ,1, 0])
    +ThetaXOR  = Xinv @ X.T @ yXOR
    +print(f"The values of theta for the XOR gate:{ThetaXOR}")
    +print(f"The linear regression prediction  for the XOR gate:{X @ ThetaXOR}")
    +
    +
    +# The OR gate 
    +yOR = np.array( [ 0, 1 ,1, 1])
    +ThetaOR  = Xinv @ X.T @ yOR
    +print(f"The values of theta for the OR gate:{ThetaOR}")
    +print(f"The linear regression prediction  for the OR gate:{X @ ThetaOR}")
    +
    +
    +# The AND gate
    +yAND = np.array( [ 0, 0 ,0, 1])
    +ThetaAND  = Xinv @ X.T @ yAND
    +print(f"The values of theta for the AND gate:{ThetaAND}")
    +print(f"The linear regression prediction  for the AND gate:{X @ ThetaAND}")
    +
    +# Now we change to logistic regression
    +
    +
    +# Logistic Regression
    +logreg = LogisticRegression()
    +logreg.fit(X, yOR)
    +print("Test set accuracy with Logistic Regression for OR gate: {:.2f}".format(logreg.score(X,yOR)))
    +
    +logreg.fit(X, yXOR)
    +print("Test set accuracy with Logistic Regression for XOR gate: {:.2f}".format(logreg.score(X,yXOR)))
    +
    +
    +logreg.fit(X, yAND)
    +print("Test set accuracy with Logistic Regression for AND gate: {:.2f}".format(logreg.score(X,yAND)))
    +
    +
    +
    +
    +

    Not exactly impressive, but somewhat better.

    +
    +
    +

    Adding Neural Networks#

    +
    +
    +
    
    +# and now neural networks with Scikit-Learn and the XOR
    +
    +from sklearn.neural_network import MLPClassifier
    +from sklearn.datasets import make_classification
    +X, yXOR = make_classification(n_samples=100, random_state=1)
    +FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)
    +FFNN.predict_proba(X)
    +print(f"Test set accuracy with Feed Forward Neural Network  for XOR gate:{FFNN.score(X, yXOR)}")
    +
    +
    +
    +
    +
    +
    +
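    Note that the scikit-learn example above trains on a synthetic data set from make_classification rather than on the four XOR input rows themselves. As a complement, here is a hedged sketch (the hyperparameters are my own illustrative choices, not taken from the notes) that trains a small MLP directly on the XOR truth table.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y_xor = np.array([0, 1, 1, 0])

    # A small hidden layer suffices to represent XOR; convergence depends on
    # the random initialization, so one may need to try a few random seeds.
    clf = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                        solver='lbfgs', max_iter=1000, random_state=0)
    clf.fit(X_xor, y_xor)
    print("Predictions:", clf.predict(X_xor))
    print("Training accuracy:", clf.score(X_xor, y_xor))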

    Mathematical model#

    +

    The output \(y\) is produced via the activation function \(f\)

    +
    \[ y = f\left(\sum_{i=1}^n w_ix_i + b\right) = f(z), \]
    +

    This function receives \(x_i\) as inputs. Here the activation is \(z=\sum_{i=1}^n w_ix_i+b\). In an FFNN of such neurons, the inputs \(x_i\) are the outputs of the neurons in the preceding layer. Furthermore, an MLP is fully-connected, which means that each neuron receives a weighted sum of the outputs of all neurons in the previous layer.

    +
    +
    +

    Mathematical model#

    +

    First, for each node \(i\) in the first hidden layer, we calculate a weighted sum \(z_i^1\) of the input coordinates \(x_j\),

    + +
    +
    +\[ +\begin{equation} z_i^1 = \sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1 +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    Here \(b_i^1\) is the so-called bias, which is normally needed in case of zero activation weights or inputs. How to fix the biases and the weights will be discussed below. The value of \(z_i^1\) is the argument to the activation function \(f_i\) of each node \(i\). The variable \(M\) stands for all possible inputs to a given node \(i\) in the first layer. We define the output \(y_i^1\) of all neurons in layer 1 as

    + +
    +
    +\[ +\begin{equation} + y_i^1 = f(z_i^1) = f\left(\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\right) +\label{outputLayer1} \tag{3} +\end{equation} +\]
    +

    where we assume that all nodes in the same layer have identical activation functions, hence the notation \(f\). More generally, different layers could have different activation functions. In that case we identify these functions with a superscript \(l\) for the \(l\)-th layer,

    + +
    +
    +\[ +\begin{equation} + y_i^l = f^l(u_i^l) = f^l\left(\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\right) +\label{generalLayer} \tag{4} +\end{equation} +\]
    +

    where \(N_l\) is the number of nodes in layer \(l\). When the output of +all the nodes in the first hidden layer are computed, the values of +the subsequent layer can be calculated and so forth until the output +is obtained.

    +
    +
    +

    Mathematical model#

    +

    The output of neuron \(i\) in layer 2 is thus,

    + +
    +
    +\[ +\begin{equation} + y_i^2 = f^2\left(\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    + +
    +
    +\[ +\begin{equation} + = f^2\left[\sum_{j=1}^N w_{ij}^2f^1\left(\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\right) + b_i^2\right] +\label{outputLayer2} \tag{6} +\end{equation} +\]
    +

    where we have substituted \(y_k^1\) with the inputs \(x_k\). Finally, the ANN output reads

    + +
    +
    +\[ +\begin{equation} + y_i^3 = f^3\left(\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\right) +\label{_auto3} \tag{7} +\end{equation} +\]
    + +
    +
    \[ \begin{equation} = f^3\left[\sum_{j} w_{ij}^3 f^2\left(\sum_{k} w_{jk}^2 f^1\left(\sum_{m} w_{km}^1 x_m + b_k^1\right) + b_j^2\right) + b_i^3\right] \label{_auto4} \tag{8} \end{equation} \]
    +
    +
    +

    Mathematical model#

    +

    We can generalize this expression to an MLP with \(l\) hidden +layers. The complete functional form is,

    + +
    +
    \[ \begin{equation} y^{l+1}_i = f^{l+1}\left[\!\sum_{j=1}^{N_l} w_{ij}^{l+1} f^l\left(\sum_{k=1}^{N_{l-1}}w_{jk}^{l}\left(\dots f^1\left(\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\right)\dots\right)+b_j^{l}\right)+b_i^{l+1}\right] \label{completeNN} \tag{9} \end{equation} \]
    +

    which illustrates a basic property of MLPs: The only independent +variables are the input values \(x_n\).

    +
    +
    +

    Mathematical model#

    +

    This confirms that an MLP, despite its quite convoluted mathematical +form, is nothing more than an analytic function, specifically a +mapping of real-valued vectors \(\hat{x} \in \mathbb{R}^n \rightarrow +\hat{y} \in \mathbb{R}^m\).

    +

    Furthermore, the flexibility and universality of an MLP can be +illustrated by realizing that the expression is essentially a nested +sum of scaled activation functions of the form

    + +
    +
    +\[ +\begin{equation} + f(x) = c_1 f(c_2 x + c_3) + c_4 +\label{_auto5} \tag{10} +\end{equation} +\]
    +

    where the parameters \(c_i\) are weights and biases. By adjusting these parameters, the activation functions can be shifted up and down or left and right, change slope or be rescaled, which is the key to the flexibility of a neural network.
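    A short sketch (my own illustration) of this point: plotting \(c_1 f(c_2 x + c_3) + c_4\) for a sigmoid \(f\) shows how the constants shift, flip and rescale the basic activation function.

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-6, 6, 200)
    plt.plot(x, sigmoid(x), label="f(x)")
    plt.plot(x, 2 * sigmoid(3 * x - 1) - 0.5, label="2 f(3x - 1) - 0.5")
    plt.xlabel("x")
    plt.legend()
    plt.show()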

    +
    +

    Matrix-vector notation#

    +

    We can introduce a more convenient notation for the activations in an ANN.

    +

    Additionally, we can represent the biases and activations +as layer-wise column vectors \(\hat{b}_l\) and \(\hat{y}_l\), so that the \(i\)-th element of each vector +is the bias \(b_i^l\) and activation \(y_i^l\) of node \(i\) in layer \(l\) respectively.

    +

    We have that \(\mathrm{W}_l\) is an \(N_{l} \times N_{l-1}\) matrix, while \(\hat{b}_l\) and \(\hat{y}_l\) are \(N_l \times 1\) column vectors. With this notation, the sum becomes a matrix-vector multiplication, and we can write the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as

    + +
    +
    +\[\begin{split} +\begin{equation} + \hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_{1} + \hat{b}_{2}) = + f_2\left(\left[\begin{array}{ccc} + w^2_{11} &w^2_{12} &w^2_{13} \\ + w^2_{21} &w^2_{22} &w^2_{23} \\ + w^2_{31} &w^2_{32} &w^2_{33} \\ + \end{array} \right] \cdot + \left[\begin{array}{c} + y^1_1 \\ + y^1_2 \\ + y^1_3 \\ + \end{array}\right] + + \left[\begin{array}{c} + b^2_1 \\ + b^2_2 \\ + b^2_3 \\ + \end{array}\right]\right). +\label{_auto6} \tag{11} +\end{equation} +\end{split}\]
    +
    +
    +

    Matrix-vector notation and activation#

    +

    The activation of node \(i\) in layer 2 is

    + +
    +
    +\[ +\begin{equation} + y^2_i = f_2\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\Bigr) = + f_2\left(\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\right). +\label{_auto7} \tag{12} +\end{equation} +\]
    +

    This is not just a convenient and compact notation, but also a useful +and intuitive way to think about MLPs: The output is calculated by a +series of matrix-vector multiplications and vector additions that are +used as input to the activation functions. For each operation +\(\mathrm{W}_l \hat{y}_{l-1}\) we move forward one layer.
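    A minimal sketch (my own illustration, with random weights and biases) of this picture: each layer is one matrix-vector multiplication, a vector addition and an element-wise activation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(2024)
    x = rng.normal(size=3)                      # input, plays the role of y^0

    W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)   # layer 1
    W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)   # layer 2

    # For each operation W_l y_{l-1} + b_l we move forward one layer
    y1 = sigmoid(W1 @ x + b1)
    y2 = sigmoid(W2 @ y1 + b2)
    print("Activations of layer 2:", y2)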

    +
    +
    +

    Activation functions#

    +

    A property that characterizes a neural network, other than its connectivity, is the choice of activation function(s). As described earlier, the following restrictions are imposed on an activation function for an FFNN to fulfill the universal approximation theorem:

    +
      +
    • Non-constant

    • Bounded

    • Monotonically-increasing

    • Continuous
    +
    +
    +

    Activation functions, Logistic and Hyperbolic ones#

    +

    The second requirement excludes all linear functions. Furthermore, in an MLP with only linear activation functions, each layer simply performs a linear transformation of its inputs.

    +

    Regardless of the number of layers, the output of the NN will be nothing but a linear function of the inputs. Thus we need to introduce some kind of non-linearity to the NN to be able to fit non-linear functions. Typical examples are the logistic Sigmoid

    +
    +\[ +f(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +f(x) = \tanh(x) +\]
    +
    +
    +

    Relevance#

    +

    The sigmoid function is more biologically plausible because the output of inactive neurons is zero. Such activation functions are called one-sided. However, it has been shown that the hyperbolic tangent performs better than the sigmoid for training MLPs, while the rectified linear unit (ReLU) has become the most popular choice for deep neural networks.

    +
    +
    +
    """The sigmoid function (or the logistic curve) is a 
    +function that takes any real number, z, and outputs a number in (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""Sine Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.sin(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sine function')
    +
    +plt.show()
    +
    +"""Plots a graph of the squashing function used by a rectified linear
    +unit"""
    +z = numpy.arange(-2, 2, .1)
    +zero = numpy.zeros(len(z))
    +y = numpy.max([zero, z], axis=0)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, y)
    +ax.set_ylim([-2.0, 2.0])
    +ax.set_xlim([-2.0, 2.0])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('Rectified linear unit')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    \ No newline at end of file
    diff --git a/doc/LectureNotes/_build/html/week41.html b/doc/LectureNotes/_build/html/week41.html
    new file mode 100644
    index 000000000..c6cf52bc8
    --- /dev/null
    +++ b/doc/LectureNotes/_build/html/week41.html
    @@ -0,0 +1,2080 @@
    +Week 41 Neural networks and constructing a neural network code — Applied Data Analysis and Machine Learning

    Week 41 Neural networks and constructing a neural network code#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: Week 41

    +
    +

    Plan for week 41, October 6-10#

    +
    +
    +

    Material for the lecture on Monday October 6, 2025#

    +
      +
    1. Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.

    2. Building our own Feed-forward Neural Network, getting started
    + +
    +
    +

    Readings and Videos:#

    +
      +
    1. These lecture notes

    2. For neural networks we recommend Goodfellow et al chapters 6 and 7.

    3. Raschka et al., chapter 11, jupyter-notebook sent separately, from GitHub

    4. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

    5. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex

    6. Video on Neural Networks at https://www.youtube.com/watch?v=CqOfi41LfDw

    7. Video on the back propagation algorithm at https://www.youtube.com/watch?v=Ilg3gGewQ5U

    8. We also recommend Michael Nielsen’s intuitive approach to the neural networks and the universal approximation theorem, see the slides at http://neuralnetworksanddeeplearning.com/chap4.html.
    +
    +
    +

    Mathematics of deep learning#

    +

    Two recent books online.

    +
      +
    1. The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen, published as Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

    2. Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
    +
    +
    +

    Reminder on books with hands-on material and codes#

    +

    Sebastian Raschka et al., Machine Learning with Scikit-Learn and PyTorch

    +
    +
    +

    Lab sessions on Tuesday and Wednesday#

    +

    Aim: Getting started with coding a neural network. The exercises this week aim at setting up the feed-forward part of a neural network.

    +
    +
    +

    Lecture Monday October 6#

    +
    +
    +


    Mathematics of deep learning and neural networks#

    +

    Neural networks, in their so-called feed-forward form, where each iteration contains a feed-forward stage and a back-propagation stage, consist of a series of affine matrix-matrix and matrix-vector multiplications. The unknown parameters (the so-called biases and weights which determine the architecture of a neural network) are updated iteratively using the so-called back-propagation algorithm. This algorithm corresponds to the so-called reverse mode of automatic differentiation.

    +
    +
    +

    Basics of an NN#

    +

    A neural network consists of a series of hidden layers, in addition to +the input and output layers. Each layer \(l\) has a set of parameters +\(\boldsymbol{\Theta}^{(l)}=(\boldsymbol{W}^{(l)},\boldsymbol{b}^{(l)})\) which are related to the +parameters in other layers through a series of affine transformations, +for a standard NN these are matrix-matrix and matrix-vector +multiplications. For all layers we will simply use a collective variable \(\boldsymbol{\Theta}\).

    +

    It consists of two basic steps:

    +
      +
    1. a feed-forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.

    2. a back-propagation stage where the unknown parameters \(\boldsymbol{\Theta}\) are updated through the optimization of their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/loss function.
    +

    These two steps make up one iteration. This iterative process is continued until we reach a given stopping criterion.
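    A compact, self-contained sketch of these two steps (my own illustration on a toy regression problem, not the course's reference implementation): one hidden layer with sigmoid activation, a linear output and the (halved) mean squared error as cost function.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50).reshape(-1, 1)    # inputs, shape (n, 1)
    y = x**2                                     # targets for a toy regression

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Parameters Theta = (W1, b1, W2, b2): one hidden layer with 10 nodes
    W1, b1 = rng.normal(scale=0.5, size=(1, 10)), np.zeros(10)
    W2, b2 = rng.normal(scale=0.5, size=(10, 1)), np.zeros(1)
    eta = 0.1                                    # constant learning rate

    for iteration in range(2000):
        # 1) feed-forward stage: from the input to the model output
        a1 = sigmoid(x @ W1 + b1)                # hidden-layer activations
        y_tilde = a1 @ W2 + b2                   # linear output layer

        # 2) back-propagation stage: gradients of C = (1/2n) sum (y_tilde - y)^2
        delta2 = (y_tilde - y) / len(x)          # dC/dz at the output layer
        delta1 = (delta2 @ W2.T) * a1 * (1 - a1) # propagated to the hidden layer

        W2 -= eta * a1.T @ delta2
        b2 -= eta * delta2.sum(axis=0)
        W1 -= eta * x.T @ delta1
        b1 -= eta * delta1.sum(axis=0)

    print("Final cost:", 0.5 * np.mean((y_tilde - y)**2))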

    +
    +
    +

    Overarching view of a neural network#

    +

    The architecture of a neural network defines our model. This model aims at describing some function \(f(\boldsymbol{x})\) which represents some final result (outputs or target values) given a specific input \(\boldsymbol{x}\). Note that here \(\boldsymbol{y}\) and \(\boldsymbol{x}\) are not limited to be vectors.

    +

    The architecture consists of

    +
      +
    1. An input and an output layer where the input layer is defined by the inputs \(\boldsymbol{x}\). The output layer produces the model output \(\boldsymbol{\tilde{y}}\) which is compared with the target value \(\boldsymbol{y}\)

    2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)

    3. A given activation function \(\sigma(\boldsymbol{z})\) with arguments \(\boldsymbol{z}\) to be defined below. The activation functions may differ from layer to layer.

    4. The last layer, normally called the output layer, has an activation function tailored to the specific problem

    5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model.
    +
    +
    +

    The optimization problem#

    +

    The cost function is a function of the unknown parameters +\(\boldsymbol{\Theta}\) where the latter is a container for all possible +parameters needed to define a neural network

    +

    If we are dealing with a regression task a typical cost/loss function +is the mean squared error

    +
    +\[ +C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}. +\]
    +

    This function represents one of many possible ways to define the so-called cost function. Note that here we have assumed a linear dependence in terms of the parameters \(\boldsymbol{\Theta}\). This is in general not the case.

    +
    +
    +

    Parameters of neural networks#

    +

    For neural networks the parameters +\(\boldsymbol{\Theta}\) are given by the so-called weights and biases (to be +defined below).

    +

    The weights are given by matrix elements \(w_{ij}^{(l)}\) where the +superscript indicates the layer number. The biases are typically given +by vector elements representing each single node of a given layer, +that is \(b_j^{(l)}\).

    +
    +
    +

    Other ingredients of a neural network#

    +

    Having defined the architecture of a neural network, the optimization of the cost function with respect to the parameters \(\boldsymbol{\Theta}\) involves the calculation of gradients and their optimization. The gradients represent the derivatives of a multidimensional object and are often approximated by various gradient methods (see the sketch after this list), including

    +
      +
    1. various quasi-Newton methods,

    2. plain gradient descent (GD) with a constant learning rate \(\eta\),

    3. GD with momentum and other approximations to the learning rates such as

      • Adaptive gradient (ADAgrad)

      • Root mean-square propagation (RMSprop)

      • Adaptive gradient with momentum (ADAM) and many others

    4. Stochastic gradient descent and various families of learning rate approximations.
    +
    +
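    A small sketch (my own illustration on a simple quadratic cost, not part of the lecture notes) contrasting plain GD with a constant learning rate against GD with momentum:

    import numpy as np

    A = np.array([[10.0, 0.0], [0.0, 1.0]])      # an ill-conditioned quadratic cost
    def gradient(theta):
        return A @ theta                         # gradient of C = theta^T A theta / 2

    eta, gamma = 0.08, 0.9                       # learning rate and momentum parameter
    theta_gd = np.array([1.0, 1.0])
    theta_mom = np.array([1.0, 1.0])
    velocity = np.zeros(2)

    for _ in range(100):
        theta_gd = theta_gd - eta * gradient(theta_gd)         # plain GD
        velocity = gamma * velocity + eta * gradient(theta_mom)
        theta_mom = theta_mom - velocity                       # GD with momentum

    print("Plain GD:      ", theta_gd)
    print("GD + momentum: ", theta_mom)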
    +

    Other parameters#

    +

    In addition to the above, there are often additional hyperparameters which are included in the setup of a neural network. These will be discussed below.

    +
    +
    +

    Universal approximation theorem#

    +

    The universal approximation theorem plays a central role in deep +learning. Cybenko (1989) showed +the following:

    +

    Let \(\sigma\) be any continuous sigmoidal function such that

    +
    +\[\begin{split} +\sigma(z) = \left\{\begin{array}{cc} 1 & z\rightarrow \infty\\ 0 & z \rightarrow -\infty \end{array}\right. +\end{split}\]
    +

    Given a continuous and deterministic function \(F(\boldsymbol{x})\) defined on the unit cube in \(d\) dimensions, that is for \(\boldsymbol{x}\in [0,1]^d\), and a parameter \(\epsilon >0\), there is a one-layer (hidden) neural network \(f(\boldsymbol{x};\boldsymbol{\Theta})\) with \(\boldsymbol{\Theta}=(\boldsymbol{W},\boldsymbol{b})\) and \(\boldsymbol{W}\in \mathbb{R}^{m\times n}\) and \(\boldsymbol{b}\in \mathbb{R}^{n}\), for which

    +
    +\[ +\vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert < \epsilon \hspace{0.1cm} \forall \boldsymbol{x}\in[0,1]^d. +\]
    +
    +
    +

    Some parallels from real analysis#

    +

    For those of you familiar with for example the Stone-Weierstrass +theorem +for polynomial approximations or the convergence criterion for Fourier +series, there are similarities in the derivation of the proof for +neural networks.

    +
    +
    +

    The approximation theorem in words#

    +

    Any continuous function \(y=F(\boldsymbol{x})\) supported on the unit cube in +\(d\)-dimensions can be approximated by a one-layer sigmoidal network to +arbitrary accuracy.

    +

    Hornik (1991) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value

    +
    +\[ +\mathbb{E}[\vert F(\boldsymbol{x})\vert^2] =\int_{\boldsymbol{x}\in D} \vert F(\boldsymbol{x})\vert^2p(\boldsymbol{x})d\boldsymbol{x} < \infty. +\]
    +

    Then we have

    +
    +\[ +\mathbb{E}[\vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert^2] =\int_{\boldsymbol{x}\in D} \vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert^2p(\boldsymbol{x})d\boldsymbol{x} < \epsilon. +\]
    +
    +
    +

    More on the general approximation theorem#

    +

    None of the proofs give any insight into the relation between the number of hidden layers and nodes and the approximation error \(\epsilon\), nor the magnitudes of \(\boldsymbol{W}\) and \(\boldsymbol{b}\).

    +

    Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.

    +

    It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want.
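    As an illustration (my own sketch; the target function and hyperparameters are arbitrary choices, and the optimizer is not guaranteed to find the best possible approximation on every run), a single hidden layer typically fits a continuous function on \([0,1]\) better and better as the number of hidden nodes grows.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    x = np.linspace(0, 1, 400).reshape(-1, 1)
    y = np.sin(4 * np.pi * x).ravel()          # a continuous target F(x)

    for nodes in (5, 20, 100):
        net = MLPRegressor(hidden_layer_sizes=(nodes,), activation='logistic',
                           solver='lbfgs', max_iter=5000, random_state=0)
        net.fit(x, y)
        deviation = np.max(np.abs(net.predict(x) - y))
        print(f"{nodes:4d} hidden nodes, max deviation {deviation:.3f}")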

    +
    +
    +

    Class of functions we can approximate#

    +

    The class of functions that can be approximated are the continuous ones. +If the function \(F(\boldsymbol{x})\) is discontinuous, it won’t in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points.

    +
    +
    +

    Setting up the equations for a neural network#

    +

    The questions we want to ask are how do changes in the biases and the +weights in our network change the cost function and how can we use the +final output to modify the weights and biases?

    +

    To derive these equations let us start with a plain regression problem +and define our cost function as

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2, +\]
    +

    where the \(y_i\)s are our \(n\) targets (the values we want to +reproduce), while the outputs of the network after having propagated +all inputs \(\boldsymbol{x}\) are given by \(\boldsymbol{\tilde{y}}_i\).

    +
    +
    +

    Layout of a neural network with three hidden layers#

    + + +

    Figure 1:

    +
    +
    +

    Definitions#

    +

    With our definition of the targets \(\boldsymbol{y}\), the outputs of the +network \(\boldsymbol{\tilde{y}}\) and the inputs \(\boldsymbol{x}\) we +define now the activation \(z_j^l\) of node/neuron/unit \(j\) of the +\(l\)-th layer as a function of the bias, the weights which add up from +the previous layer \(l-1\) and the forward passes/outputs +\(\hat{a}^{l-1}\) from the previous layer as

    +
    +\[ +z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l, +\]
    +

    where \(b_j^l\) are the biases of layer \(l\). Here \(M_{l-1}\) represents the total number of nodes/neurons/units of layer \(l-1\). The figure in the whiteboard notes illustrates this equation. We can rewrite this in a more compact form as the matrix-vector products we discussed earlier,

    +
    +\[ +\hat{z}^l = \left(\hat{W}^l\right)^T\hat{a}^{l-1}+\hat{b}^l. +\]
    +
    +
    +

    Inputs to the activation function#

    +

    With the activation values \(\boldsymbol{z}^l\) we can in turn define the +output of layer \(l\) as \(\boldsymbol{a}^l = f(\boldsymbol{z}^l)\) where \(f\) is our +activation function. In the examples here we will use the sigmoid +function discussed in our logistic regression lectures. We will also use the same activation function \(f\) for all layers +and their nodes. It means we have

    +
    \[ a_j^l = \sigma(z_j^l) = \frac{1}{1+\exp(-z_j^l)}. \]
    +
    +
    +

    Derivatives and the chain rule#

    +

    From the definition of the activation \(z_j^l\) we have

    +
    +\[ +\frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1}, +\]
    +

    and

    +
    \[ \frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ij}^l. \]
    +

    With our definition of the activation function we have that (note that this function depends only on \(z_j^l\))

    +
    +\[ +\frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=\sigma(z_j^l)(1-\sigma(z_j^l)). +\]
    +
    +
    +

    Derivative of the cost function#

    +

    With these definitions we can now compute the derivative of the cost function in terms of the weights.

    +

    Let us specialize to the output layer \(l=L\). Our cost function is

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2, +\]
    +

    The derivative of this function with respect to the weights is

    +
    +\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{jk}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{jk}^{L}}, +\]
    +

    The last partial derivative can easily be computed and reads (by applying the chain rule)

    +
    +\[ +\frac{\partial a_j^L}{\partial w_{jk}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}. +\]
    +
    +
    +

    Simpler examples first, and automatic differentiation#

    +

    In order to understand the back propagation algorithm and its +derivation (an implementation of the chain rule), let us first digress +with some simple examples. These examples are also meant to motivate +the link with back propagation and automatic differentiation. We will discuss these topics next week (week 42).

    +
    +
    +

    Reminder on the chain rule and gradients#

    +

If we have a multivariate function \(f(x,y)\) where \(x=x(t)\) and \(y=y(t)\) are functions of a variable \(t\), the derivative of \(f\) with respect to \(t\) (written here without the explicit unit vector components) is

    +
    +\[\begin{split} +\frac{df}{dt} = \begin{bmatrix}\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix}\frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial t} \end{bmatrix}=\frac{\partial f}{\partial x} \frac{\partial x}{\partial t} +\frac{\partial f}{\partial y} \frac{\partial y}{\partial t}. +\end{split}\]
    +
    +
    +

    Multivariable functions#

    +

    If we have a multivariate function \(f(x,y)\) where \(x=x(t,s)\) and \(y=y(t,s)\) are functions of the variables \(t\) and \(s\), we have that the partial derivatives

    +
    +\[ +\frac{\partial f}{\partial s}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, +\]
    +

    and

    +
    +\[ +\frac{\partial f}{\partial t}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. +\]
    +

This gives the gradient of \(f\) with respect to \(t\) and \(s\) (without the explicit unit vector components) as

    +
    +\[\begin{split} +\frac{df}{d(s,t)} = \begin{bmatrix}\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix}\frac{\partial x}{\partial s} &\frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Automatic differentiation through examples#

    +

    A great introduction to automatic differentiation is given by Baydin et al., see https://arxiv.org/abs/1502.05767. +See also the video at https://www.youtube.com/watch?v=wG_nF1awSSY.

    +

Automatic differentiation is a repeated application of the chain rule on well-known elementary functions and allows for the calculation of derivatives to numerical precision. It is not the same as the calculation of symbolic derivatives via, for example, SymPy, nor does it use approximate formulae based on Taylor expansions of a function around a given value. The latter are error prone due to truncation errors and the choice of the step size \(\Delta\).

    +
    +
    +

    Simple example#

    +

    Our first example is rather simple,

    +
    +\[ +f(x) =\exp{x^2}, +\]
    +

    with derivative

    +
    +\[ +f'(x) =2x\exp{x^2}. +\]
    +

    We can use SymPy to extract the pertinent lines of Python code through the following simple example

    +
    +
    +
from sympy import *
    +# symbolic variable and the function f(x) = exp(x^2)
    +x = symbols('x')
    +expr = exp(x*x)
    +simplify(expr)
    +# symbolic derivative, f'(x) = 2x exp(x^2)
    +derivative = diff(expr,x)
    +print(python(expr))
    +print(python(derivative))
    +
    +
    +
    +
    +
    +
    +

    Smarter way of evaluating the above function#

    +

    If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable

    +
    +\[ +a = x^2, +\]
    +

    leading to

    +
    +\[ +f(x) = f(a(x)) = b= \exp{a}. +\]
    +

    We now assume that all operations can be counted in terms of equal +floating point operations. This means that in order to calculate +\(f(x)\) we need first to square \(x\) and then compute the exponential. We +have thus two floating point operations only.

    +
    +
    +

    Reducing the number of operations#

    +

With the precalculated quantity \(a\), and thereby \(b=f(x)\), the derivative can be written as

    +
    +\[ +f'(x) = 2xb, +\]
    +

which reduces the number of operations from four in the original +expression to two. This means that if we need to compute \(f(x)\) and +its derivative (a common task in optimizations), we have reduced the +number of operations from six to four in total.

    +

Note that symbolic software like SymPy does not automatically include such simplifications, and the calculations of the function and the derivatives then yield in general more floating point operations.

    +
    +
    +

    Chain rule, forward and reverse modes#

    +

    In the above example we have introduced the variables \(a\) and \(b\), and our function is

    +
    +\[ +f(x) = f(a(x)) = b= \exp{a}, +\]
    +

    with \(a=x^2\). We can decompose the derivative of \(f\) with respect to \(x\) as

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\frac{db}{da}\frac{da}{dx}. +\]
    +

We note that since \(b=f(x)\), we have

    +
    +\[ +\frac{df}{db}=1, +\]
    +

    leading to

    +
    +\[ +\frac{df}{dx}=\frac{db}{da}\frac{da}{dx}=2x\exp{x^2}, +\]
    +

    as before.

    +
    +
    +

    Forward and reverse modes#

    +

    We have that

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\frac{db}{da}\frac{da}{dx}, +\]
    +

    which we can rewrite either as

    +
    +\[ +\frac{df}{dx}=\left[\frac{df}{db}\frac{db}{da}\right]\frac{da}{dx}, +\]
    +

    or

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\left[\frac{db}{da}\frac{da}{dx}\right]. +\]
    +

The first expression is called reverse mode (or back propagation) since we start by evaluating the derivatives at the end point and then propagate backwards. This is the standard way of evaluating derivatives (gradients) when optimizing the parameters of a neural network. In the context of deep learning this is computationally more efficient since the output of a neural network consists of either one or just a few output variables.

    +

    The second equation defines the so-called forward mode.

    +
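    Both groupings give of course the same number. A tiny sketch (with an arbitrary input value) makes the bookkeeping explicit for \(f(x)=\exp{x^2}\):

    import numpy as np

    x = 1.5
    a = x**2              # a = x^2
    b = np.exp(a)         # b = f(x)

    da_dx = 2*x
    db_da = np.exp(a)
    df_db = 1.0

    # reverse mode: start at the output and move towards the input
    reverse = (df_db*db_da)*da_dx
    # forward mode: start at the input and move towards the output
    forward = df_db*(db_da*da_dx)
    print(reverse, forward, 2*x*np.exp(x**2))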
    +
    +

    More complicated function#

    +

    We increase our ambitions and introduce a slightly more complicated function

    +
+\[ +f(x) =\sqrt{x^2+\exp{x^2}}, +\]
    +

    with derivative

    +
+\[ +f'(x) =\frac{x(1+\exp{x^2})}{\sqrt{x^2+\exp{x^2}}}. +\]
    +

    The corresponding SymPy code reads

    +
    +
    +
from sympy import *
    +# symbolic variable and the function f(x) = sqrt(x^2 + exp(x^2))
    +x = symbols('x')
    +expr = sqrt(x*x+exp(x*x))
    +simplify(expr)
    +# symbolic derivative of f(x)
    +derivative = diff(expr,x)
    +print(python(expr))
    +print(python(derivative))
    +
    +
    +
    +
    +
    +
    +

    Counting the number of floating point operations#

    +

    A simple count of operations shows that we need five operations for +the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.

    +

    Can we reduce this to +say half the number of operations?

    +
    +
    +

    Defining intermediate operations#

    +

We can indeed reduce the number of operations to half of those listed in the brute force approach above. +We define the following quantities

    +
    +\[ +a = x^2, +\]
    +

    and

    +
    +\[ +b = \exp{x^2} = \exp{a}, +\]
    +

    and

    +
    +\[ +c= a+b, +\]
    +

    and

    +
    +\[ +d=f(x)=\sqrt{c}. +\]
    +
    +
    +

    New expression for the derivative#

    +

    With these definitions we obtain the following partial derivatives

    +
    +\[ +\frac{\partial a}{\partial x} = 2x, +\]
    +

    and

    +
    +\[ +\frac{\partial b}{\partial a} = \exp{a}, +\]
    +

    and

    +
    +\[ +\frac{\partial c}{\partial a} = 1, +\]
    +

    and

    +
    +\[ +\frac{\partial c}{\partial b} = 1, +\]
    +

    and

    +
    +\[ +\frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, +\]
    +

    and finally

    +
    +\[ +\frac{\partial f}{\partial d} = 1. +\]
    +
    +
    +

    Final derivatives#

    +

    Our final derivatives are thus

    +
    +\[ +\frac{\partial f}{\partial c} = \frac{\partial f}{\partial d} \frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, +\]
    +
    +\[ +\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial b} = \frac{1}{2\sqrt{c}}, +\]
    +
    +\[ +\frac{\partial f}{\partial a} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial a}+ +\frac{\partial f}{\partial b} \frac{\partial b}{\partial a} = \frac{1+\exp{a}}{2\sqrt{c}}, +\]
    +

    and finally

    +
    +\[ +\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial x} = \frac{x(1+\exp{a})}{\sqrt{c}}, +\]
    +

    which is just

    +
    +\[ +\frac{\partial f}{\partial x} = \frac{x(1+b)}{d}, +\]
    +

    and requires only three operations if we can reuse all intermediate variables.

    +
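    A small numerical sketch (with an arbitrary input value) of the forward sweep with the intermediate variables and the resulting derivative, compared with the closed-form expression:

    import numpy as np

    x = 1.3
    # forward sweep: the intermediate variables
    a = x**2
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)        # d = f(x)

    # derivative reusing the intermediates, df/dx = x(1+b)/d
    df_dx = x*(1 + b)/d

    # closed-form derivative for comparison
    exact = x*(1 + np.exp(x**2))/np.sqrt(x**2 + np.exp(x**2))
    print(df_dx, exact)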
    +
    +

    In general not this simple#

    +

    In general, see the generalization below, unless we can obtain simple +analytical expressions which we can simplify further, the final +implementation of automatic differentiation involves repeated +calculations (and thereby operations) of derivatives of elementary +functions.

    +
    +
    +

    Automatic differentiation#

    +

    We can make this example more formal. Automatic differentiation is a +formalization of the previous example (see graph).

    +

We define \(x_1,\dots, x_l\) as the input variables to a given function \(f(\boldsymbol{x})\) and \(x_{l+1},\dots, x_L\) as intermediate variables.

    +

    In the above example we have only one input variable, \(l=1\) and four intermediate variables, that is

    +
    +\[ +\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\exp{a}= b & x_4=c=a+b & x_5 = \sqrt{c}=d \end{bmatrix}. +\]
    +

Furthermore, for \(i=l+1, \dots, L\) (here \(i=2,3,4,5\) and \(f=x_L=d\)), we +define the elementary functions \(g_i(x_{Pa(x_i)})\) where \(x_{Pa(x_i)}\) are the parent nodes of the variable \(x_i\).

    +

In our case, we have for example \(x_3=g_3(x_{Pa(x_3)})=\exp{a}\), that is \(g_3=\exp{()}\) and \(x_{Pa(x_3)}=x_2=a\).

    +
    +
    +

    Chain rule#

    +

    We can now compute the gradients by back-propagating the derivatives using the chain rule. +We have defined

    +
    +\[ +\frac{\partial f}{\partial x_L} = 1, +\]
    +

    which allows us to find the derivatives of the various variables \(x_i\) as

    +
    +\[ +\frac{\partial f}{\partial x_i} = \sum_{x_j:x_i\in Pa(x_j)}\frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial x_i}=\sum_{x_j:x_i\in Pa(x_j)}\frac{\partial f}{\partial x_j} \frac{\partial g_j}{\partial x_i}. +\]
    +

Whenever we have a function which can be expressed as a computation graph, and the various functions can be expressed in terms of differentiable elementary functions, automatic differentiation works. The functions need not be elementary functions; they could also be computer programs, although not all programs can be automatically differentiated.

    +
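    If the autograd library is available (an assumption here; JAX offers the same functionality through jax.grad), the whole bookkeeping can be delegated to reverse-mode automatic differentiation, as in this minimal sketch:

    import autograd.numpy as anp
    from autograd import grad

    def f(x):
        return anp.sqrt(x**2 + anp.exp(x**2))

    dfdx = grad(f)        # reverse-mode derivative of f
    x = 1.3
    print(dfdx(x), x*(1 + anp.exp(x**2))/anp.sqrt(x**2 + anp.exp(x**2)))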
    +
    +

First network example, simple perceptron with one input#

    +

    As yet another example we define now a simple perceptron model with +all quantities given by scalars. We consider only one input variable +\(x\) and one target value \(y\). We define an activation function +\(\sigma_1\) which takes as input

    +
    +\[ +z_1 = w_1x+b_1, +\]
    +

where \(w_1\) is the weight and \(b_1\) is the bias. These are the +parameters we want to optimize. The output is \(a_1=\sigma_1(z_1)\) (see +graph from whiteboard notes). This output is then fed into the +cost/loss function, which we here for the sake of simplicity just +define as the squared error

    +
    +\[ +C(x;w_1,b_1)=\frac{1}{2}(a_1-y)^2. +\]
    +
    +
    +

    Layout of a simple neural network with no hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    Optimizing the parameters#

    +

    In setting up the feed forward and back propagation parts of the +algorithm, we need now the derivative of the various variables we want +to train.

    +

    We need

    +
    +\[ +\frac{\partial C}{\partial w_1} \hspace{0.1cm}\mathrm{and}\hspace{0.1cm}\frac{\partial C}{\partial b_1}. +\]
    +

    Using the chain rule we find

    +
    +\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_1-y)\sigma_1'x, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_1-y)\sigma_1', +\]
    +

    which we later will just define as

    +
    +\[ +\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}=\delta_1. +\]
    +
    +
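    A minimal training sketch for this single perceptron, with made-up data and initial values (the target is chosen inside the range of the sigmoid):

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    x, y = 4.0, 0.75       # one input and one target, chosen arbitrarily
    w1, b1 = 0.1, 0.01     # initial parameters
    eta = 0.1              # learning rate

    for i in range(100):
        z1 = w1*x + b1
        a1 = sigmoid(z1)
        # delta_1 = dC/da_1 * da_1/dz_1
        delta1 = (a1 - y)*a1*(1 - a1)
        # gradient descent updates
        w1 -= eta*delta1*x
        b1 -= eta*delta1

    print(a1, y)           # the output has moved towards the target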
    +

    Adding a hidden layer#

    +

    We change our simple model to (see graph) +a network with just one hidden layer but with scalar variables only.

    +

    Our output variable changes to \(a_2\) and \(a_1\) is now the output from the hidden node and \(a_0=x\). +We have then

    +
    +\[ +z_1 = w_1a_0+b_1 \hspace{0.1cm} \wedge a_1 = \sigma_1(z_1), +\]
    +
    +\[ +z_2 = w_2a_1+b_2 \hspace{0.1cm} \wedge a_2 = \sigma_2(z_2), +\]
    +

    and the cost function

    +
    +\[ +C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2, +\]
    +

    with \(\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]\).

    +
    +
    +

    Layout of a simple neural network with one hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    The derivatives#

    +

    The derivatives are now, using the chain rule again

    +
    +\[ +\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1, +\]
    +
    +\[ +\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2, +\]
    +
+\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'a_0, +\]
    +
+\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_1. +\]
    +

    Can you generalize this to more than one hidden layer?

    +
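    The four derivatives can be checked with a few lines of Python; this is a sketch with arbitrary numbers, using the sigmoid for both \(\sigma_1\) and \(\sigma_2\) so that \(\sigma'=a(1-a)\).

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    # arbitrary numbers for a single forward and backward pass
    a0, y = 0.5, 1.2
    w1, b1, w2, b2 = 0.4, 0.01, -0.7, 0.01

    # forward pass
    z1 = w1*a0 + b1; a1 = sigmoid(z1)
    z2 = w2*a1 + b2; a2 = sigmoid(z2)

    # backward pass with the deltas defined above
    delta2 = (a2 - y)*a2*(1 - a2)
    delta1 = delta2*w2*a1*(1 - a1)

    dC_dw2, dC_db2 = delta2*a1, delta2
    dC_dw1, dC_db1 = delta1*a0, delta1
    print(dC_dw2, dC_db2, dC_dw1, dC_db1)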
    +
    +

    Important observations#

    +

From the above equations we see that the derivatives of the activation +functions play a central role. If they vanish, the training may +stop. This is called the vanishing gradient problem, see discussions below. If they become +large, the parameters \(w_i\) and \(b_i\) may simply go to infinity. This +is referred to as the exploding gradient problem.

    +
    +
    +

    The training#

    +

    The training of the parameters is done through various gradient descent approximations with

    +
    +\[ +w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1}, +\]
    +

    and

    +
    +\[ +b_i \leftarrow b_i-\eta \delta_i, +\]
    +

where \(\eta\) is the learning rate.

    +

    One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters \(\boldsymbol{\Theta}\).

    +

    For the first hidden layer \(a_{i-1}=a_0=x\) for this simple model.

    +
    +
    +

    Code example#

    +

The code here implements the above model with one hidden layer and scalar variables for the same function we studied in the previous example. The code is, however, set up so that we can add multiple inputs \(x\) and target values \(y\). Note also that we have the possibility of defining a feature matrix \(\boldsymbol{X}\) with more than just one column for the input values. This will turn out to be useful in our next example. We have also defined matrices and vectors for all of our operations, although it is not necessary here.

    +
    +
    +
    import numpy as np
    +# We use the Sigmoid function as activation function
    +def sigmoid(z):
    +    return 1.0/(1.0+np.exp(-z))
    +
    +def forwardpropagation(x):
    +    # weighted sum of inputs to the hidden layer
    +    z_1 = np.matmul(x, w_1) + b_1
    +    # activation in the hidden layer
    +    a_1 = sigmoid(z_1)
    +    # weighted sum of inputs to the output layer
    +    z_2 = np.matmul(a_1, w_2) + b_2
    +    a_2 = z_2
    +    return a_1, a_2
    +
    +def backpropagation(x, y):
    +    a_1, a_2 = forwardpropagation(x)
    +    # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1
    +    delta_2 = a_2 - y
    +    print(0.5*((a_2-y)**2))
    +    # delta for  the hidden layer
    +    delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_1.T, delta_2)
    +    output_bias_gradient = np.sum(delta_2, axis=0)
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(x.T, delta_1)
    +    hidden_bias_gradient = np.sum(delta_1, axis=0)
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +# Input variable
    +x = np.array([4.0],dtype=np.float64)
    +# Target values
    +y = 2*x+1.0 
    +
    +# Defining the neural network, only scalars here
    +n_inputs = x.shape
    +n_features = 1
    +n_hidden_neurons = 1
    +n_outputs = 1
    +
    +# Initialize the network
    +# weights and bias in the hidden layer
    +w_1 = np.random.randn(n_features, n_hidden_neurons)
    +b_1 = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +w_2 = np.random.randn(n_hidden_neurons, n_outputs)
    +b_2 = np.zeros(n_outputs) + 0.01
    +
    +eta = 0.1
    +for i in range(50):
    +    # calculate gradients
    +    derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)
    +    # update weights and biases
    +    w_2 -= eta * derivW2
    +    b_2 -= eta * derivB2
    +    w_1 -= eta * derivW1
    +    b_1 -= eta * derivB1
    +
    +
    +
    +
    +

We see that after a few iterations (the results do, however, depend on the learning rate), we get an error which is rather small.

    +
    +
    +

    Exercise 1: Including more data#

    +

Try to increase the amount of input and target/output data. Try also to perform calculations for more values of the learning rate. Feel free to add a regularization term with either an \(l_1\) norm or an \(l_2\) norm (with its corresponding hyperparameter) and discuss your results. Discuss your results as functions of the amount of training data and the various learning rates.

    +

    Challenge: Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either autograd or JAX.

    +
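    As a pointer for the challenge, a minimal JAX sketch (assuming JAX is installed; the parameter values are made up) that returns the gradients of the cost with respect to all four parameters of the model above:

    import jax.numpy as jnp
    from jax import grad

    def sigmoid(z):
        return 1.0/(1.0+jnp.exp(-z))

    def loss(params, x, y):
        w1, b1, w2, b2 = params
        a1 = sigmoid(w1*x + b1)
        a2 = w2*a1 + b2                  # linear output, as in the code above
        return 0.5*jnp.sum((a2 - y)**2)

    params = (0.5, 0.01, -0.3, 0.01)     # made-up initial parameters
    x = jnp.array([4.0])
    y = 2*x + 1.0
    grads = grad(loss)(params, x, y)     # tuple with the same structure as params
    print(grads)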
    +
    +

    Simple neural network and the back propagation equations#

    +

Let us now try to increase our level of ambition and attempt to set up the equations for a neural network with two input nodes, one hidden layer with two hidden nodes and one output layer with only one output node/neuron (see graph).

    +

    We need to define the following parameters and variables with the input layer (layer \((0)\)) +where we label the nodes \(x_0\) and \(x_1\)

    +
    +\[ +x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}. +\]
    +

The hidden layer (layer \((1)\)) has nodes which yield the outputs \(a_0^{(1)}\) and \(a_1^{(1)}\) with weight \(\boldsymbol{w}\) and bias \(\boldsymbol{b}\) parameters

    +
    +\[ +w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}. +\]
    +
    +
    +

    Layout of a simple neural network with two input nodes, one hidden layer and one output node#

    + + +

    Figure 1:

    +
    +
    +

The output layer#

    +

Finally, we have the output layer given by layer label \((2)\) with output \(a^{(2)}\) and weights and biases to be determined given by the variables

    +
    +\[ +w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}. +\]
    +

    Our output is \(\tilde{y}=a^{(2)}\) and we define a generic cost function \(C(a^{(2)},y;\boldsymbol{\Theta})\) where \(y\) is the target value (a scalar here). +The parameters we need to optimize are given by

    +
    +\[ +\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}. +\]
    +
    +
    +

    Compact expressions#

    +

    We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions. +The inputs to the first hidden layer are

    +
    +\[\begin{split} +\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix}, +\end{split}\]
    +

    with outputs

    +
    +\[\begin{split} +\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Output layer#

    +

    For the final output layer we have the inputs to the final activation function

    +
    +\[ +z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)}, +\]
    +

    resulting in the output

    +
    +\[ +a^{(2)}=\sigma^{(2)}(z^{(2)}). +\]
    +
    +
    +

    Explicit derivatives#

    +

    In total we have nine parameters which we need to train. Using the +chain rule (or just the back-propagation algorithm) we can find all +derivatives. Since we will use automatic differentiation in reverse +mode, we start with the derivatives of the cost function with respect +to the parameters of the output layer, namely

    +
    +\[ +\frac{\partial C}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\delta^{(2)}a_i^{(1)}, +\]
    +

    with

    +
    +\[ +\delta^{(2)}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\]
    +

    and finally

    +
    +\[ +\frac{\partial C}{\partial b^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial b^{(2)}}=\delta^{(2)}. +\]
    +
    +
    +

    Derivatives of the hidden layer#

    +

    Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)

    +
    +\[ +\frac{\partial C}{\partial w_{00}^{(1)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}= \delta^{(2)}\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}, +\]
    +

    which, noting that

    +
    +\[ +z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)}, +\]
    +

    allows us to rewrite

    +
+\[ +\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}=w_0^{(2)}\frac{\partial a_0^{(1)}}{\partial z_0^{(1)}}a_0^{(0)}. +\]
    +
    +
    +

    Final expression#

    +

    Defining

    +
    +\[ +\delta_0^{(1)}=w_0^{(2)}\frac{\partial a_0^{(1)}}{\partial z_0^{(1)}}\delta^{(2)}, +\]
    +

    we have

    +
+\[ +\frac{\partial C}{\partial w_{00}^{(1)}}=\delta_0^{(1)}a_0^{(0)}. +\]
    +

    Similarly, we obtain

    +
+\[ +\frac{\partial C}{\partial w_{01}^{(1)}}=\delta_0^{(1)}a_1^{(0)}. +\]
    +
    +
    +

    Completing the list#

    +

    Similarly, we find

    +
+\[ +\frac{\partial C}{\partial w_{10}^{(1)}}=\delta_1^{(1)}a_0^{(0)}, +\]
    +

    and

    +
+\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\delta_1^{(1)}a_1^{(0)}, +\]
    +

    where we have defined

    +
    +\[ +\delta_1^{(1)}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}\delta^{(2)}. +\]
    +
    +
    +

    Final expressions for the biases of the hidden layer#

    +

    For the sake of completeness, we list the derivatives of the biases, which are

    +
    +\[ +\frac{\partial C}{\partial b_{0}^{(1)}}=\delta_0^{(1)}, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_{1}^{(1)}}=\delta_1^{(1)}. +\]
    +

    As we will see below, these expressions can be generalized in a more compact form.

    +
    +
    +

    Gradient expressions#

    +

For this specific model, with just one output node and two hidden +nodes, the gradient descent equations take the following form for the output layer

    +
    +\[ +w_{i}^{(2)}\leftarrow w_{i}^{(2)}- \eta \delta^{(2)} a_{i}^{(1)}, +\]
    +

    and

    +
    +\[ +b^{(2)} \leftarrow b^{(2)}-\eta \delta^{(2)}, +\]
    +

    and

    +
    +\[ +w_{ij}^{(1)}\leftarrow w_{ij}^{(1)}- \eta \delta_{i}^{(1)} a_{j}^{(0)}, +\]
    +

    and

    +
    +\[ +b_{i}^{(1)} \leftarrow b_{i}^{(1)}-\eta \delta_{i}^{(1)}, +\]
    +

    where \(\eta\) is the learning rate.

    +
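    A minimal sketch of one gradient-descent step for this 2-2-1 network with the deltas defined above (all numbers are made up; the hidden-layer weights are stored as a matrix with entries \(w_{ij}^{(1)}\) acting on the inputs as in the text):

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    rng = np.random.default_rng(1)
    a0 = np.array([0.3, -0.2])        # inputs a^(0) = (x_0, x_1)
    y = 0.8                           # scalar target
    W1 = rng.normal(size=(2, 2))      # w_{ij}^(1)
    b1 = np.zeros(2) + 0.01
    w2 = rng.normal(size=2)           # w_i^(2)
    b2 = 0.01
    eta = 0.1

    # forward pass
    z1 = W1 @ a0 + b1
    a1 = sigmoid(z1)
    z2 = w2 @ a1 + b2
    a2 = sigmoid(z2)

    # backward pass
    delta2 = (a2 - y)*a2*(1 - a2)           # delta^(2)
    delta1 = w2*a1*(1 - a1)*delta2          # delta_i^(1)

    # gradient-descent updates as listed above
    w2 -= eta*delta2*a1
    b2 -= eta*delta2
    W1 -= eta*np.outer(delta1, a0)          # dC/dw_{ij}^(1) = delta_i^(1) a_j^(0)
    b1 -= eta*delta1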
    +
    +

    Exercise 2: Extended program#

    +

We extend our simple code to a function which depends on two variables \(x_0\) and \(x_1\), that is

    +
    +\[ +y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5. +\]
    +

We feed our network with \(n=100\) entries \(x_0\) and \(x_1\). We have thus two features represented by these variables and an input matrix/design matrix \(\boldsymbol{X}\in \mathbf{R}^{n\times 2}\)

    +
+\[\begin{split} +\boldsymbol{X}=\begin{bmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \\ x_{20} & x_{21} \\ \dots & \dots \\ x_{n-2,0} & x_{n-2,1} \\ x_{n-1,0} & x_{n-1,1} \end{bmatrix}. +\end{split}\]
    +

Write a code, based on the previous code examples, which takes as input these data and fits the above function. +You can extend your code to include automatic differentiation.

    +

With these examples, we are now ready to embark upon the writing of a more general code for neural networks.

    +
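    A possible way of generating the inputs and targets for this exercise (a sketch only; the sampling interval is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(2025)
    n = 100
    X = rng.uniform(-1.0, 1.0, size=(n, 2))      # columns are x_0 and x_1
    x0, x1 = X[:, 0], X[:, 1]
    y = (x0**2 + 3*x0*x1 + x1**2 + 5).reshape(n, 1)
    print(X.shape, y.shape)                      # (100, 2) (100, 1)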
    +
    +

    Getting serious, the back propagation equations for a neural network#

    +

Now it is time to move away from having only one node in each layer. Our inputs will also in general be represented by several input values.

    +

    We have thus

    +
+\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{jk}^L} = \left(a_j^L - y_j\right)a_j^L(1-a_j^L)a_k^{L-1}, +\]
    +

    Defining

    +
    +\[ +\delta_j^L = a_j^L(1-a_j^L)\left(a_j^L - y_j\right) = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\]
    +

    and using the Hadamard product of two vectors we can write this as

    +
    +\[ +\boldsymbol{\delta}^L = \sigma'(\hat{z}^L)\circ\frac{\partial {\cal C}}{\partial (\boldsymbol{a}^L)}. +\]
    +
    +
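    For the squared-error cost and sigmoid output used here, the Hadamard expression becomes a one-liner in NumPy; this sketch uses made-up numbers for the output layer.

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    z_L = np.array([0.2, -1.0, 0.5])    # made-up activations z^L of the output layer
    y   = np.array([0.0,  1.0, 0.5])    # made-up targets
    a_L = sigmoid(z_L)

    # delta^L = sigma'(z^L) o dC/da^L, with dC/da^L = (a^L - y) for the squared error
    delta_L = a_L*(1 - a_L)*(a_L - y)
    print(delta_L)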
    +

    Analyzing the last results#

    +

This is an important expression. The second term on the right-hand side measures how fast the cost function is changing as a function of the \(j\)th output activation. If, for example, the cost function doesn’t depend much on a particular output node \(j\), then \(\delta_j^L\) will be small, which is what we would expect. The first term on the right measures how fast the activation function \(f\) is changing at a given activation value \(z_j^L\).

    +
    +
    +

    More considerations#

    +

    Notice that everything in the above equations is easily computed. In +particular, we compute \(z_j^L\) while computing the behaviour of the +network, and it is only a small additional overhead to compute +\(\sigma'(z^L_j)\). The exact form of the derivative with respect to the +output depends on the form of the cost function. +However, provided the cost function is known there should be little +trouble in calculating

    +
    +\[ +\frac{\partial {\cal C}}{\partial (a_j^L)} +\]
    +

    With the definition of \(\delta_j^L\) we have a more compact definition of the derivative of the cost function in terms of the weights, namely

    +
    +\[ +\frac{\partial{\cal C}}{\partial w_{jk}^L} = \delta_j^La_k^{L-1}. +\]
    +
    +
    +

    Derivatives in terms of \(z_j^L\)#

    +

    It is also easy to see that our previous equation can be written as

    +
    +\[ +\delta_j^L =\frac{\partial {\cal C}}{\partial z_j^L}= \frac{\partial {\cal C}}{\partial a_j^L}\frac{\partial a_j^L}{\partial z_j^L}, +\]
    +

    which can also be interpreted as the partial derivative of the cost function with respect to the biases \(b_j^L\), namely

    +
    +\[ +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}\frac{\partial b_j^L}{\partial z_j^L}=\frac{\partial {\cal C}}{\partial b_j^L}, +\]
    +

    That is, the error \(\delta_j^L\) is exactly equal to the rate of change of the cost function as a function of the bias.

    +
    +
    +

    Bringing it together#

    +

    We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are

    + +
    +
    +\[ +\begin{equation} +\frac{\partial{\cal C}(\hat{W^L})}{\partial w_{jk}^L} = \delta_j^La_k^{L-1}, +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\label{_auto2} \tag{3} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}, +\label{_auto3} \tag{4} +\end{equation} +\]
    +
    +
    +

    Final back propagating equation#

    +

    We have that (replacing \(L\) with a general layer \(l\))

    +
    +\[ +\delta_j^l =\frac{\partial {\cal C}}{\partial z_j^l}. +\]
    +

    We want to express this in terms of the equations for layer \(l+1\).

    +
    +
    +

    Using the chain rule and summing over all \(k\) entries#

    +

    We obtain

    +
    +\[ +\delta_j^l =\sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}=\sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}, +\]
    +

    and recalling that

    +
    +\[ +z_j^{l+1} = \sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1}, +\]
    +

    with \(M_l\) being the number of nodes in layer \(l\), we obtain

    +
    +\[ +\delta_j^l =\sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

    This is our final equation.

    +

    We are now ready to set up the algorithm for back propagation and learning the weights and biases.

    +
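    In vectorized form the recursion is a single line of NumPy. In this sketch (made-up sizes and numbers) the weight matrix is stored so that entry \([k,j]\) equals \(w_{kj}^{l+1}\), exactly as the indices appear in the formula above.

    import numpy as np

    def sigmoid_prime(z):
        s = 1.0/(1.0+np.exp(-z))
        return s*(1 - s)

    rng = np.random.default_rng(3)
    M_l, M_next = 3, 2                       # made-up layer sizes
    W_next = rng.normal(size=(M_next, M_l))  # entry [k, j] = w_{kj}^{l+1}
    z_l = rng.normal(size=M_l)
    delta_next = rng.normal(size=M_next)     # delta^{l+1}, assumed already computed

    # delta_j^l = sum_k delta_k^{l+1} w_{kj}^{l+1} sigma'(z_j^l)
    delta_l = (W_next.T @ delta_next)*sigmoid_prime(z_l)
    print(delta_l)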
    +
    +

    Setting up the back propagation algorithm#

    +

    The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.

    +

First, we set up the input data \(\hat{x}\) and compute the activations \(\hat{z}^1\) of the first hidden layer; applying the activation function then gives the pertinent outputs \(\hat{a}^1\).

    +

Secondly, we then perform the feed forward until we reach the output layer, computing all \(\hat{z}^l\) layer by layer and, via the activation function, the pertinent outputs \(\hat{a}^l\) for \(l=1,2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\hat{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

    Then we compute the back propagate error for each \(l=L-1,L-2,\dots,1\) as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l). +\]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

    Finally, we update the weights and the biases using gradient descent +for each \(l=L-1,L-2,\dots,1\) and update the weights and biases +according to the rules

    +
+\[ +w_{jk}^l\leftarrow w_{jk}^l- \eta \delta_j^la_k^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

    With the back propagate error for each \(l=L-1,L-2,\dots,1\) as

    +
+\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

    we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) and update the weights and biases according to the rules

    +
+\[ +w_{jk}^l\leftarrow w_{jk}^l- \eta \delta_j^la_k^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +
    +
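    Putting the pieces together, here is a compact sketch of one full iteration (feed forward, back propagation, parameter update) for a fully connected network with sigmoid activations, the squared-error cost and a single sample. The layer sizes and data are made up, and the weight matrices are stored with shape (nodes in layer \(l-1\), nodes in layer \(l\)) as in the earlier code example, so the error from layer \(l+1\) is propagated by multiplying with that matrix.

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    rng = np.random.default_rng(4)
    sizes = [2, 3, 3, 1]                 # input layer, two hidden layers, output layer
    Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) + 0.01 for n in sizes[1:]]

    x = np.array([0.5, -0.3])            # one input sample
    y = np.array([0.8])                  # its target
    eta = 0.1

    # feed forward: store all activations a^l (a^0 is the input)
    a = x
    a_list = [a]
    for W, b in zip(Ws, bs):
        a = sigmoid(W.T @ a + b)
        a_list.append(a)

    # output error delta^L for the squared-error cost and sigmoid output
    delta = a_list[-1]*(1 - a_list[-1])*(a_list[-1] - y)

    # back propagate and update, layer by layer from L down to 1
    for l in range(len(Ws) - 1, -1, -1):
        grad_W = np.outer(a_list[l], delta)      # dC/dW^l[j, k] = a_j^{l-1} delta_k^l with this storage
        grad_b = delta
        if l > 0:
            # error of the layer below, computed before the weights are overwritten
            delta = (Ws[l] @ delta)*a_list[l]*(1 - a_list[l])
        Ws[l] -= eta*grad_W
        bs[l] -= eta*grad_b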
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week42.html b/doc/LectureNotes/_build/html/week42.html new file mode 100644 index 000000000..efa142492 --- /dev/null +++ b/doc/LectureNotes/_build/html/week42.html @@ -0,0 +1,4066 @@ +Week 42 Constructing a Neural Network code with examples — Applied Data Analysis and Machine Learning


    Week 42 Constructing a Neural Network code with examples#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: October 13-17, 2025

    +
    +

    Lecture October 13, 2025#

    +
      +
1. Building our own Feed-forward Neural Network and discussion of project 2

2. Project 2 is available at CompPhysics/MachineLearning
    +
    +
    +

    Readings and videos#

    +
      +
1. These lecture notes

2. Video of lecture at https://youtu.be/eqyNrEYRXnY

3. Whiteboard notes at CompPhysics/MachineLearning

4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8.

5. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

6. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex

7. Video on Neural Networks at https://www.youtube.com/watch?v=CqOfi41LfDw

8. Video on the back propagation algorithm at https://www.youtube.com/watch?v=Ilg3gGewQ5U
    +

    I also recommend Michael Nielsen’s intuitive approach to the neural networks and the universal approximation theorem, see the slides at http://neuralnetworksanddeeplearning.com/chap4.html.

    +
    +
    +

    Material for the lab sessions on Tuesday and Wednesday#

    +
      +
1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html

2. Discussion of project 2
    +
    +
    +

    Lecture material: Writing a code which implements a feed-forward neural network#

    +

Last week we discussed the basics of neural networks and deep learning and the basics of automatic differentiation. We also looked at examples of how to compute the parameters of a simple network with scalar inputs and outputs and with no or just one hidden layer.

    +

We ended our discussions with the derivation of the equations for a neural network with one hidden layer, two input variables and two hidden nodes, but only one output node. We almost finished the derivation of the back propagation algorithm.

    +
    +
    +

    Mathematics of deep learning#

    +

    Two recent books online.

    +
      +
1. The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen, published as Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

2. Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
    +
    +
    +

    Reminder on books with hands-on material and codes#

    + +
    +
    +

    Reading recommendations#

    +
      +
1. Raschka et al., chapter 11, jupyter-notebook sent separately, from GitHub

2. Goodfellow et al, chapters 6 and 7 contain most of the neural network background.
    +
    +
    +

Reminder from last week: First network example, simple perceptron with one input#

    +

    As yet another example we define now a simple perceptron model with +all quantities given by scalars. We consider only one input variable +\(x\) and one target value \(y\). We define an activation function +\(\sigma_1\) which takes as input

    +
    +\[ +z_1 = w_1x+b_1, +\]
    +

where \(w_1\) is the weight and \(b_1\) is the bias. These are the +parameters we want to optimize. The output is \(a_1=\sigma_1(z_1)\) (see +graph from whiteboard notes). This output is then fed into the +cost/loss function, which we here for the sake of simplicity just +define as the squared error

    +
    +\[ +C(x;w_1,b_1)=\frac{1}{2}(a_1-y)^2. +\]
    +
    +
    +

    Layout of a simple neural network with no hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    Optimizing the parameters#

    +

    In setting up the feed forward and back propagation parts of the +algorithm, we need now the derivative of the various variables we want +to train.

    +

    We need

    +
    +\[ +\frac{\partial C}{\partial w_1} \hspace{0.1cm}\mathrm{and}\hspace{0.1cm}\frac{\partial C}{\partial b_1}. +\]
    +

    Using the chain rule we find

    +
    +\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_1-y)\sigma_1'x, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_1-y)\sigma_1', +\]
    +

    which we later will just define as

    +
    +\[ +\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}=\delta_1. +\]
    +
    +
    +

    Adding a hidden layer#

    +

    We change our simple model to (see graph) +a network with just one hidden layer but with scalar variables only.

    +

    Our output variable changes to \(a_2\) and \(a_1\) is now the output from the hidden node and \(a_0=x\). +We have then

    +
    +\[ +z_1 = w_1a_0+b_1 \hspace{0.1cm} \wedge a_1 = \sigma_1(z_1), +\]
    +
    +\[ +z_2 = w_2a_1+b_2 \hspace{0.1cm} \wedge a_2 = \sigma_2(z_2), +\]
    +

    and the cost function

    +
    +\[ +C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2, +\]
    +

    with \(\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]\).

    +
    +
    +

    Layout of a simple neural network with one hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    The derivatives#

    +

    The derivatives are now, using the chain rule again

    +
    +\[ +\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1, +\]
    +
    +\[ +\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2, +\]
    +
+\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'a_0, +\]
    +
+\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_1. +\]
    +

    Can you generalize this to more than one hidden layer?

    +
    +
    +

    Important observations#

    +

From the above equations we see that the derivatives of the activation +functions play a central role. If they vanish, the training may +stop. This is called the vanishing gradient problem, see discussions below. If they become +large, the parameters \(w_i\) and \(b_i\) may simply go to infinity. This +is referred to as the exploding gradient problem.

    +
    +
    +

    The training#

    +

    The training of the parameters is done through various gradient descent approximations with

    +
    +\[ +w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1}, +\]
    +

    and

    +
    +\[ +b_i \leftarrow b_i-\eta \delta_i, +\]
    +

where \(\eta\) is the learning rate.

    +

    One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters \(\boldsymbol{\Theta}\).

    +

    For the first hidden layer \(a_{i-1}=a_0=x\) for this simple model.

    +
    +
    +

    Code example#

    +

The code here implements the above model with one hidden layer and scalar variables for the same function we studied in the previous example. The code is, however, set up so that we can add multiple inputs \(x\) and target values \(y\). Note also that we have the possibility of defining a feature matrix \(\boldsymbol{X}\) with more than just one column for the input values. This will turn out to be useful in our next example. We have also defined matrices and vectors for all of our operations, although it is not necessary here.

    +
    +
    +
    import numpy as np
    +# We use the Sigmoid function as activation function
    +def sigmoid(z):
    +    return 1.0/(1.0+np.exp(-z))
    +
    +def forwardpropagation(x):
    +    # weighted sum of inputs to the hidden layer
    +    z_1 = np.matmul(x, w_1) + b_1
    +    # activation in the hidden layer
    +    a_1 = sigmoid(z_1)
    +    # weighted sum of inputs to the output layer
    +    z_2 = np.matmul(a_1, w_2) + b_2
    +    a_2 = z_2
    +    return a_1, a_2
    +
    +def backpropagation(x, y):
    +    a_1, a_2 = forwardpropagation(x)
    +    # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1
    +    delta_2 = a_2 - y
    +    print(0.5*((a_2-y)**2))
    +    # delta for  the hidden layer
    +    delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_1.T, delta_2)
    +    output_bias_gradient = np.sum(delta_2, axis=0)
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(x.T, delta_1)
    +    hidden_bias_gradient = np.sum(delta_1, axis=0)
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +# Input variable
    +x = np.array([4.0],dtype=np.float64)
    +# Target values
    +y = 2*x+1.0 
    +
    +# Defining the neural network, only scalars here
    +n_inputs = x.shape
    +n_features = 1
    +n_hidden_neurons = 1
    +n_outputs = 1
    +
    +# Initialize the network
    +# weights and bias in the hidden layer
    +w_1 = np.random.randn(n_features, n_hidden_neurons)
    +b_1 = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +w_2 = np.random.randn(n_hidden_neurons, n_outputs)
    +b_2 = np.zeros(n_outputs) + 0.01
    +
    +eta = 0.1
    +for i in range(50):
    +    # calculate gradients
    +    derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)
    +    # update weights and biases
    +    w_2 -= eta * derivW2
    +    b_2 -= eta * derivB2
    +    w_1 -= eta * derivW1
    +    b_1 -= eta * derivB1
    +
    +
    +
    +
    +

We see that after a few iterations (the results do, however, depend on the learning rate), we get an error which is rather small.

    +
    +
    +

    Simple neural network and the back propagation equations#

    +

Let us now try to increase our level of ambition and attempt to set up the equations for a neural network with two input nodes, one hidden layer with two hidden nodes and one output layer with only one output node/neuron (see graph).

    +

    We need to define the following parameters and variables with the input layer (layer \((0)\)) +where we label the nodes \(x_1\) and \(x_2\)

    +
    +\[ +x_1 = a_1^{(0)} \wedge x_2 = a_2^{(0)}. +\]
    +

The hidden layer (layer \((1)\)) has nodes which yield the outputs \(a_1^{(1)}\) and \(a_2^{(1)}\) with weight \(\boldsymbol{w}\) and bias \(\boldsymbol{b}\) parameters

    +
    +\[ +w_{ij}^{(1)}=\left\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\right\} \wedge b^{(1)}=\left\{b_1^{(1)},b_2^{(1)}\right\}. +\]
    +
    +
    +

Layout of a simple neural network with two input nodes, one hidden layer with two hidden nodes and one output node#

    + + +

    Figure 1:

    +
    +
    +

The output layer#

    +

We have the output layer given by layer label \((2)\) with output \(a^{(2)}\) and weights and biases to be determined given by the variables

    +
    +\[ +w_{i}^{(2)}=\left\{w_{1}^{(2)},w_{2}^{(2)}\right\} \wedge b^{(2)}. +\]
    +

    Our output is \(\tilde{y}=a^{(2)}\) and we define a generic cost function \(C(a^{(2)},y;\boldsymbol{\Theta})\) where \(y\) is the target value (a scalar here). +The parameters we need to optimize are given by

    +
    +\[ +\boldsymbol{\Theta}=\left\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\right\}. +\]
    +
    +
    +

    Compact expressions#

    +

    We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions. +The inputs to the first hidden layer are

    +
    +\[\begin{split} +\begin{bmatrix}z_1^{(1)} \\ z_2^{(1)} \end{bmatrix}=\left(\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\ w_{21}^{(1)} &w_{22}^{(1)} \end{bmatrix}\right)^{T}\begin{bmatrix}a_1^{(0)} \\ a_2^{(0)} \end{bmatrix}+\begin{bmatrix}b_1^{(1)} \\ b_2^{(1)} \end{bmatrix}, +\end{split}\]
    +

    with outputs

    +
    +\[\begin{split} +\begin{bmatrix}a_1^{(1)} \\ a_2^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_1^{(1)}) \\ \sigma^{(1)}(z_2^{(1)}) \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Output layer#

    +

    For the final output layer we have the inputs to the final activation function

    +
    +\[ +z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)}, +\]
    +

    resulting in the output

    +
    +\[ +a^{(2)}=\sigma^{(2)}(z^{(2)}). +\]
    +
    +
    +

    Explicit derivatives#

    +

    In total we have nine parameters which we need to train. Using the +chain rule (or just the back-propagation algorithm) we can find all +derivatives. Since we will use automatic differentiation in reverse +mode, we start with the derivatives of the cost function with respect +to the parameters of the output layer, namely

    +
    +\[ +\frac{\partial C}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\delta^{(2)}a_i^{(1)}, +\]
    +

    with

    +
    +\[ +\delta^{(2)}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\]
    +

    and finally

    +
    +\[ +\frac{\partial C}{\partial b^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial b^{(2)}}=\delta^{(2)}. +\]
    +
    +
    +

    Derivatives of the hidden layer#

    +

    Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)

    +
    +\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}= \delta^{(2)}\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}, +\]
    +

    which, noting that

    +
    +\[ +z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)}, +\]
    +

    allows us to rewrite

    +
+\[ +\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}a_1^{(0)}. +\]
    +
    +
    +

    Final expression#

    +

    Defining

    +
    +\[ +\delta_1^{(1)}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}\delta^{(2)}, +\]
    +

    we have

    +
+\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\delta_1^{(1)}a_1^{(0)}. +\]
    +

    Similarly, we obtain

    +
+\[ +\frac{\partial C}{\partial w_{12}^{(1)}}=\delta_1^{(1)}a_2^{(0)}. +\]
    +
    +
    +

    Completing the list#

    +

    Similarly, we find

    +
+\[ +\frac{\partial C}{\partial w_{21}^{(1)}}=\delta_2^{(1)}a_1^{(0)}, +\]
    +

    and

    +
+\[ +\frac{\partial C}{\partial w_{22}^{(1)}}=\delta_2^{(1)}a_2^{(0)}, +\]
    +

    where we have defined

    +
    +\[ +\delta_2^{(1)}=w_2^{(2)}\frac{\partial a_2^{(1)}}{\partial z_2^{(1)}}\delta^{(2)}. +\]
    +
    +
    +

    Final expressions for the biases of the hidden layer#

    +

    For the sake of completeness, we list the derivatives of the biases, which are

    +
    +\[ +\frac{\partial C}{\partial b_{1}^{(1)}}=\delta_1^{(1)}, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_{2}^{(1)}}=\delta_2^{(1)}. +\]
    +

    As we will see below, these expressions can be generalized in a more compact form.

    +
    +
    +

    Gradient expressions#

    +

For this specific model, with just one output node and two hidden +nodes, the gradient descent equations take the following form for the output layer

    +
    +\[ +w_{i}^{(2)}\leftarrow w_{i}^{(2)}- \eta \delta^{(2)} a_{i}^{(1)}, +\]
    +

    and

    +
    +\[ +b^{(2)} \leftarrow b^{(2)}-\eta \delta^{(2)}, +\]
    +

    and

    +
    +\[ +w_{ij}^{(1)}\leftarrow w_{ij}^{(1)}- \eta \delta_{i}^{(1)} a_{j}^{(0)}, +\]
    +

    and

    +
    +\[ +b_{i}^{(1)} \leftarrow b_{i}^{(1)}-\eta \delta_{i}^{(1)}, +\]
    +

    where \(\eta\) is the learning rate.

    +
    +
    +

    Setting up the equations for a neural network#

    +

    The questions we want to ask are how do changes in the biases and the +weights in our network change the cost function and how can we use the +final output to modify the weights and biases?

    +

    To derive these equations let us start with a plain regression problem +and define our cost function as

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2, +\]
    +

    where the \(y_i\)s are our \(n\) targets (the values we want to +reproduce), while the outputs of the network after having propagated +all inputs \(\boldsymbol{x}\) are given by \(\boldsymbol{\tilde{y}}_i\).

    +
    +
    +

    Layout of a neural network with three hidden layers (last layer = \(l=L=4\), first layer \(l=0\))#

    + + +

    Figure 1:

    +
    +
    +

    Definitions#

    +

    With our definition of the targets \(\boldsymbol{y}\), the outputs of the +network \(\boldsymbol{\tilde{y}}\) and the inputs \(\boldsymbol{x}\) we +define now the activation \(z_j^l\) of node/neuron/unit \(j\) of the +\(l\)-th layer as a function of the bias, the weights which add up from +the previous layer \(l-1\) and the forward passes/outputs +\(\boldsymbol{a}^{l-1}\) from the previous layer as

    +
    +\[ +z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l, +\]
    +

    where \(b_k^l\) are the biases from layer \(l\). Here \(M_{l-1}\) +represents the total number of nodes/neurons/units of layer \(l-1\). The +figure in the whiteboard notes illustrates this equation. We can rewrite this in a more +compact form as the matrix-vector products we discussed earlier,

    +
    +\[ +\boldsymbol{z}^l = \left(\boldsymbol{W}^l\right)^T\boldsymbol{a}^{l-1}+\boldsymbol{b}^l. +\]
    +
    +
    +

    Inputs to the activation function#

    +

    With the activation values \(\boldsymbol{z}^l\) we can in turn define the +output of layer \(l\) as \(\boldsymbol{a}^l = \sigma(\boldsymbol{z}^l)\) where \(\sigma\) is our +activation function. In the examples here we will use the sigmoid +function discussed in our logistic regression lectures. We will also use the same activation function \(\sigma\) for all layers +and their nodes. It means we have

    +
    +\[ +a_j^l = \sigma(z_j^l) = \frac{1}{1+\exp{-(z_j^l)}}. +\]
    +
    +
    +

    Layout of input to first hidden layer \(l=1\) from input layer \(l=0\)#

    + + +

    Figure 1:

    +
    +
    +

    Derivatives and the chain rule#

    +

    From the definition of the input variable to the activation function, that is \(z_j^l\) we have

    +
    +\[ +\frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1}, +\]
    +

    and

    +
    +\[ +\frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ji}^l. +\]
    +

    With our definition of the activation function we have that (note that this function depends only on \(z_j^l\))

    +
    +\[ +\frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=\sigma(z_j^l)(1-\sigma(z_j^l)). +\]
    +
    +
    +

    Derivative of the cost function#

    +

    With these definitions we can now compute the derivative of the cost function in terms of the weights.

    +

    Let us specialize to the output layer \(l=L\). Our cost function is

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2, +\]
    +

    The derivative of this function with respect to the weights is

    +
    +\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{ij}^{L}}, +\]
    +

    The last partial derivative can easily be computed and reads (by applying the chain rule)

    +
    +\[ +\frac{\partial a_j^L}{\partial w_{ij}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}. +\]
    +
    +
    +

    The back propagation equations for a neural network#

    +

    We have thus

    +
+\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)a_j^L(1-a_j^L)a_i^{L-1}, +\]
    +

    Defining

    +
    +\[ +\delta_j^L = a_j^L(1-a_j^L)\left(a_j^L - y_j\right) = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\]
    +

    and using the Hadamard product of two vectors we can write this as

    +
    +\[ +\boldsymbol{\delta}^L = \sigma'(\boldsymbol{z}^L)\circ\frac{\partial {\cal C}}{\partial (\boldsymbol{a}^L)}. +\]
    +
    +
    +

    Analyzing the last results#

    +

This is an important expression. The second term on the right-hand side measures how fast the cost function is changing as a function of the \(j\)th output activation. If, for example, the cost function doesn’t depend much on a particular output node \(j\), then \(\delta_j^L\) will be small, which is what we would expect. The first term on the right measures how fast the activation function \(\sigma\) is changing at a given activation value \(z_j^L\).

    +
    +
    +

    More considerations#

    +

    Notice that everything in the above equations is easily computed. In +particular, we compute \(z_j^L\) while computing the behaviour of the +network, and it is only a small additional overhead to compute +\(\sigma'(z^L_j)\). The exact form of the derivative with respect to the +output depends on the form of the cost function. +However, provided the cost function is known there should be little +trouble in calculating

    +
    +\[ +\frac{\partial {\cal C}}{\partial (a_j^L)} +\]
    +

    With the definition of \(\delta_j^L\) we have a more compact definition of the derivative of the cost function in terms of the weights, namely

    +
    +\[ +\frac{\partial{\cal C}}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}. +\]
    +
    +
    +

    Derivatives in terms of \(z_j^L\)#

    +

    It is also easy to see that our previous equation can be written as

    +
    +\[ +\delta_j^L =\frac{\partial {\cal C}}{\partial z_j^L}= \frac{\partial {\cal C}}{\partial a_j^L}\frac{\partial a_j^L}{\partial z_j^L}, +\]
    +

which is also the partial derivative of the cost function with respect to the bias \(b_j^L\), since

\[ \frac{\partial {\cal C}}{\partial b_j^L} = \frac{\partial {\cal C}}{\partial z_j^L}\frac{\partial z_j^L}{\partial b_j^L}=\frac{\partial {\cal C}}{\partial z_j^L}=\delta_j^L, \]

where we used that \(\partial z_j^L/\partial b_j^L=1\).
    +

    That is, the error \(\delta_j^L\) is exactly equal to the rate of change of the cost function as a function of the bias.

    +
    +
    +

    Bringing it together#

    +

    We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are

    + +
    +
    +\[ +\begin{equation} +\frac{\partial{\cal C}(\boldsymbol{W^L})}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}, +\label{_auto1} \tag{1} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\label{_auto2} \tag{2} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}, +\label{_auto3} \tag{3} +\end{equation} +\]
    +
    +
    +
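As an illustration of equations (1)-(3), here is a minimal numpy sketch for a sigmoid output layer and the quadratic cost above; the array names, shapes and values are chosen only for this example.

import numpy as np

rng = np.random.default_rng(2025)
a_prev = rng.random((10, 4))          # a^{L-1}: 10 samples, 4 nodes in layer L-1
W_L = rng.normal(size=(4, 3))         # weights w_{ij}^L, shape (M_{L-1}, M_L)
b_L = np.zeros(3)                     # biases b_j^L
y = rng.random((10, 3))               # targets

z_L = a_prev @ W_L + b_L
a_L = 1.0 / (1.0 + np.exp(-z_L))      # sigmoid output a^L

# Eq. (2): delta_j^L = sigma'(z_j^L) * dC/da_j^L, with dC/da^L = a^L - y for the quadratic cost
delta_L = a_L * (1.0 - a_L) * (a_L - y)

# Eq. (1): dC/dw_{ij}^L = delta_j^L a_i^{L-1} (summed over the samples)
dW_L = a_prev.T @ delta_L
# Eq. (3): dC/db_j^L = delta_j^L (summed over the samples)
db_L = delta_L.sum(axis=0)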

    Final back propagating equation#

    +

    We have that (replacing \(L\) with a general layer \(l\))

    +
    +\[ +\delta_j^l =\frac{\partial {\cal C}}{\partial z_j^l}. +\]
    +

    We want to express this in terms of the equations for layer \(l+1\).

    +
    +
    +

    Using the chain rule and summing over all \(k\) entries#

    +

    We obtain

    +
    +\[ +\delta_j^l =\sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}=\sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}, +\]
    +

    and recalling that

    +
    +\[ +z_j^{l+1} = \sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1}, +\]
    +

    with \(M_l\) being the number of nodes in layer \(l\), we obtain

    +
\[ \delta_j^l =\sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l). \]
    +

    This is our final equation.

    +

    We are now ready to set up the algorithm for back propagation and learning the weights and biases.

    +
    +
    +
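In vectorized form, with the weight matrix \(\boldsymbol{W}^{l+1}\) of shape \((M_l, M_{l+1})\) as above, this recursion amounts to one matrix product and one Hadamard product per layer. A minimal, self-contained sketch (array values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
delta_next = rng.random((10, 3))      # delta^{l+1}: 10 samples, M_{l+1} = 3
W_next = rng.normal(size=(4, 3))      # w_{jk}^{l+1}, shape (M_l, M_{l+1})
a_l = rng.random((10, 4))             # a^l = sigma(z^l) for a sigmoid activation

# delta_j^l = sum_k delta_k^{l+1} w_{jk}^{l+1} sigma'(z_j^l)
delta_l = (delta_next @ W_next.T) * a_l * (1.0 - a_l)   # shape (10, M_l)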

    Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations#

    +

    The architecture (our model).

    +
      +
1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)

2. Define the number of hidden layers and hidden nodes

3. Define activation functions for hidden layers and output layers

4. Define the optimizer (plain learning rate, momentum, ADAgrad, RMSprop, ADAM etc.) and an array of initial learning rates

5. Define the cost function and possible regularization terms with hyperparameters

6. Initialize weights and biases

7. Fix the number of iterations for the feed-forward and back-propagation parts (a minimal configuration sketch of these choices is given below)
    +
    +
    +
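The configuration sketch below illustrates the choices in the list; all names and values are examples only, not a prescribed interface.

# illustrative architecture and hyperparameter choices (example values only)
config = {
    "n_inputs": 64,                 # e.g. 8x8 pixel images
    "n_outputs": 10,                # e.g. 10 classes
    "hidden_layers": [50, 50],      # number of nodes per hidden layer
    "hidden_activation": "sigmoid",
    "output_activation": "softmax",
    "optimizer": "adam",            # plain GD, momentum, ADAgrad, RMSprop, ADAM, ...
    "learning_rates": [1e-3, 1e-2, 1e-1],
    "cost": "cross_entropy",
    "l2_lambda": 1e-4,              # regularization hyperparameter
    "weight_init": "small_normal",  # small random weights, biases set to 0.01
    "n_epochs": 100,
    "batch_size": 100,
}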

    Setting up the back propagation algorithm, part 1#

    +

    The four equations provide us with a way of computing the gradients of the cost function. Let us write this out in the form of an algorithm.

    +

First, we set up the input data \(\boldsymbol{x}\) and the activations \(\boldsymbol{z}^1\) of the first hidden layer and compute the activation function and the pertinent outputs \(\boldsymbol{a}^1\).

    +

Secondly, we then perform the feed-forward pass until we reach the output layer, computing all \(\boldsymbol{z}^l\) and the pertinent outputs \(\boldsymbol{a}^l\) of the activation function for \(l=2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\boldsymbol{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

Then we compute the back-propagated error for each \(l=L-1,L-2,\dots,1\) as

\[ \delta_j^l = \sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l). \]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

Finally, we update the weights and the biases using gradient descent for each \(l=L,L-1,\dots,1\), where \(l=1\) is the first hidden layer, according to the rules

\[ w_{ij}^l\leftarrow w_{ij}^l- \eta \delta_j^la_i^{l-1}, \]

\[ b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, \]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

With the back-propagated error for each \(l=L-1,L-2,\dots,1\) given as

\[ \delta_j^l = \sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l), \]

we update the weights and the biases using gradient descent for each \(l=L,L-1,\dots,1\) according to the rules

\[ w_{ij}^l\leftarrow w_{ij}^l- \eta \delta_j^la_i^{l-1}, \]

\[ b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l. \]
    +
    +
    +
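Put together, one gradient-descent update of all layers can be sketched as below. This is only a compact illustration assuming sigmoid activations in every layer and the quadratic cost; the full object-oriented implementation for the MNIST example follows later in these notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, weights, biases, eta):
    """One update for a fully connected sigmoid network with quadratic cost.
    weights[l] has shape (M_{l-1}, M_l); x and y hold one row per sample."""
    # feed forward, storing the activations a^l of every layer (a^0 = x)
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ W + b))

    # output error: delta^L = sigma'(z^L) * (a^L - y)
    a_L = activations[-1]
    delta = a_L * (1.0 - a_L) * (a_L - y)

    # go backwards through the layers
    for l in range(len(weights) - 1, -1, -1):
        grad_W = activations[l].T @ delta       # dC/dW^l = (a^{l-1})^T delta^l
        grad_b = delta.sum(axis=0)              # dC/db^l = delta^l
        if l > 0:
            a_prev = activations[l]
            # back-propagate the error to the previous layer (using the not-yet-updated weights)
            delta = (delta @ weights[l].T) * a_prev * (1.0 - a_prev)
        weights[l] -= eta * grad_W              # w <- w - eta * delta * a^{l-1}
        biases[l]  -= eta * grad_b              # b <- b - eta * delta

# tiny usage example with one hidden layer
rng = np.random.default_rng(1)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]
x, y = rng.random((5, 2)), rng.random((5, 1))
backprop_step(x, y, weights, biases, eta=0.1)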

    Activation functions#

    +

    A property that characterizes a neural network, other than its +connectivity, is the choice of activation function(s). The following +restrictions are imposed on an activation function for an FFNN to +fulfill the universal approximation theorem

    +
      +
• Non-constant

• Bounded

• Monotonically increasing

• Continuous
    +
    +

    Activation functions, Logistic and Hyperbolic ones#

    +

The second requirement excludes all linear functions. Furthermore, in an MLP with only linear activation functions, each layer simply performs a linear transformation of its inputs.

    +

Regardless of the number of layers, the output of the NN will be nothing but a linear function of the inputs. Thus we need to introduce some kind of non-linearity to the NN to be able to fit non-linear functions. Typical examples are the logistic sigmoid

    +
    +\[ +\sigma(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +\sigma(x) = \tanh(x) +\]
    +
    +
    +
    +

    Relevance#

    +

Sigmoid-type functions are considered more biologically plausible because the output of inactive neurons is zero. Such activation functions are called one-sided. However, it has been shown that the hyperbolic tangent performs better than the sigmoid for training MLPs, and the ReLU family discussed below has by now become the most popular choice for deep neural networks.

    +
    +
    +
    %matplotlib inline
    +
    +"""The sigmoid function (or the logistic curve) is a 
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""Sine Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.sin(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sine function')
    +
    +plt.show()
    +
    +"""Plots a graph of the squashing function used by a rectified linear
    +unit"""
    +z = numpy.arange(-2, 2, .1)
    +zero = numpy.zeros(len(z))
    +y = numpy.max([zero, z], axis=0)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, y)
    +ax.set_ylim([-2.0, 2.0])
    +ax.set_xlim([-2.0, 2.0])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('Rectified linear unit')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Vanishing gradients#

    +

    The Back propagation algorithm we derived above works by going from +the output layer to the input layer, propagating the error gradient on +the way. Once the algorithm has computed the gradient of the cost +function with regards to each parameter in the network, it uses these +gradients to update each parameter with a Gradient Descent (GD) step.

    +

    Unfortunately for us, the gradients often get smaller and smaller as +the algorithm progresses down to the first hidden layers. As a result, +the GD update leaves the lower layer connection weights virtually +unchanged, and training never converges to a good solution. This is +known in the literature as the vanishing gradients problem.

    +
    +
    +

    Exploding gradients#

    +

In other cases the opposite can happen, namely that the gradients grow bigger and bigger. The result is that many of the layers get very large weight updates and the algorithm diverges. This is the exploding gradients problem, which is mostly encountered in recurrent neural networks. More generally, deep neural networks suffer from unstable gradients: different layers may learn at widely different speeds.

    +
    +
    +

    Is the Logistic activation function (Sigmoid) our choice?#

    +

    Although this unfortunate behavior has been empirically observed for +quite a while (it was one of the reasons why deep neural networks were +mostly abandoned for a long time), it is only around 2010 that +significant progress was made in understanding it.

    +

A paper titled Understanding the Difficulty of Training Deep Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio found that the problems were in large part due to the popular logistic sigmoid activation function and the weight initialization technique that was most common at the time, namely random initialization using a normal distribution with a mean of 0 and a standard deviation of 1.

    +
    +
    +

    Logistic function as the root of problems#

    +

    They showed that with this activation function and this +initialization scheme, the variance of the outputs of each layer is +much greater than the variance of its inputs. Going forward in the +network, the variance keeps increasing after each layer until the +activation function saturates at the top layers. This is actually made +worse by the fact that the logistic function has a mean of 0.5, not 0 +(the hyperbolic tangent function has a mean of 0 and behaves slightly +better than the logistic function in deep networks).

    +
    +
    +

The derivative of the Logistic function#

    +

    Looking at the logistic activation function, when inputs become large +(negative or positive), the function saturates at 0 or 1, with a +derivative extremely close to 0. Thus when backpropagation kicks in, +it has virtually no gradient to propagate back through the network, +and what little gradient exists keeps getting diluted as +backpropagation progresses down through the top layers, so there is +really nothing left for the lower layers.

    +

    In their paper, Glorot and Bengio propose a way to significantly +alleviate this problem. We need the signal to flow properly in both +directions: in the forward direction when making predictions, and in +the reverse direction when backpropagating gradients. We don’t want +the signal to die out, nor do we want it to explode and saturate. For +the signal to flow properly, the authors argue that we need the +variance of the outputs of each layer to be equal to the variance of +its inputs, and we also need the gradients to have equal variance +before and after flowing through a layer in the reverse direction.

    +
    +
    +

    Insights from the paper by Glorot and Bengio#

    +

    One of the insights in the 2010 paper by Glorot and Bengio was that +the vanishing/exploding gradients problems were in part due to a poor +choice of activation function. Until then most people had assumed that +if Nature had chosen to use roughly sigmoid activation functions in +biological neurons, they must be an excellent choice. But it turns out +that other activation functions behave much better in deep neural +networks, in particular the ReLU activation function, mostly because +it does not saturate for positive values (and also because it is quite +fast to compute).

    +
    +
    +

    The RELU function family#

    +

    The ReLU activation function suffers from a problem known as the dying +ReLUs: during training, some neurons effectively die, meaning they +stop outputting anything other than 0.

    +

In some cases, you may find that half of your network’s neurons are dead, especially if you used a large learning rate. During training, if a neuron’s weights get updated such that the weighted sum of the neuron’s inputs is negative, it will start outputting 0. When this happens, the neuron is unlikely to come back to life since the gradient of the ReLU function is 0 when its input is negative.

    +
    +
    +

    ELU function#

    +

    To solve this problem, nowadays practitioners use a variant of the +ReLU function, such as the leaky ReLU discussed above or the so-called +exponential linear unit (ELU) function

    +
    +\[\begin{split} +ELU(z) = \left\{\begin{array}{cc} \alpha\left( \exp{(z)}-1\right) & z < 0,\\ z & z \ge 0.\end{array}\right. +\end{split}\]
    +
    +
    +
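A minimal numpy sketch of the ELU above, together with the leaky ReLU for comparison; the \(\alpha\) values are just the common defaults mentioned below.

import numpy as np

def elu(z, alpha=1.0):
    # ELU(z) = alpha*(exp(z) - 1) for z < 0, and z otherwise
    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)

def leaky_relu(z, alpha=0.01):
    # leaky ReLU: a small non-zero slope alpha for negative inputs
    return np.where(z < 0, alpha * z, z)

z = np.linspace(-3, 3, 7)
print(elu(z))
print(leaky_relu(z))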

    Which activation function should we use?#

    +

    In general it seems that the ELU activation function is better than +the leaky ReLU function (and its variants), which is better than +ReLU. ReLU performs better than \(\tanh\) which in turn performs better +than the logistic function.

    +

If runtime performance is an issue, then you may opt for the leaky ReLU function over the ELU function. If you don’t want to tweak yet another hyperparameter, you may just use the default \(\alpha\) of \(0.01\) for the leaky ReLU, and \(1\) for ELU. If you have spare time and computing power, you can use cross-validation or the bootstrap to evaluate other activation functions.

    +
    +
    +

    More on activation functions, output layers#

    +

    In most cases you can use the ReLU activation function in the hidden +layers (or one of its variants).

    +

    It is a bit faster to compute than other activation functions, and the +gradient descent optimization does in general not get stuck.

    +

    For the output layer:

    +
      +
• For classification tasks the softmax activation function is generally a good choice (when the classes are mutually exclusive).

• For regression tasks, you can simply use no activation function at all.
    +
    +
    +

    Fine-tuning neural network hyperparameters#

    +

The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tweak. Not only can you use any imaginable network topology (how neurons/nodes are interconnected), but even in a simple FFNN you can change the number of layers, the number of neurons per layer, the type of activation function to use in each layer, the weight initialization logic, the stochastic gradient optimizer and much more. How do you know what combination of hyperparameters is the best for your task?

    +
      +
    • You can use grid search with cross-validation to find the right hyperparameters.

    +

However, since there are many hyperparameters to tune, and since training a neural network on a large dataset takes a lot of time, you will only be able to explore a tiny part of the hyperparameter space.

    +
      +
• You can use randomized search.

• Or use tools like Oscar, which implements more complex algorithms to help you find a good set of hyperparameters quickly.
    +
    +
    +

    Hidden layers#

    +

For many problems you can start with just one or two hidden layers and it will work just fine. For the MNIST data set discussed below you can easily get high accuracy using just one hidden layer with a few hundred neurons. With two hidden layers and the same total number of neurons you can reach above 98% accuracy on this data set, in roughly the same amount of training time.

    +

    For more complex problems, you can gradually ramp up the number of +hidden layers, until you start overfitting the training set. Very +complex tasks, such as large image classification or speech +recognition, typically require networks with dozens of layers and they +need a huge amount of training data. However, you will rarely have to +train such networks from scratch: it is much more common to reuse +parts of a pretrained state-of-the-art network that performs a similar +task.

    +
    +
    +

    Batch Normalization#

    +

    Batch Normalization aims to address the vanishing/exploding gradients +problems, and more generally the problem that the distribution of each +layer’s inputs changes during training, as the parameters of the +previous layers change.

    +

The technique consists of adding an operation in the model just before the activation function of each layer, simply zero-centering and normalizing the inputs, then scaling and shifting the result using two new parameters per layer (one for scaling, the other for shifting). In other words, this operation lets the model learn the optimal scale and mean of the inputs for each layer. In order to zero-center and normalize the inputs, the algorithm needs to estimate the inputs’ mean and standard deviation. It does so by evaluating the mean and standard deviation of the inputs over the current mini-batch, hence the name batch normalization.

    +
    +
    +
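A schematic numpy sketch of the operation described above (training-time statistics only; a complete implementation would also keep running averages of the mean and variance for use at test time, and treat \(\gamma\) and \(\beta\) as trainable parameters):

import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Zero-center and normalize a mini-batch z of shape (batch_size, n_features),
    then scale by gamma and shift by beta (one value per feature)."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta

z = np.random.randn(100, 50) * 3.0 + 2.0   # a mini-batch with non-zero mean and large variance
gamma, beta = np.ones(50), np.zeros(50)    # learnable scale and shift
out = batch_norm_forward(z, gamma, beta)
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # approximately 0 and 1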

    Dropout#

    +

    It is a fairly simple algorithm: at every training step, every neuron +(including the input neurons but excluding the output neurons) has a +probability \(p\) of being temporarily dropped out, meaning it will be +entirely ignored during this training step, but it may be active +during the next step.

    +

    The hyperparameter \(p\) is called the dropout rate, and it is typically +set to 50%. After training, the neurons are not dropped anymore. It +is viewed as one of the most popular regularization techniques.

    +
    +
    +
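A minimal sketch of (inverted) dropout as described above; the rescaling by \(1/(1-p)\) keeps the expected activation unchanged, so nothing needs to be rescaled at test time.

import numpy as np

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p during training."""
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
    return a * mask / (1.0 - p)    # inverted dropout: rescale the surviving activations

a_h = np.random.rand(4, 10)        # activations of some hidden layer
print(dropout(a_h, p=0.5))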

    Gradient Clipping#

    +

    A popular technique to lessen the exploding gradients problem is to +simply clip the gradients during backpropagation so that they never +exceed some threshold (this is mostly useful for recurrent neural +networks).

    +

    This technique is called Gradient Clipping.

    +

    In general however, Batch +Normalization is preferred.

    +
    +
    +
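A sketch of clipping a gradient by its norm before the update; the threshold is an arbitrary example value.

import numpy as np

def clip_by_norm(grad, threshold=1.0):
    """Rescale the gradient if its L2 norm exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = 100.0 * np.random.randn(50, 10)          # an "exploding" gradient
print(np.linalg.norm(clip_by_norm(g)))       # at most 1.0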

    A top-down perspective on Neural networks#

    +

    The first thing we would like to do is divide the data into two or +three parts. A training set, a validation or dev (development) set, +and a test set. The test set is the data on which we want to make +predictions. The dev set is a subset of the training data we use to +check how well we are doing out-of-sample, after training the model on +the training dataset. We use the validation error as a proxy for the +test error in order to make tweaks to our model. It is crucial that we +do not use any of the test data to train the algorithm. This is a +cardinal sin in ML. Then:

    +
      +
1. Estimate the optimal error rate

2. Minimize underfitting (bias) on the training data set.

3. Make sure you are not overfitting.
    +
    +
    +

    More top-down perspectives#

    +

    If the validation and test sets are drawn from the same distributions, +then a good performance on the validation set should lead to similarly +good performance on the test set.

    +

    However, sometimes +the training data and test data differ in subtle ways because, for +example, they are collected using slightly different methods, or +because it is cheaper to collect data in one way versus another. In +this case, there can be a mismatch between the training and test +data. This can lead to the neural network overfitting these small +differences between the test and training sets, and a poor performance +on the test set despite having a good performance on the validation +set. To rectify this, Andrew Ng suggests making two validation or dev +sets, one constructed from the training data and one constructed from +the test data. The difference between the performance of the algorithm +on these two validation sets quantifies the train-test mismatch. This +can serve as another important diagnostic when using DNNs for +supervised learning.

    +
    +
    +

    Limitations of supervised learning with deep networks#

    +

    Like all statistical methods, supervised learning using neural +networks has important limitations. This is especially important when +one seeks to apply these methods, especially to physics problems. Like +all tools, DNNs are not a universal solution. Often, the same or +better performance on a task can be achieved by using a few +hand-engineered features (or even a collection of random +features).

    +
    +
    +

    Limitations of NNs#

    +

    Here we list some of the important limitations of supervised neural network based models.

    +
      +
• Need labeled data. All supervised learning methods, DNNs included, require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).

• Supervised neural networks are extremely data intensive. DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs.
    +
    +
    +

    Homogeneous data#

    +
      +
    • Homogeneous data. Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types.

    +
    +
    +

    More limitations#

    +
      +
    • Many problems are not about prediction. In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a wrong model. The model might or might not be useful for understanding the underlying science.

    +

    Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems.

    +
    +
    +

    Setting up a Multi-layer perceptron model for classification#

    +

We are now going to develop an example based on the MNIST database. This is a classification problem and we need to use our cross-entropy function discussed in connection with logistic regression. The cross-entropy defines our cost function for the classification problems with neural networks.

    +

In binary classification with two classes \((0, 1)\) we define the logistic/sigmoid function as the probability that a particular input is in class \(0\) or \(1\). This is possible because the logistic function takes any input from the real numbers and outputs a number between 0 and 1, and can therefore be interpreted as a probability. It also has other nice properties, such as a derivative that is simple to calculate.

    +

For an input \(\boldsymbol{a}\) from the hidden layer, the probability that the input \(\boldsymbol{x}\) is in class 0 or 1 is given below. We let \(\theta\) represent the unknown weights and biases to be adjusted by our equations, and the variable \(x\) represents our activation values \(z\). We have

    +
\[ P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) = \frac{1}{1 + \exp{(-\boldsymbol{x})}} , \]
    +

    and

    +
    +\[ +P(y = 1 \mid \boldsymbol{x}, \boldsymbol{\theta}) = 1 - P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) , +\]
    +

    where \(y \in \{0, 1\}\) and \(\boldsymbol{\theta}\) represents the weights and biases +of our network.

    +
    +
    +

    Defining the cost function#

    +

    Our cost function is given as (see the Logistic regression lectures)

    +
\[ \mathcal{C}(\boldsymbol{\theta}) = - \ln P(\mathcal{D} \mid \boldsymbol{\theta}) = - \sum_{i=1}^n \left( y_i \ln[P(y_i = 0)] + (1 - y_i) \ln [1 - P(y_i = 0)]\right) = \sum_{i=1}^n \mathcal{L}_i(\boldsymbol{\theta}) . \]
    +

    This last equality means that we can interpret our cost function as a sum over the loss function +for each point in the dataset \(\mathcal{L}_i(\boldsymbol{\theta})\).
    +The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather +than maximizing a negative number.

    +

    In multiclass classification it is common to treat each integer label as a so called one-hot vector:

    +

    \(y = 5 \quad \rightarrow \quad \boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,\) and

    +

    \(y = 1 \quad \rightarrow \quad \boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,\)

    +

i.e. a binary bit string of length \(C\), where \(C = 10\) is the number of classes in the MNIST dataset (numbers from \(0\) to \(9\)).

    +

    If \(\boldsymbol{x}_i\) is the \(i\)-th input (image), \(y_{ic}\) refers to the \(c\)-th component of the \(i\)-th +output vector \(\boldsymbol{y}_i\).
    +The probability of \(\boldsymbol{x}_i\) being in class \(c\) will be given by the softmax function:

    +
    +\[ +P(y_{ic} = 1 \mid \boldsymbol{x}_i, \boldsymbol{\theta}) = \frac{\exp{((\boldsymbol{a}_i^{hidden})^T \boldsymbol{w}_c)}} +{\sum_{c'=0}^{C-1} \exp{((\boldsymbol{a}_i^{hidden})^T \boldsymbol{w}_{c'})}} , +\]
    +

    which reduces to the logistic function in the binary case.
    +The likelihood of this \(C\)-class classifier +is now given as:

    +
    +\[ +P(\mathcal{D} \mid \boldsymbol{\theta}) = \prod_{i=1}^n \prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} . +\]
    +

    Again we take the negative log-likelihood to define our cost function:

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = - \log{P(\mathcal{D} \mid \boldsymbol{\theta})}. +\]
    +

    See the logistic regression lectures for a full definition of the cost function.

    +

    The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!

    +
    +
    +

    Example: binary classification problem#

    +

As an example of the above, relevant for project 2 as well, let us consider a binary classification problem. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters \(\beta\) as

    +
\[ \mathcal{C}(\boldsymbol{\beta}) = - \sum_{i=1}^n \left(y_i\log{p(y_i \vert x_i,\boldsymbol{\beta})}+(1-y_i)\log{\left(1-p(y_i \vert x_i,\boldsymbol{\beta})\right)}\right), \]
    +

    where we had defined the logistic (sigmoid) function

    +
    +\[ +p(y_i =1\vert x_i,\boldsymbol{\beta})=\frac{\exp{(\beta_0+\beta_1 x_i)}}{1+\exp{(\beta_0+\beta_1 x_i)}}, +\]
    +

    and

    +
    +\[ +p(y_i =0\vert x_i,\boldsymbol{\beta})=1-p(y_i =1\vert x_i,\boldsymbol{\beta}). +\]
    +

    The parameters \(\boldsymbol{\beta}\) were defined using a minimization method like gradient descent or Newton-Raphson’s method.

    +

    Now we replace \(x_i\) with the activation \(z_i^l\) for a given layer \(l\) and the outputs as \(y_i=a_i^l=f(z_i^l)\), with \(z_i^l\) now being a function of the weights \(w_{ij}^l\) and biases \(b_i^l\). +We have then

    +
    +\[ +a_i^l = y_i = \frac{\exp{(z_i^l)}}{1+\exp{(z_i^l)}}, +\]
    +

    with

    +
    +\[ +z_i^l = \sum_{j}w_{ij}^l a_j^{l-1}+b_i^l, +\]
    +

    where the superscript \(l-1\) indicates that these are the outputs from layer \(l-1\). +Our cost function at the final layer \(l=L\) is now

    +
    +\[ +\mathcal{C}(\boldsymbol{W}) = - \sum_{i=1}^n \left(t_i\log{a_i^L}+(1-t_i)\log{(1-a_i^L)}\right), +\]
    +

    where we have defined the targets \(t_i\). The derivatives of the cost function with respect to the output \(a_i^L\) are then easily calculated and we get

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{W})}{\partial a_i^L} = \frac{a_i^L-t_i}{a_i^L(1-a_i^L)}. +\]
    +

    In case we use another activation function than the logistic one, we need to evaluate other derivatives.

    +
    +
    +

    The Softmax function#

    +

    In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation \(z_i^l\), that is we need

    +
    +\[ +\frac{\partial f(z_i^l)}{\partial w_{jk}^l} = +\frac{\partial f(z_i^l)}{\partial z_j^l} \frac{\partial z_j^l}{\partial w_{jk}^l}= \frac{\partial f(z_i^l)}{\partial z_j^l}a_k^{l-1}. +\]
    +

    For the Softmax function we have

    +
    +\[ +f(z_i^l) = \frac{\exp{(z_i^l)}}{\sum_{m=1}^K\exp{(z_m^l)}}. +\]
    +

    Its derivative with respect to \(z_j^l\) gives

    +
    +\[ +\frac{\partial f(z_i^l)}{\partial z_j^l}= f(z_i^l)\left(\delta_{ij}-f(z_j^l)\right), +\]
    +

which in the case of the simple binary model reduces to the case \(i=j\).

    +
    +
    +
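As a small check of the Softmax derivative above (not part of the original notes), we can compare the analytic expression \(f_i(\delta_{ij}-f_j)\) with finite differences for a single vector \(\boldsymbol{z}\):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))     # subtract the maximum for numerical stability
    return e / e.sum()

z = np.random.randn(5)
f = softmax(z)
jac_analytic = np.diag(f) - np.outer(f, f)    # df_i/dz_j = f_i (delta_ij - f_j)

h = 1e-6
jac_numeric = np.zeros((5, 5))
for j in range(5):
    dz = np.zeros(5)
    dz[j] = h
    jac_numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * h)

print(np.max(np.abs(jac_analytic - jac_numeric)))   # close to zero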

    Developing a code for doing neural networks with back propagation#

    +

    One can identify a set of key steps when using neural networks to solve supervised learning problems:

    +
      +
1. Collect and pre-process data

2. Define model and architecture

3. Choose cost function and optimizer

4. Train the model

5. Evaluate model performance on test data

6. Adjust hyperparameters (if necessary, network architecture)
    +
    +
    +

    Collect and pre-process data#

    +

    Here we will be using the MNIST dataset, which is readily available through the scikit-learn +package. You may also find it for example here.
    +The MNIST (Modified National Institute of Standards and Technology) database is a large database +of handwritten digits that is commonly used for training various image processing systems.
    +The MNIST dataset consists of 70 000 images of size \(28\times 28\) pixels, each labeled from 0 to 9.
    +The scikit-learn dataset we will use consists of a selection of 1797 images of size \(8\times 8\) collected and processed from this database.

    +

    To feed data into a feed-forward neural network we need to represent +the inputs as a design/feature matrix \(X = (n_{inputs}, n_{features})\). Each +row represents an input, in this case a handwritten digit, and +each column represents a feature, in this case a pixel. The +correct answers, also known as labels or targets are +represented as a 1D array of integers +\(Y = (n_{inputs}) = (5, 3, 1, 8,...)\).

    +

    As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from +measurements of height (in m)
    +and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example:

    +
    +\[\begin{split} X = \begin{bmatrix} +1.85 & 81\\ +1.71 & 65\\ +1.95 & 103\\ +1.55 & 42\\ +1.63 & 56 +\end{bmatrix} ,\end{split}\]
    +

    and the targets would be:

    +
    +\[ Y = (23.7, 22.2, 27.1, 17.5, 21.1) \]
    +
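In numpy this toy example is simply (a sketch of the shapes involved):

import numpy as np

# design matrix: one row per person, columns = (height in m, weight in kg)
X = np.array([[1.85, 81],
              [1.71, 65],
              [1.95, 103],
              [1.55, 42],
              [1.63, 56]])
Y = np.array([23.7, 22.2, 27.1, 17.5, 21.1])   # targets (BMI)
print(X.shape, Y.shape)                         # (5, 2) (5,)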

    Since each input image is a 2D matrix, we need to flatten the image +(i.e. “unravel” the 2D matrix into a 1D array) to turn the data into a +design/feature matrix. This means we lose all spatial information in the +image, such as locality and translational invariance. More complicated +architectures such as Convolutional Neural Networks can take advantage +of such information, and are most commonly applied when analyzing +images.

    +
    +
    +
    # import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Train and test datasets#

    +

Performing analysis before partitioning the dataset is a major error that can lead to incorrect conclusions.

    +

    We will reserve \(80 \%\) of our dataset for training and \(20 \%\) for testing.

    +

    It is important that the train and test datasets are drawn randomly from our dataset, to ensure +no bias in the sampling.
    +Say you are taking measurements of weather data to predict the weather in the coming 5 days. +You don’t want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data +collected from 12.00 to 24.00.

    +
    +
    +
    from sklearn.model_selection import train_test_split
    +
    +# one-liner from scikit-learn library
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +# equivalently in numpy
def train_test_split_numpy(inputs, labels, train_size, test_size):
    n_inputs = len(inputs)

    # use one common permutation so that inputs and labels stay paired
    shuffled_indices = np.random.permutation(n_inputs)
    inputs_shuffled = inputs[shuffled_indices]
    labels_shuffled = labels[shuffled_indices]

    train_end = int(n_inputs*train_size)
    X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]
    Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]

    return X_train, X_test, Y_train, Y_test
    +
    +#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)
    +
    +print("Number of training images: " + str(len(X_train)))
    +print("Number of test images: " + str(len(X_test)))
    +
    +
    +
    +
    +
    +
    +

    Define model and architecture#

    +

    Our simple feed-forward neural network will consist of an input layer, a single hidden layer and an output layer. The activation \(y\) of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have

    +
    +\[ z = \sum_{i=1}^n w_i a_i ,\]
    +
    +\[ y = f(z) ,\]
    +

    where \(f\) is the activation function, \(a_i\) represents input from neuron \(i\) in the preceding layer +and \(w_i\) is the weight to input \(i\).
    +The activation of the neurons in the input layer is just the features (e.g. a pixel value).

    +

    The simplest activation function for a neuron is the Heaviside function:

    +
    +\[\begin{split} f(z) = +\begin{cases} +1, & z > 0\\ +0, & \text{otherwise} +\end{cases} +\end{split}\]
    +

    A feed-forward neural network with this activation is known as a perceptron.
    +For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer.
    +This activation can be generalized to \(k\) classes (using e.g. the one-against-all strategy), +and we call these architectures multiclass perceptrons.

    +

    However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and
    +Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function.

    +

    Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU).
    +We will be using the sigmoid function \(\sigma(x)\):

    +
    +\[ f(x) = \sigma(x) = \frac{1}{1 + e^{-x}} ,\]
    +

    which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions.

    +
    +
    +

    Layers#

    +
      +
    • Input

    +

    Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons.

    +
      +
    • Hidden layer

    +

    We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer.
    +Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer.

    +
      +
    • Output

    +

    If we were building a binary classifier, it would be sufficient with a single neuron in the output layer, +which could output 0 or 1 according to the Heaviside function. This would be an example of a hard classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a soft classifier, which outputs the probability of being in class 0 or 1.

    +

    For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class.

    +

    Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons \(j = 0,1,...,9\). The activation of each output neuron \(j\) will be according to the softmax function:

    +
    +\[ P(\text{class $j$} \mid \text{input $\boldsymbol{a}$}) = \frac{\exp{(\boldsymbol{a}^T \boldsymbol{w}_j)}} +{\sum_{c=0}^{9} \exp{(\boldsymbol{a}^T \boldsymbol{w}_c)}} ,\]
    +

    i.e. each neuron \(j\) outputs the probability of being in class \(j\) given an input from the hidden layer \(\boldsymbol{a}\), with \(\boldsymbol{w}_j\) the weights of neuron \(j\) to the inputs.
    +The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1.
    +The exponent is just the weighted sum of inputs as before:

    +
    +\[ z_j = \sum_{i=1}^n w_ {ij} a_i+b_j.\]
    +

    Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500 +weights to the output layer.

    +
    +
    +

    Weights and biases#

    +

    Typically weights are initialized with small values distributed around zero, drawn from a uniform +or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless.

    +

    Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range +of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). The bias unit has an output of 1, and a weight to each neuron \(j\), \(b_j\):

    +
    +\[ z_j = \sum_{i=1}^n w_ {ij} a_i + b_j.\]
    +

    The bias weights \(\boldsymbol{b}\) are often initialized to zero, but a small value like \(0.01\) ensures all neurons have some output which can be backpropagated in the first training cycle.

    +
    +
    +
    # building our neural network
    +
    +n_inputs, n_features = X_train.shape
    +n_hidden_neurons = 50
    +n_categories = 10
    +
    +# we make the weights normally distributed using numpy.random.randn
    +
    +# weights and bias in the hidden layer
    +hidden_weights = np.random.randn(n_features, n_hidden_neurons)
    +hidden_bias = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +output_weights = np.random.randn(n_hidden_neurons, n_categories)
    +output_bias = np.zeros(n_categories) + 0.01
    +
    +
    +
    +
    +
    +
    +

    Feed-forward pass#

    +

    Denote \(F\) the number of features, \(H\) the number of hidden neurons and \(C\) the number of categories.
    +For each input image we calculate a weighted sum of input features (pixel values) to each neuron \(j\) in the hidden layer \(l\):

    +
    +\[ z_{j}^{l} = \sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},\]
    +

    this is then passed through our activation function

    +
    +\[ a_{j}^{l} = f(z_{j}^{l}) .\]
    +

    We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron \(j\) in the output layer:

    +
    +\[ z_{j}^{L} = \sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.\]
    +

    Finally we calculate the output of neuron \(j\) in the output layer using the softmax function:

    +
    +\[ a_{j}^{L} = \frac{\exp{(z_j^{L})}} +{\sum_{c=0}^{C-1} \exp{(z_c^{L})}} .\]
    +
    +
    +

    Matrix multiplications#

    +

    Since our data has the dimensions \(X = (n_{inputs}, n_{features})\) and our weights to the hidden +layer have the dimensions
    +\(W_{hidden} = (n_{features}, n_{hidden})\), +we can easily feed the network all our training data in one go by taking the matrix product

    +
    +\[ X W^{h} = (n_{inputs}, n_{hidden}),\]
    +

    and obtain a matrix that holds the weighted sum of inputs to the hidden layer +for each input image and each hidden neuron.
    +We also add the bias to obtain a matrix of weighted sums to the hidden layer \(Z^{h}\):

    +
    +\[ \boldsymbol{z}^{l} = \boldsymbol{X} \boldsymbol{W}^{l} + \boldsymbol{b}^{l} ,\]
    +

    meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image.
    +This is then passed through the activation:

    +
    +\[ \boldsymbol{a}^{l} = f(\boldsymbol{z}^l) .\]
    +

    This is fed to the output layer:

    +
\[ \boldsymbol{z}^{L} = \boldsymbol{a}^{l} \boldsymbol{W}^{L} + \boldsymbol{b}^{L} . \]
    +

    Finally we receive our output values for each image and each category by passing it through the softmax function:

    +
    +\[ output = softmax (\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .\]
    +
    +
    +
    # setup the feed-forward pass, subscript h = hidden layer
    +
    +def sigmoid(x):
    +    return 1/(1 + np.exp(-x))
    +
    +def feed_forward(X):
    +    # weighted sum of inputs to the hidden layer
    +    z_h = np.matmul(X, hidden_weights) + hidden_bias
    +    # activation in the hidden layer
    +    a_h = sigmoid(z_h)
    +    
    +    # weighted sum of inputs to the output layer
    +    z_o = np.matmul(a_h, output_weights) + output_bias
    +    # softmax output
    +    # axis 0 holds each input and axis 1 the probabilities of each category
    +    exp_term = np.exp(z_o)
    +    probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +    
    +    return probabilities
    +
    +probabilities = feed_forward(X_train)
    +print("probabilities = (n_inputs, n_categories) = " + str(probabilities.shape))
    +print("probability that image 0 is in category 0,1,2,...,9 = \n" + str(probabilities[0]))
    +print("probabilities sum up to: " + str(probabilities[0].sum()))
    +print()
    +
    +# we obtain a prediction by taking the class with the highest likelihood
    +def predict(X):
    +    probabilities = feed_forward(X)
    +    return np.argmax(probabilities, axis=1)
    +
    +predictions = predict(X_train)
    +print("predictions = (n_inputs) = " + str(predictions.shape))
    +print("prediction for image 0: " + str(predictions[0]))
    +print("correct label for image 0: " + str(Y_train[0]))
    +
    +
    +
    +
    +
    +
    +

    Choose cost function and optimizer#

    +

    To measure how well our neural network is doing we need to introduce a cost function.
    +We will call the function that gives the error of a single sample output the loss function, and the function +that gives the total error of our network across all samples the cost function. +A typical choice for multiclass classification is the cross-entropy loss, also known as the negative log likelihood.

    +

    In multiclass classification it is common to treat each integer label as a so called one-hot vector:

    +
    +\[ y = 5 \quad \rightarrow \quad \boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,\]
    +
    +\[ y = 1 \quad \rightarrow \quad \boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,\]
    +

    i.e. a binary bit string of length \(C\), where \(C = 10\) is the number of classes in the MNIST dataset.

    +

    Let \(y_{ic}\) denote the \(c\)-th component of the \(i\)-th one-hot vector.
    +We define the cost function \(\mathcal{C}\) as a sum over the cross-entropy loss for each point \(\boldsymbol{x}_i\) in the dataset.

    +

    In the one-hot representation only one of the terms in the loss function is non-zero, namely the +probability of the correct category \(c'\)
    +(i.e. the category \(c'\) such that \(y_{ic'} = 1\)). This means that the cross entropy loss only punishes you for how wrong +you got the correct label. The probability of category \(c\) is given by the softmax function. The vector \(\boldsymbol{\theta}\) represents the parameters of our network, i.e. all the weights and biases.

    +
    +
    +

    Optimizing the cost function#

    +

    The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is gradient descent and its generalizations. The idea behind gradient descent +is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a local minimum of the cost function.
    +Each parameter \(\theta\) is iteratively adjusted according to the rule

    +
    +\[ \theta_{i+1} = \theta_i - \eta \nabla \mathcal{C}(\theta_i) ,\]
    +

    where \(\eta\) is known as the learning rate, which controls how big a step we take towards the minimum.
    +This update can be repeated for any number of iterations, or until we are satisfied with the result.

    +

A simple and effective improvement is a variant called mini-batch (stochastic) gradient descent.
    +Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient +on a subset of the data called a minibatch.
    +If there are \(N\) data points and we have a minibatch size of \(M\), the total number of batches +is \(N/M\).
    +We denote each minibatch \(B_k\), with \(k = 1, 2,...,N/M\). The gradient then becomes:

    +
    +\[ \nabla \mathcal{C}(\theta) = \frac{1}{N} \sum_{i=1}^N \nabla \mathcal{L}_i(\theta) \quad \rightarrow \quad +\frac{1}{M} \sum_{i \in B_k} \nabla \mathcal{L}_i(\theta) ,\]
    +

    i.e. instead of averaging the loss over the entire dataset, we average over a minibatch.

    +

    This has two important benefits:

    +
      +
1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minimum.

2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient.
    +

The various optimization methods, with codes and algorithms, are discussed in our lectures on Gradient descent approaches.

    +
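As a sketch of how such a minibatch loop could look for the network built in this section (assuming the weights, biases, training arrays and the backpropagation function defined elsewhere in this section are available; the learning rate and batch size are example values):

import numpy as np

eta = 0.01      # learning rate (example value)
M = 100         # minibatch size (example value)
n = X_train.shape[0]
n_batches = n // M

for epoch in range(10):
    # reshuffle once per epoch so the minibatches differ between epochs
    permutation = np.random.permutation(n)
    for k in range(n_batches):
        batch = permutation[k*M:(k+1)*M]
        dWo, dBo, dWh, dBh = backpropagation(X_train[batch], Y_train_onehot[batch])
        output_weights -= eta * dWo
        output_bias -= eta * dBo
        hidden_weights -= eta * dWh
        hidden_bias -= eta * dBh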
    +
    +

    Regularization#

    +

    It is common to add an extra term to the cost function, proportional +to the size of the weights. This is equivalent to constraining the +size of the weights, so that they do not grow out of control. +Constraining the size of the weights means that the weights cannot +grow arbitrarily large to fit the training data, and in this way +reduces overfitting.

    +

    We will measure the size of the weights using the so called L2-norm, meaning our cost function becomes:

    +
\[ \mathcal{C}(\theta) = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) \quad \rightarrow \quad \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) + \lambda \lvert \lvert \boldsymbol{w} \rvert \rvert_2^2 = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) + \lambda \sum_{ij} w_{ij}^2,\]
    +

    i.e. we sum up all the weights squared. The factor \(\lambda\) is known as a regularization parameter.

    +

In order to train the model, we need to calculate the derivative of the cost function with respect to every bias and weight in the network. In total our network has \((64 + 1)\times 50=3250\) weights in the hidden layer and \((50 + 1)\times 10=510\) weights to the output layer (\(+1\) for the bias), and the gradient must be calculated for every parameter. We use the backpropagation algorithm discussed above. This is a clever use of the chain rule that allows us to calculate the gradient efficiently.

    +
    +
    +

    Matrix multiplication#

    +

To train our network more efficiently, these equations are implemented using matrix operations. The error in the output layer is calculated simply as, with \(\boldsymbol{t}\) being our targets and \(\boldsymbol{y}\) the outputs of the network,

    +
\[ \delta_L = \boldsymbol{y} - \boldsymbol{t} = (n_{inputs}, n_{categories}) . \]
    +

    The gradient for the output weights is calculated as

    +
    +\[ \nabla W_{L} = \boldsymbol{a}^T \delta_L = (n_{hidden}, n_{categories}) ,\]
    +

    where \(\boldsymbol{a} = (n_{inputs}, n_{hidden})\). This simply means that we are summing up the gradients for each input.
    +Since we are going backwards we have to transpose the activation matrix.

    +

    The gradient with respect to the output bias is then

    +
    +\[ \nabla \boldsymbol{b}_{L} = \sum_{i=1}^{n_{inputs}} \delta_L = (n_{categories}) .\]
    +

    The error in the hidden layer is

    +
\[ \delta_h = \delta_L W_{L}^T \circ f'(z_{h}) = \delta_L W_{L}^T \circ a_{h} \circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) , \]
    +

where \(f'(z_{h})\) is the derivative of the activation in the hidden layer. The matrix products mean that we are summing up the products for each neuron in the output layer. The symbol \(\circ\) denotes the Hadamard product, meaning element-wise multiplication.

    +

    This again gives us the gradients in the hidden layer:

    +
    +\[ \nabla W_{h} = X^T \delta_h = (n_{features}, n_{hidden}) ,\]
    +
    +\[ \nabla b_{h} = \sum_{i=1}^{n_{inputs}} \delta_h = (n_{hidden}) .\]
    +
    +
    +
    # to categorical turns our integer vector into a onehot representation
    +from sklearn.metrics import accuracy_score
    +
    +# one-hot in numpy
    +def to_categorical_numpy(integer_vector):
    +    n_inputs = len(integer_vector)
    +    n_categories = np.max(integer_vector) + 1
    +    onehot_vector = np.zeros((n_inputs, n_categories))
    +    onehot_vector[range(n_inputs), integer_vector] = 1
    +    
    +    return onehot_vector
    +
    +#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)
    +Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)
    +
    +def feed_forward_train(X):
    +    # weighted sum of inputs to the hidden layer
    +    z_h = np.matmul(X, hidden_weights) + hidden_bias
    +    # activation in the hidden layer
    +    a_h = sigmoid(z_h)
    +    
    +    # weighted sum of inputs to the output layer
    +    z_o = np.matmul(a_h, output_weights) + output_bias
    +    # softmax output
    +    # axis 0 holds each input and axis 1 the probabilities of each category
    +    exp_term = np.exp(z_o)
    +    probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +    
    +    # for backpropagation need activations in hidden and output layers
    +    return a_h, probabilities
    +
    +def backpropagation(X, Y):
    +    a_h, probabilities = feed_forward_train(X)
    +    
    +    # error in the output layer
    +    error_output = probabilities - Y
    +    # error in the hidden layer
    +    error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)
    +    
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_h.T, error_output)
    +    output_bias_gradient = np.sum(error_output, axis=0)
    +    
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(X.T, error_hidden)
    +    hidden_bias_gradient = np.sum(error_hidden, axis=0)
    +
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +print("Old accuracy on training data: " + str(accuracy_score(predict(X_train), Y_train)))
    +
    +eta = 0.01
    +lmbd = 0.01
    +for i in range(1000):
    +    # calculate gradients
    +    dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)
    +    
    +    # regularization term gradients
    +    dWo += lmbd * output_weights
    +    dWh += lmbd * hidden_weights
    +    
    +    # update weights and biases
    +    output_weights -= eta * dWo
    +    output_bias -= eta * dBo
    +    hidden_weights -= eta * dWh
    +    hidden_bias -= eta * dBh
    +
    +print("New accuracy on training data: " + str(accuracy_score(predict(X_train), Y_train)))
    +
    +
    +
    +
    +
    +
    +

    Improving performance#

    +

    As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image.
    +In order to obtain a network that does something useful, we will have to do a bit more work.

    +

    The choice of hyperparameters such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a grid-search is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates \(\eta = 10^{-6}, 10^{-5},...,10^{-1}\) with different regularization parameters \(\lambda = 10^{-6},...,10^{-0}\).

    +
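Such a grid search could be sketched as below; here train_and_evaluate is a hypothetical helper that trains one fresh network with the given \(\eta\) and \(\lambda\) (for instance by wrapping the training loop above or the class below) and returns its accuracy on the test set.

import numpy as np

eta_vals = np.logspace(-6, -1, 6)     # learning rates 10^-6, ..., 10^-1
lmbd_vals = np.logspace(-6, 0, 7)     # regularization parameters 10^-6, ..., 10^0

test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
for i, eta in enumerate(eta_vals):
    for j, lmbd in enumerate(lmbd_vals):
        # hypothetical helper: train a network with these hyperparameters
        # and return the accuracy on the test set
        test_accuracy[i, j] = train_and_evaluate(eta, lmbd)

best_i, best_j = np.unravel_index(np.argmax(test_accuracy), test_accuracy.shape)
print("best eta:", eta_vals[best_i], "best lambda:", lmbd_vals[best_j])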

Next, we haven’t implemented minibatching yet, which introduces stochasticity and is thought to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an iteration, and a full training period going through the entire dataset (\(n/M\) batches) an epoch.

    +

    If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers.
    +Andrew Ng goes through some of these considerations in this video. You can find a summary of the video here.

    +
    +
    +

    Full object-oriented implementation#

    +

    It is very natural to think of the network as an object, with specific instances of the network +being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below.

    +
    +
    +
    class NeuralNetwork:
    +    def __init__(
    +            self,
    +            X_data,
    +            Y_data,
    +            n_hidden_neurons=50,
    +            n_categories=10,
    +            epochs=10,
    +            batch_size=100,
    +            eta=0.1,
    +            lmbd=0.0):
    +
    +        self.X_data_full = X_data
    +        self.Y_data_full = Y_data
    +
    +        self.n_inputs = X_data.shape[0]
    +        self.n_features = X_data.shape[1]
    +        self.n_hidden_neurons = n_hidden_neurons
    +        self.n_categories = n_categories
    +
    +        self.epochs = epochs
    +        self.batch_size = batch_size
    +        self.iterations = self.n_inputs // self.batch_size
    +        self.eta = eta
    +        self.lmbd = lmbd
    +
    +        self.create_biases_and_weights()
    +
    +    def create_biases_and_weights(self):
    +        self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)
    +        self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01
    +
    +        self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)
    +        self.output_bias = np.zeros(self.n_categories) + 0.01
    +
    +    def feed_forward(self):
    +        # feed-forward for training
    +        self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias
    +        self.a_h = sigmoid(self.z_h)
    +
    +        self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias
    +
    +        exp_term = np.exp(self.z_o)
    +        self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +
    +    def feed_forward_out(self, X):
    +        # feed-forward for output
    +        z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias
    +        a_h = sigmoid(z_h)
    +
    +        z_o = np.matmul(a_h, self.output_weights) + self.output_bias
    +        
    +        exp_term = np.exp(z_o)
    +        probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +        return probabilities
    +
    +    def backpropagation(self):
    +        error_output = self.probabilities - self.Y_data
    +        error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)
    +
    +        self.output_weights_gradient = np.matmul(self.a_h.T, error_output)
    +        self.output_bias_gradient = np.sum(error_output, axis=0)
    +
    +        self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)
    +        self.hidden_bias_gradient = np.sum(error_hidden, axis=0)
    +
    +        if self.lmbd > 0.0:
    +            self.output_weights_gradient += self.lmbd * self.output_weights
    +            self.hidden_weights_gradient += self.lmbd * self.hidden_weights
    +
    +        self.output_weights -= self.eta * self.output_weights_gradient
    +        self.output_bias -= self.eta * self.output_bias_gradient
    +        self.hidden_weights -= self.eta * self.hidden_weights_gradient
    +        self.hidden_bias -= self.eta * self.hidden_bias_gradient
    +
    +    def predict(self, X):
    +        probabilities = self.feed_forward_out(X)
    +        return np.argmax(probabilities, axis=1)
    +
    +    def predict_probabilities(self, X):
    +        probabilities = self.feed_forward_out(X)
    +        return probabilities
    +
    +    def train(self):
    +        data_indices = np.arange(self.n_inputs)
    +
    +        for i in range(self.epochs):
    +            for j in range(self.iterations):
+                # pick datapoints without replacement for this minibatch
    +                chosen_datapoints = np.random.choice(
    +                    data_indices, size=self.batch_size, replace=False
    +                )
    +
    +                # minibatch training data
    +                self.X_data = self.X_data_full[chosen_datapoints]
    +                self.Y_data = self.Y_data_full[chosen_datapoints]
    +
    +                self.feed_forward()
    +                self.backpropagation()
    +
    +
    +
    +
    +
    +
    +

    Evaluate model performance on test data#

    +

To measure the performance of our network we evaluate how well it does on data it has never seen before, i.e. the test data.
    +We measure the performance of the network using the accuracy score.
    +The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of \(1\).

    +
    +\[ \text{Accuracy} = \frac{\sum_{i=1}^n I(\tilde{y}_i = y_i)}{n} ,\]
    +

    where \(I\) is the indicator function, \(1\) if \(\tilde{y}_i = y_i\) and \(0\) otherwise.

    +
    +
    +
    epochs = 100
    +batch_size = 100
    +
    +dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,
    +                    n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)
    +dnn.train()
    +test_predict = dnn.predict(X_test)
    +
    +# accuracy score from scikit library
    +print("Accuracy score on test set: ", accuracy_score(Y_test, test_predict))
    +
    +# equivalent in numpy
    +def accuracy_score_numpy(Y_test, Y_pred):
    +    return np.sum(Y_test == Y_pred) / len(Y_test)
    +
    +#print("Accuracy score on test set: ", accuracy_score_numpy(Y_test, test_predict))
    +
    +
    +
    +
    +
    +
    +

    Adjust hyperparameters#

    +

    We now perform a grid search to find the optimal hyperparameters for the network.
    +Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around \(98\%\) (\(2\%\) error rate).

    +
    +
    +
    eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +# store the models for later use
    +DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +
    +# grid search
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,
    +                            n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)
    +        dnn.train()
    +        
    +        DNN_numpy[i][j] = dnn
    +        
    +        test_predict = dnn.predict(X_test)
    +        
    +        print("Learning rate  = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Accuracy score on test set: ", accuracy_score(Y_test, test_predict))
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Visualization#

    +
    +
    +
    # visual representation of grid search
    +# uses seaborn heatmap, you can also do this with matplotlib imshow
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        dnn = DNN_numpy[i][j]
    +        
    +        train_pred = dnn.predict(X_train) 
    +        test_pred = dnn.predict(X_test)
    +
    +        train_accuracy[i][j] = accuracy_score(Y_train, train_pred)
    +        test_accuracy[i][j] = accuracy_score(Y_test, test_pred)
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    scikit-learn implementation#

    +

scikit-learn focuses more on traditional machine learning methods, such as regression, clustering, decision trees, etc. As such, it has only two types of neural networks: a Multi Layer Perceptron outputting continuous values, MLPRegressor, and a Multi Layer Perceptron outputting labels, MLPClassifier. We will see how simple it is to use these classes.
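As a small, illustrative sketch (on hypothetical toy data, not the MNIST example used below), MLPRegressor is used in much the same way as the classifier:

from sklearn.neural_network import MLPRegressor
import numpy as np

# hypothetical one-dimensional toy regression data
rng = np.random.default_rng(2023)
x = rng.uniform(0, 1, size=(200, 1))
y = 2.0 + 3.0 * x[:, 0] + 0.1 * rng.normal(size=200)

regr = MLPRegressor(hidden_layer_sizes=(50,), activation='logistic',
                    learning_rate_init=0.01, max_iter=1000)
regr.fit(x, y)
print("R2 score on training data:", regr.score(x, y))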

    +

    scikit-learn implements a few improvements from our neural network, +such as early stopping, a varying learning rate, different +optimization methods, etc. We would therefore expect a better +performance overall.
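For instance, a minimal sketch of switching on some of these features in MLPClassifier (the parameter values below are illustrative, not tuned):

from sklearn.neural_network import MLPClassifier

dnn = MLPClassifier(hidden_layer_sizes=(50,), activation='logistic', solver='sgd',
                    learning_rate='adaptive', learning_rate_init=0.01,
                    early_stopping=True, validation_fraction=0.1, max_iter=100)
dnn.fit(X_train, Y_train)
print("Accuracy score on test set: ", dnn.score(X_test, Y_test))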

    +
    +
    +
    from sklearn.neural_network import MLPClassifier
    +# store models for later use
    +DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',
    +                            alpha=lmbd, learning_rate_init=eta, max_iter=epochs)
    +        dnn.fit(X_train, Y_train)
    +        
    +        DNN_scikit[i][j] = dnn
    +        
    +        print("Learning rate  = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Accuracy score on test set: ", dnn.score(X_test, Y_test))
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Visualization#

    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        dnn = DNN_scikit[i][j]
    +        
    +        train_pred = dnn.predict(X_train) 
    +        test_pred = dnn.predict(X_test)
    +
    +        train_accuracy[i][j] = accuracy_score(Y_train, train_pred)
    +        test_accuracy[i][j] = accuracy_score(Y_test, test_pred)
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Building neural networks in Tensorflow and Keras#

    +

    Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn +and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy +and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer.

    +

In our previous example we used only one hidden layer, and in this example we will use two. From this it should be quite clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or NumPy arrays.

    +
    +
    +

    Tensorflow#

    +

Tensorflow is an open source machine learning library developed by the Google Brain team, initially for internal use at Google. It was released under the Apache 2.0 open source license on November 9, 2015.

    +

    Tensorflow is a computational framework that allows you to construct +machine learning models at different levels of abstraction, from +high-level, object-oriented APIs like Keras, down to the C++ kernels +that Tensorflow is built upon. The higher levels of abstraction are +simpler to use, but less flexible, and our choice of implementation +should reflect the problems we are trying to solve.

    +

    Tensorflow uses so-called graphs to represent your computation +in terms of the dependencies between individual operations, such that you first build a Tensorflow graph +to represent your model, and then create a Tensorflow session to run the graph.
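Note that the explicit graph-and-session workflow describes TensorFlow 1.x; in TensorFlow 2.x eager execution is the default, and a graph is typically traced implicitly with tf.function. A minimal illustrative sketch:

import tensorflow as tf

@tf.function  # traces this Python function into a TensorFlow graph
def affine(x, W, b):
    return tf.matmul(x, W) + b

x = tf.ones((2, 3))
W = tf.ones((3, 4))
b = tf.zeros((4,))
print(affine(x, W, b))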

    +

    In this guide we will analyze the same data as we did in our NumPy and +scikit-learn tutorial, gathered from the MNIST database of images. We +will give an introduction to the lower level Python Application +Program Interfaces (APIs), and see how we use them to build our graph. +Then we will build (effectively) the same graph in Keras, to see just +how simple solving a machine learning problem can be.

    +

    To install tensorflow on Unix/Linux systems, use pip as

    +
    +
    +
    pip3 install tensorflow
    +
    +
    +
    +
    +

    and/or if you use anaconda, just write (or install from the graphical user interface) +(current release of CPU-only TensorFlow)

    +
    +
    +
    conda create -n tf tensorflow
    +conda activate tf
    +
    +
    +
    +
    +

    To install the current release of GPU TensorFlow

    +
    +
    +
    conda create -n tf-gpu tensorflow-gpu
    +conda activate tf-gpu
    +
    +
    +
    +
    +
    +
    +

    Using Keras#

    +

Keras is a high-level neural network API that supports Tensorflow, CNTK and Theano as backends.
    +If you have Anaconda installed you may run the following command

    +
    +
    +
    conda install keras
    +
    +
    +
    +
    +

    You can look up the instructions here for more information.

    +

We will to a large extent use Keras in this course.

    +
    +
    +

    Collect and pre-process data#

    +

Let us look again at the MNIST data set.

    +
    +
    +
    # import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +import tensorflow as tf
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +
    +from sklearn.model_selection import train_test_split
    +
    +# one-hot representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +
    
    +epochs = 100
    +batch_size = 100
    +n_neurons_layer1 = 100
    +n_neurons_layer2 = 50
    +n_categories = 10
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):
    +    model = Sequential()
    +    model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_categories, activation='softmax'))
    +    
+    sgd = optimizers.SGD(learning_rate=eta)
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +
    +
    +
    +
    +
    +
    DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,
    +                                         eta=eta, lmbd=lmbd)
    +        DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = DNN.evaluate(X_test, Y_test)
    +        
    +        DNN_keras[i][j] = DNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        DNN = DNN_keras[i][j]
    +
    +        train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Building a neural network code#

    +

    Here we present a flexible object oriented codebase +for a feed forward neural network, along with a demonstration of how +to use it. Before we get into the details of the neural network, we +will first present some implementations of various schedulers, cost +functions and activation functions that can be used together with the +neural network.

    +

    The codes here were developed by Eric Reber and Gregor Kajda during spring 2023.

    +
    +

    Learning rate methods#

    +

The code below shows object oriented implementations of the Constant, Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All of the classes inherit from the shared abstract Scheduler class, and share the update_change() and reset() methods, allowing any of the schedulers to be used seamlessly during the training stage, as will later be shown in the fit() method of the neural network. The update_change() method takes a single parameter, the gradient (\(\delta^l_j a^{l-1}_k\)), and returns the change which will be subtracted from the weights. The reset() method takes no parameters and resets the desired variables. For Constant and Momentum, reset() does nothing.

    +
    +
    +
    import autograd.numpy as np
    +
    +class Scheduler:
    +    """
    +    Abstract class for Schedulers
    +    """
    +
    +    def __init__(self, eta):
    +        self.eta = eta
    +
    +    # should be overwritten
    +    def update_change(self, gradient):
    +        raise NotImplementedError
    +
    +    # overwritten if needed
    +    def reset(self):
    +        pass
    +
    +
    +class Constant(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +
    +    def update_change(self, gradient):
    +        return self.eta * gradient
    +    
    +    def reset(self):
    +        pass
    +
    +
    +class Momentum(Scheduler):
    +    def __init__(self, eta: float, momentum: float):
    +        super().__init__(eta)
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
    +        self.change = self.momentum * self.change + self.eta * gradient
    +        return self.change
    +
    +    def reset(self):
    +        pass
    +
    +
    +class Adagrad(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +        self.G_t = None
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        return self.eta * gradient * G_t_inverse
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class AdagradMomentum(Scheduler):
    +    def __init__(self, eta, momentum):
    +        super().__init__(eta)
    +        self.G_t = None
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse
    +        return self.change
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class RMS_prop(Scheduler):
    +    def __init__(self, eta, rho):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.second = 0.0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +        self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient
    +        return self.eta * gradient / (np.sqrt(self.second + delta))
    +
    +    def reset(self):
    +        self.second = 0.0
    +
    +
    +class Adam(Scheduler):
    +    def __init__(self, eta, rho, rho2):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.rho2 = rho2
    +        self.moment = 0
    +        self.second = 0
    +        self.n_epochs = 1
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        self.moment = self.rho * self.moment + (1 - self.rho) * gradient
    +        self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient
    +
    +        moment_corrected = self.moment / (1 - self.rho**self.n_epochs)
    +        second_corrected = self.second / (1 - self.rho2**self.n_epochs)
    +
    +        return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))
    +
    +    def reset(self):
    +        self.n_epochs += 1
    +        self.moment = 0
    +        self.second = 0
    +
    +
    +
    +
    +
    +
    +

    Usage of the above learning rate schedulers#

    +

To initialize a scheduler, simply create the object and pass in the necessary parameters such as the learning rate and the momentum as shown below. As the Scheduler class is an abstract class, it should not be called directly, and will raise an error upon usage.

    +
    +
    +
    momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)
    +adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +
    +
    +
    +
    +

Here is a small example of how a segment of code using schedulers could look. Switching out the schedulers is simple.

    +
    +
    +
    weights = np.ones((3,3))
    +print(f"Before scheduler:\n{weights=}")
    +
    +epochs = 10
    +for e in range(epochs):
    +    gradient = np.random.rand(3, 3)
    +    change = adam_scheduler.update_change(gradient)
    +    weights = weights - change
    +    adam_scheduler.reset()
    +
    +print(f"\nAfter scheduler:\n{weights=}")
    +
    +
    +
    +
    +
    +
    +

    Cost functions#

    +

Here we discuss cost functions that can be used when creating the neural network. Every cost function takes the target vector as its parameter, and returns a function valued only at the prediction \(X\), such that it may easily be differentiated.

    +
    +
    +
    import autograd.numpy as np
    +
    +def CostOLS(target):
    +    
    +    def func(X):
    +        return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)
    +
    +    return func
    +
    +
    +def CostLogReg(target):
    +
    +    def func(X):
    +        
    +        return -(1.0 / target.shape[0]) * np.sum(
    +            (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))
    +        )
    +
    +    return func
    +
    +
    +def CostCrossEntropy(target):
    +    
    +    def func(X):
    +        return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))
    +
    +    return func
    +
    +
    +
    +
    +

Below we give a short example of how these cost functions may be used to obtain results if you wish to test them out on your own using Autograd’s automatic differentiation.

    +
    +
    +
    from autograd import grad
    +
    +target = np.array([[1, 2, 3]]).T
    +a = np.array([[4, 5, 6]]).T
    +
    +cost_func = CostCrossEntropy
    +cost_func_derivative = grad(cost_func(target))
    +
    +valued_at_a = cost_func_derivative(a)
    +print(f"Derivative of cost function {cost_func.__name__} valued at a:\n{valued_at_a}")
    +
    +
    +
    +
    +
    +
    +

    Activation functions#

    +

Finally, before we look at the neural network, we will look at the activation functions which can be specified for the hidden layers and as the output function. Each function can be evaluated for any given vector or matrix X, and can be differentiated via derivate().

    +
    +
    +
    import autograd.numpy as np
    +from autograd import elementwise_grad
    +
    +def identity(X):
    +    return X
    +
    +
    +def sigmoid(X):
    +    try:
    +        return 1.0 / (1 + np.exp(-X))
    +    except FloatingPointError:
    +        return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))
    +
    +
    +def softmax(X):
    +    X = X - np.max(X, axis=-1, keepdims=True)
    +    delta = 10e-10
    +    return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)
    +
    +
    +def RELU(X):
    +    return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))
    +
    +
    +def LRELU(X):
    +    delta = 10e-4
    +    return np.where(X > np.zeros(X.shape), X, delta * X)
    +
    +
    +def derivate(func):
    +    if func.__name__ == "RELU":
    +
    +        def func(X):
    +            return np.where(X > 0, 1, 0)
    +
    +        return func
    +
    +    elif func.__name__ == "LRELU":
    +
    +        def func(X):
    +            delta = 10e-4
    +            return np.where(X > 0, 1, delta)
    +
    +        return func
    +
    +    else:
    +        return elementwise_grad(func)
    +
    +
    +
    +
    +

    Below follows a short demonstration of how to use an activation +function. The derivative of the activation function will be important +when calculating the output delta term during backpropagation. Note +that derivate() can also be used for cost functions for a more +generalized approach.

    +
    +
    +
    z = np.array([[4, 5, 6]]).T
    +print(f"Input to activation function:\n{z}")
    +
    +act_func = sigmoid
    +a = act_func(z)
    +print(f"\nOutput from {act_func.__name__} activation function:\n{a}")
    +
    +act_func_derivative = derivate(act_func)
+valued_at_z = act_func_derivative(z)
    +print(f"\nDerivative of {act_func.__name__} activation function valued at z:\n{valued_at_z}")
    +
    +
    +
    +
    +
    +
    +

    The Neural Network#

    +

Now that we have gotten a good understanding of the implementation of some important components, we can take a look at an object oriented implementation of a feed forward neural network. The feed forward neural network has been implemented as a class named FFNN, which can be initiated as a regressor or classifier dependent on the choice of cost function. The FFNN can have any number of input nodes, hidden layers with any number of hidden nodes, and any number of output nodes, meaning it can perform multiclass classification as well as binary classification and regression problems. Although there is a lot of code present, it makes for an easy to use and generalizable interface for creating many types of neural networks, as will be demonstrated below.

    +
    +
    +
    import math
    +import autograd.numpy as np
    +import sys
    +import warnings
    +from autograd import grad, elementwise_grad
    +from random import random, seed
    +from copy import deepcopy, copy
    +from typing import Tuple, Callable
    +from sklearn.utils import resample
    +
    +warnings.simplefilter("error")
    +
    +
    +class FFNN:
    +    """
    +    Description:
    +    ------------
    +        Feed Forward Neural Network with interface enabling flexible design of a
+        neural network's architecture and the specification of activation function
    +        in the hidden layers and output layer respectively. This model can be used
    +        for both regression and classification problems, depending on the output function.
    +
    +    Attributes:
    +    ------------
    +        I   dimensions (tuple[int]): A list of positive integers, which specifies the
    +            number of nodes in each of the networks layers. The first integer in the array
    +            defines the number of nodes in the input layer, the second integer defines number
    +            of nodes in the first hidden layer and so on until the last number, which
    +            specifies the number of nodes in the output layer.
    +        II  hidden_func (Callable): The activation function for the hidden layers
    +        III output_func (Callable): The activation function for the output layer
    +        IV  cost_func (Callable): Our cost function
    +        V   seed (int): Sets random seed, makes results reproducible
    +    """
    +
    +    def __init__(
    +        self,
    +        dimensions: tuple[int],
    +        hidden_func: Callable = sigmoid,
    +        output_func: Callable = lambda x: x,
    +        cost_func: Callable = CostOLS,
    +        seed: int = None,
    +    ):
    +        self.dimensions = dimensions
    +        self.hidden_func = hidden_func
    +        self.output_func = output_func
    +        self.cost_func = cost_func
    +        self.seed = seed
    +        self.weights = list()
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +        self.classification = None
    +
    +        self.reset_weights()
    +        self._set_classification()
    +
    +    def fit(
    +        self,
    +        X: np.ndarray,
    +        t: np.ndarray,
    +        scheduler: Scheduler,
    +        batches: int = 1,
    +        epochs: int = 100,
    +        lam: float = 0,
    +        X_val: np.ndarray = None,
    +        t_val: np.ndarray = None,
    +    ):
    +        """
    +        Description:
    +        ------------
+            This function trains the neural network by performing the feedforward and backpropagation
+            algorithms to update the network's weights.
    +
    +        Parameters:
    +        ------------
    +            I    X (np.ndarray) : training data
    +            II   t (np.ndarray) : target data
    +            III  scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)
    +            IV   scheduler_args (list[int]) : list of all arguments necessary for scheduler
    +
    +        Optional Parameters:
    +        ------------
    +            V    batches (int) : number of batches the datasets are split into, default equal to 1
    +            VI   epochs (int) : number of iterations used to train the network, default equal to 100
    +            VII  lam (float) : regularization hyperparameter lambda
    +            VIII X_val (np.ndarray) : validation set
    +            IX   t_val (np.ndarray) : validation target set
    +
    +        Returns:
    +        ------------
    +            I   scores (dict) : A dictionary containing the performance metrics of the model.
    +                The number of the metrics depends on the parameters passed to the fit-function.
    +
    +        """
    +
    +        # setup 
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        val_set = False
    +        if X_val is not None and t_val is not None:
    +            val_set = True
    +
    +        # creating arrays for score metrics
    +        train_errors = np.empty(epochs)
    +        train_errors.fill(np.nan)
    +        val_errors = np.empty(epochs)
    +        val_errors.fill(np.nan)
    +
    +        train_accs = np.empty(epochs)
    +        train_accs.fill(np.nan)
    +        val_accs = np.empty(epochs)
    +        val_accs.fill(np.nan)
    +
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +
    +        batch_size = X.shape[0] // batches
    +
    +        X, t = resample(X, t)
    +
    +        # this function returns a function valued only at X
    +        cost_function_train = self.cost_func(t)
    +        if val_set:
    +            cost_function_val = self.cost_func(t_val)
    +
    +        # create schedulers for each weight matrix
    +        for i in range(len(self.weights)):
    +            self.schedulers_weight.append(copy(scheduler))
    +            self.schedulers_bias.append(copy(scheduler))
    +
    +        print(f"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}")
    +
    +        try:
    +            for e in range(epochs):
    +                for i in range(batches):
    +                    # allows for minibatch gradient descent
    +                    if i == batches - 1:
+                        # If the for loop has reached the last batch, take all that's left
    +                        X_batch = X[i * batch_size :, :]
    +                        t_batch = t[i * batch_size :, :]
    +                    else:
    +                        X_batch = X[i * batch_size : (i + 1) * batch_size, :]
    +                        t_batch = t[i * batch_size : (i + 1) * batch_size, :]
    +
    +                    self._feedforward(X_batch)
    +                    self._backpropagate(X_batch, t_batch, lam)
    +
    +                # reset schedulers for each epoch (some schedulers pass in this call)
    +                for scheduler in self.schedulers_weight:
    +                    scheduler.reset()
    +
    +                for scheduler in self.schedulers_bias:
    +                    scheduler.reset()
    +
    +                # computing performance metrics
    +                pred_train = self.predict(X)
    +                train_error = cost_function_train(pred_train)
    +
    +                train_errors[e] = train_error
    +                if val_set:
    +                    
    +                    pred_val = self.predict(X_val)
    +                    val_error = cost_function_val(pred_val)
    +                    val_errors[e] = val_error
    +
    +                if self.classification:
    +                    train_acc = self._accuracy(self.predict(X), t)
    +                    train_accs[e] = train_acc
    +                    if val_set:
    +                        val_acc = self._accuracy(pred_val, t_val)
    +                        val_accs[e] = val_acc
    +
    +                # printing progress bar
    +                progression = e / epochs
    +                print_length = self._progress_bar(
    +                    progression,
    +                    train_error=train_errors[e],
    +                    train_acc=train_accs[e],
    +                    val_error=val_errors[e],
    +                    val_acc=val_accs[e],
    +                )
    +        except KeyboardInterrupt:
    +            # allows for stopping training at any point and seeing the result
    +            pass
    +
+        # visualization of training progression (similar to tensorflow progression bar)
    +        sys.stdout.write("\r" + " " * print_length)
    +        sys.stdout.flush()
    +        self._progress_bar(
    +            1,
    +            train_error=train_errors[e],
    +            train_acc=train_accs[e],
    +            val_error=val_errors[e],
    +            val_acc=val_accs[e],
    +        )
    +        sys.stdout.write("")
    +
    +        # return performance metrics for the entire run
    +        scores = dict()
    +
    +        scores["train_errors"] = train_errors
    +
    +        if val_set:
    +            scores["val_errors"] = val_errors
    +
    +        if self.classification:
    +            scores["train_accs"] = train_accs
    +
    +            if val_set:
    +                scores["val_accs"] = val_accs
    +
    +        return scores
    +
    +    def predict(self, X: np.ndarray, *, threshold=0.5):
    +        """
    +         Description:
    +         ------------
    +             Performs prediction after training of the network has been finished.
    +
    +         Parameters:
    +        ------------
    +             I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +         Optional Parameters:
    +         ------------
    +             II  threshold (float) : sets minimal value for a prediction to be predicted as the positive class
    +                 in classification problems
    +
    +         Returns:
    +         ------------
    +             I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +                 This vector is thresholded if regression=False, meaning that classification results
    +                 in a vector of 1s and 0s, while regressions in an array of decimal numbers
    +
    +        """
    +
    +        predict = self._feedforward(X)
    +
    +        if self.classification:
    +            return np.where(predict > threshold, 1, 0)
    +        else:
    +            return predict
    +
    +    def reset_weights(self):
    +        """
    +        Description:
    +        ------------
    +            Resets/Reinitializes the weights in order to train the network for a new problem.
    +
    +        """
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        self.weights = list()
    +        for i in range(len(self.dimensions) - 1):
    +            weight_array = np.random.randn(
    +                self.dimensions[i] + 1, self.dimensions[i + 1]
    +            )
    +            weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01
    +
    +            self.weights.append(weight_array)
    +
    +    def _feedforward(self, X: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates the activation of each layer starting at the input and ending at the output.
+            Each following activation is calculated from a weighted sum of each of the preceding
    +            activations (except in the case of the input layer).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +        Returns:
    +        ------------
    +            I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +        """
    +
    +        # reset matrices
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +
    +        # if X is just a vector, make it into a matrix
    +        if len(X.shape) == 1:
    +            X = X.reshape((1, X.shape[0]))
    +
+        # Add a column of constant values (0.01) as the first column of the design
+        # matrix, acting as the bias input to our data
    +        bias = np.ones((X.shape[0], 1)) * 0.01
    +        X = np.hstack([bias, X])
    +
    +        # a^0, the nodes in the input layer (one a^0 for each row in X - where the
    +        # exponent indicates layer number).
    +        a = X
    +        self.a_matrices.append(a)
    +        self.z_matrices.append(a)
    +
    +        # The feed forward algorithm
    +        for i in range(len(self.weights)):
    +            if i < len(self.weights) - 1:
    +                z = a @ self.weights[i]
    +                self.z_matrices.append(z)
    +                a = self.hidden_func(z)
    +                # bias column again added to the data here
    +                bias = np.ones((a.shape[0], 1)) * 0.01
    +                a = np.hstack([bias, a])
    +                self.a_matrices.append(a)
    +            else:
    +                try:
    +                    # a^L, the nodes in our output layers
    +                    z = a @ self.weights[i]
    +                    a = self.output_func(z)
    +                    self.a_matrices.append(a)
    +                    self.z_matrices.append(z)
    +                except Exception as OverflowError:
    +                    print(
    +                        "OverflowError in fit() in FFNN\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling"
    +                    )
    +
    +        # this will be a^L
    +        return a
    +
    +    def _backpropagate(self, X, t, lam):
    +        """
    +        Description:
    +        ------------
    +            Performs the backpropagation algorithm. In other words, this method
    +            calculates the gradient of all the layers starting at the
    +            output layer, and moving from right to left accumulates the gradient until
    +            the input layer is reached. Each layers respective weights are updated while
    +            the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each.
    +            II  t (np.ndarray): The target vector, with n rows of p targets.
    +            III lam (float32): regularization parameter used to punish the weights in case of overfitting
    +
    +        Returns:
    +        ------------
    +            No return value.
    +
    +        """
    +        out_derivative = derivate(self.output_func)
    +        hidden_derivative = derivate(self.hidden_func)
    +
    +        for i in range(len(self.weights) - 1, -1, -1):
    +            # delta terms for output
    +            if i == len(self.weights) - 1:
    +                # for multi-class classification
    +                if (
    +                    self.output_func.__name__ == "softmax"
    +                ):
    +                    delta_matrix = self.a_matrices[i + 1] - t
    +                # for single class classification
    +                else:
    +                    cost_func_derivative = grad(self.cost_func(t))
    +                    delta_matrix = out_derivative(
    +                        self.z_matrices[i + 1]
    +                    ) * cost_func_derivative(self.a_matrices[i + 1])
    +
    +            # delta terms for hidden layer
    +            else:
    +                delta_matrix = (
    +                    self.weights[i + 1][1:, :] @ delta_matrix.T
    +                ).T * hidden_derivative(self.z_matrices[i + 1])
    +
    +            # calculate gradient
    +            gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix
    +            gradient_bias = np.sum(delta_matrix, axis=0).reshape(
    +                1, delta_matrix.shape[1]
    +            )
    +
    +            # regularization term
    +            gradient_weights += self.weights[i][1:, :] * lam
    +
    +            # use scheduler
    +            update_matrix = np.vstack(
    +                [
    +                    self.schedulers_bias[i].update_change(gradient_bias),
    +                    self.schedulers_weight[i].update_change(gradient_weights),
    +                ]
    +            )
    +
    +            # update weights and bias
    +            self.weights[i] -= update_matrix
    +
    +    def _accuracy(self, prediction: np.ndarray, target: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates accuracy of given prediction to target
    +
    +        Parameters:
    +        ------------
+            I   prediction (np.ndarray): vector of predictions output by the network
    +                (1s and 0s in case of classification, and real numbers in case of regression)
    +            II  target (np.ndarray): vector of true values (What the network ideally should predict)
    +
    +        Returns:
    +        ------------
    +            A floating point number representing the percentage of correctly classified instances.
    +        """
    +        assert prediction.size == target.size
    +        return np.average((target == prediction))
    +    def _set_classification(self):
    +        """
    +        Description:
    +        ------------
+            Decides if FFNN acts as classifier (True) or regressor (False),
    +            sets self.classification during init()
    +        """
    +        self.classification = False
    +        if (
    +            self.cost_func.__name__ == "CostLogReg"
    +            or self.cost_func.__name__ == "CostCrossEntropy"
    +        ):
    +            self.classification = True
    +
    +    def _progress_bar(self, progression, **kwargs):
    +        """
    +        Description:
    +        ------------
    +            Displays progress of training
    +        """
    +        print_length = 40
    +        num_equals = int(progression * print_length)
    +        num_not = print_length - num_equals
    +        arrow = ">" if num_equals > 0 else ""
    +        bar = "[" + "=" * (num_equals - 1) + arrow + "-" * num_not + "]"
    +        perc_print = self._format(progression * 100, decimals=5)
    +        line = f"  {bar} {perc_print}% "
    +
    +        for key in kwargs:
    +            if not np.isnan(kwargs[key]):
    +                value = self._format(kwargs[key], decimals=4)
    +                line += f"| {key}: {value} "
    +        sys.stdout.write("\r" + line)
    +        sys.stdout.flush()
    +        return len(line)
    +
    +    def _format(self, value, decimals=4):
    +        """
    +        Description:
    +        ------------
    +            Formats decimal numbers for progress bar
    +        """
    +        if value > 0:
    +            v = value
    +        elif value < 0:
    +            v = -10 * value
    +        else:
    +            v = 1
    +        n = 1 + math.floor(math.log10(v))
    +        if n >= decimals - 1:
    +            return str(round(value))
    +        return f"{value:.{decimals-n-1}f}"
    +
    +
    +
    +
    +

    Before we make a model, we will quickly generate a dataset we can use +for our linear regression problem as shown below

    +
    +
    +
    import autograd.numpy as np
    +from sklearn.model_selection import train_test_split
    +
    +def SkrankeFunction(x, y):
    +    return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)
    +
    +def create_X(x, y, n):
    +    if len(x.shape) > 1:
    +        x = np.ravel(x)
    +        y = np.ravel(y)
    +
    +    N = len(x)
    +    l = int((n + 1) * (n + 2) / 2)  # Number of elements in beta
    +    X = np.ones((N, l))
    +
    +    for i in range(1, n + 1):
    +        q = int((i) * (i + 1) / 2)
    +        for k in range(i + 1):
    +            X[:, q + k] = (x ** (i - k)) * (y**k)
    +
    +    return X
    +
    +step=0.5
    +x = np.arange(0, 1, step)
    +y = np.arange(0, 1, step)
    +x, y = np.meshgrid(x, y)
    +target = SkrankeFunction(x, y)
    +target = target.reshape(target.shape[0], 1)
    +
    +poly_degree=3
    +X = create_X(x, y, poly_degree)
    +
    +X_train, X_test, t_train, t_test = train_test_split(X, target)
    +
    +
    +
    +
    +

Now that we have our dataset ready for the regression, we can create our regressor. Note that with the seed parameter, we can make sure our results stay the same every time we run the neural network. For initialization, we simply specify the dimensions (we want the number of input nodes to equal the number of features in the design matrix, and a single output node to predict one value).

    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)
    +
    +
    +
    +
    +

    We then fit our model with our training data using the scheduler of our choice.

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Constant(eta=1e-3)
    +scores = linear_regression.fit(X_train, t_train, scheduler)
    +
    +
    +
    +
    +

Due to the progress bar we can see the MSE (train_error) throughout the FFNN’s training. Note that the fit() function has some optional parameters with default arguments. For example, the regularization hyperparameter can be left out if not needed, and the FFNN will by default run for 100 epochs. These can easily be changed, for example:

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)
    +
    +
    +
    +
    +

    We see that given more epochs to train on, the regressor reaches a lower MSE.
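If you wish to inspect the training progression afterwards, the returned scores dictionary can be plotted directly. A minimal sketch, using matplotlib as earlier in these notes:

import matplotlib.pyplot as plt

plt.plot(scores["train_errors"], label="Train MSE")
plt.xlabel("Epochs")
plt.ylabel("MSE")
plt.legend()
plt.show()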

    +

Let us then switch to a binary classification problem. We use a binary classification dataset, and follow a similar setup to the regression case.

    +
    +
    +
    from sklearn.datasets import load_breast_cancer
    +from sklearn.preprocessing import MinMaxScaler
    +
    +wisconsin = load_breast_cancer()
    +X = wisconsin.data
    +target = wisconsin.target
    +target = target.reshape(target.shape[0], 1)
    +
    +X_train, X_val, t_train, t_val = train_test_split(X, target)
    +
    +scaler = MinMaxScaler()
    +scaler.fit(X_train)
    +X_train = scaler.transform(X_train)
    +X_val = scaler.transform(X_val)
    +
    +
    +
    +
    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +

    We will now make use of our validation data by passing it into our fit function as a keyword argument

    +
    +
    +
    logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +

    Finally, we will create a neural network with 2 hidden layers with activation functions.

    +
    +
    +
    input_nodes = X_train.shape[1]
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 1
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +
    +
    +
    neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +
    +
    +

    Multiclass classification#

    +

Finally, we will demonstrate the use case of multiclass classification using our FFNN with the famous MNIST dataset, which contains images of the digits 0 through 9.

    +
    +
    +
    from sklearn.datasets import load_digits
    +
    +def onehot(target: np.ndarray):
    +    onehot = np.zeros((target.size, target.max() + 1))
    +    onehot[np.arange(target.size), target] = 1
    +    return onehot
    +
    +digits = load_digits()
    +
    +X = digits.data
    +target = digits.target
    +target = onehot(target)
    +
    +input_nodes = 64
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 10
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)
    +
    +multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = multiclass.fit(X, target, scheduler, epochs=1000)
    +
    +
    +
    +
    +
    +
    +
    +
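As a small optional sketch, we can inspect the multiclass predictions further, for instance with scikit-learn's confusion matrix; since predict() returns thresholded one-hot style predictions for classification, we convert back to digit labels with argmax:

from sklearn.metrics import confusion_matrix

pred_onehot = multiclass.predict(X)           # thresholded (0/1) predictions per class
pred_labels = np.argmax(pred_onehot, axis=1)  # back to digit labels
true_labels = np.argmax(target, axis=1)

print(confusion_matrix(true_labels, pred_labels))
print("Final training accuracy: ", scores["train_accs"][-1])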

    Testing the XOR gate and other gates#

    +

    Let us now use our code to test the XOR gate.

    +
    +
    +
    X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)
    +
    +# The XOR gate
    +yXOR = np.array( [[ 0], [1] ,[1], [0]])
    +
    +input_nodes = X.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    +
    +
    +
    +
    +

Not bad, but the results depend strongly on the learning rate. Try different learning rates.
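A minimal sketch of such an experiment could simply loop over a few candidate learning rates (the values below are illustrative):

for eta in [1e-3, 1e-2, 1e-1, 1.0]:
    logistic_regression.reset_weights()
    scheduler = Adam(eta=eta, rho=0.9, rho2=0.999)
    scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    print(f"\neta={eta}: final train error {scores['train_errors'][-1]:.4f}")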

    +
    +
    + + + + +
    + + + + + + + + +
    + + + +
    + + +
    +
    + + +
    + + +
    +
    +
    + + + + + +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week43.html b/doc/LectureNotes/_build/html/week43.html new file mode 100644 index 000000000..0f4021673 --- /dev/null +++ b/doc/LectureNotes/_build/html/week43.html @@ -0,0 +1,4614 @@ + + + + + + + + + + + Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +


    Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: October 20, 2025

    +
    +

    Plans for week 43#

    +

    Material for the lecture on Monday October 20, 2025.

    +
      +
1. Reminder from last week, see also the lecture notes from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html as well as those from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html.

2. Building our own Feed-forward Neural Network.

3. Coding examples using Tensorflow/Keras and Pytorch. The Pytorch examples are adapted from Raschka’s text, see chapters 11-13.

4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.

5. Video of lecture at https://youtu.be/Gi6mzxAT0Ew

6. Whiteboard notes at CompPhysics/MachineLearning
    +
    +
    +

    Exercises and lab session week 43#

    +

    Lab sessions on Tuesday and Wednesday.

    +
      +
1. Work on writing your own neural network code and discussions of project 2. If you didn’t get time to do the exercises from the last two weeks, we recommend doing so, as these exercises give you the basic elements of a neural network code.

2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems.
    +
    +
    +

    Using Automatic differentiation#

    +

In our discussions of ordinary differential equations and neural network codes we will also study the use of Autograd for computing gradients in deep learning, see for example https://www.youtube.com/watch?v=fRf4l5qaX1M&ab_channel=AlexSmola. For the documentation of Autograd and examples, see the Autograd documentation at HIPS/autograd and the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html.
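As a minimal illustration of what Autograd does for us, the sketch below differentiates a simple scalar function; the function and the input values are of course only an example.

import autograd.numpy as np   # thinly wrapped NumPy, so that Autograd can trace the operations
from autograd import grad

def f(x):
    # a simple scalar function of a vector argument
    return np.sum(x**2) + np.sin(x[0])

df = grad(f)                  # df is a new function returning the gradient of f

x = np.array([1.0, 2.0, 3.0])
print(df(x))                  # analytically: [2*x[0] + cos(x[0]), 2*x[1], 2*x[2]]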

    +
    +
    +

    Back propagation and automatic differentiation#

    +

    For more details on the back propagation algorithm and automatic differentiation see

    +
      +
1. https://www.jmlr.org/papers/volume18/17-468/17-468.pdf

2. https://deepimaging.github.io/lectures/lecture_11_Backpropagation.pdf

3. Slides 12-44 at http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture4.pdf
    +
    +
    +

    Lecture Monday October 20#

    +
    +
    +

Setting up the back propagation algorithm and algorithm for a feed forward NN, initializations#

    +

    This is a reminder from last week.

    +

    The architecture (our model).

    +
      +
1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)

2. Define the number of hidden layers and hidden nodes

3. Define activation functions for hidden layers and output layers

4. Define the optimizer (plain learning rate, momentum, ADAgrad, RMSprop, ADAM etc.) and an array of initial learning rates

5. Define the cost function and possible regularization terms with hyperparameters

6. Initialize weights and biases

7. Fix the number of iterations for the feed forward part and back propagation part
    +
    +
    +

    Setting up the back propagation algorithm, part 1#

    +

    Let us write this out in the form of an algorithm.

    +

    First, we set up the input data \(\boldsymbol{x}\) and the activations +\(\boldsymbol{z}_1\) of the input layer and compute the activation function and +the pertinent outputs \(\boldsymbol{a}^1\).

    +

Secondly, we perform the feed forward until we reach the output layer, computing all \(\boldsymbol{z}_l\) and, via the activation function, the pertinent outputs \(\boldsymbol{a}^l\) for \(l=1,2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\boldsymbol{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

Then we compute the back propagated error for each \(l=L-1,L-2,\dots,1\) as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l). +\]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

Finally, we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) (where \(l=1\) denotes the first hidden layer), according to the rules

    +
+\[ +w_{ij}^l \leftarrow w_{ij}^l - \eta \delta_j^l a_i^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

With the back propagated error for each \(l=L-1,L-2,\dots,1\) given as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) according to the rules

    +
+\[ +w_{ij}^l \leftarrow w_{ij}^l - \eta \delta_j^l a_i^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +
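To make these update rules concrete, here is a minimal NumPy sketch of a single backpropagation step for a network with one hidden layer, sigmoid activations and a squared-error cost. The layer sizes, variable names and learning rate are chosen purely for illustration; a full, general implementation follows later in these notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2023)
x = rng.normal(size=(4, 1))            # one input sample with four features (column vector)
t = np.array([[1.0]])                  # its target

# one hidden layer with three nodes and a single output node
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))
eta = 0.1                              # learning rate

# feed forward
z1 = W1 @ x + b1;   a1 = sigmoid(z1)
z2 = W2 @ a1 + b2;  a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L, here with C = 0.5*(a^L - t)^2
delta2 = sigmoid(z2) * (1.0 - sigmoid(z2)) * (a2 - t)
# back propagated error for the hidden layer: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
delta1 = (W2.T @ delta2) * sigmoid(z1) * (1.0 - sigmoid(z1))

# gradient descent updates of weights and biases
W2 -= eta * delta2 @ a1.T;  b2 -= eta * delta2
W1 -= eta * delta1 @ x.T;   b1 -= eta * delta1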
    +
    +

    Activation functions#

    +

    A property that characterizes a neural network, other than its +connectivity, is the choice of activation function(s). The following +restrictions are imposed on an activation function for an FFNN to +fulfill the universal approximation theorem

    +
      +
• Non-constant

• Bounded

• Monotonically-increasing

• Continuous
    +
    +

    Activation functions, examples#

    +

    Typical examples are the logistic Sigmoid

    +
    +\[ +\sigma(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +\sigma(x) = \tanh(x) +\]
    +
    +
    +
    +

    The RELU function family#

    +

    The ReLU activation function suffers from a problem known as the dying +ReLUs: during training, some neurons effectively die, meaning they +stop outputting anything other than 0.

    +

In some cases, you may find that half of your network’s neurons are dead, especially if you used a large learning rate. During training, if a neuron’s weights get updated such that the weighted sum of the neuron’s inputs is negative, it will start outputting 0. When this happens, the neuron is unlikely to come back to life since the gradient of the ReLU function is 0 when its input is negative.

    +
    +
    +

    ELU function#

    +

    To solve this problem, nowadays practitioners use a variant of the +ReLU function, such as the leaky ReLU discussed above or the so-called +exponential linear unit (ELU) function

    +
    +\[\begin{split} +ELU(z) = \left\{\begin{array}{cc} \alpha\left( \exp{(z)}-1\right) & z < 0,\\ z & z \ge 0.\end{array}\right. +\end{split}\]
    +
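A minimal NumPy sketch of the ELU, together with the leaky ReLU for comparison (the α values used here, 1 for ELU and 0.01 for the leaky ReLU, are only the common defaults discussed below):

import numpy as np

def elu(z, alpha=1.0):
    # alpha*(exp(z) - 1) for z < 0, and z otherwise
    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)

def leaky_relu(z, alpha=0.01):
    # alpha*z for z < 0, and z otherwise
    return np.where(z < 0, alpha * z, z)

z = np.linspace(-3, 3, 7)
print(elu(z))
print(leaky_relu(z))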
    +
    +

    Which activation function should we use?#

    +

    In general it seems that the ELU activation function is better than +the leaky ReLU function (and its variants), which is better than +ReLU. ReLU performs better than \(\tanh\) which in turn performs better +than the logistic function.

    +

If runtime performance is an issue, then you may opt for the leaky ReLU function over the ELU function. If you don’t want to tweak yet another hyperparameter, you may just use the default \(\alpha\) of \(0.01\) for the leaky ReLU, and \(1\) for ELU. If you have spare time and computing power, you can use cross-validation or bootstrap to evaluate other activation functions.

    +
    +
    +

    More on activation functions, output layers#

    +

    In most cases you can use the ReLU activation function in the hidden +layers (or one of its variants).

    +

    It is a bit faster to compute than other activation functions, and the +gradient descent optimization does in general not get stuck.

    +

    For the output layer:

    +
      +
• For classification tasks, the softmax activation function is generally a good choice (when the classes are mutually exclusive).

• For regression tasks, you can simply use no activation function at all.
    +
    +
    +

    Building neural networks in Tensorflow and Keras#

    +

    Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn +and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy +and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer.

    +

In our previous example we used only one hidden layer, and in this one we will use two. From this it should be quite clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or NumPy arrays.

    +
    +
    +

    Tensorflow#

    +

Tensorflow is an open source machine learning library developed by the Google Brain team for internal use. It was released under the Apache 2.0 open source license on November 9, 2015.

    +

    Tensorflow is a computational framework that allows you to construct +machine learning models at different levels of abstraction, from +high-level, object-oriented APIs like Keras, down to the C++ kernels +that Tensorflow is built upon. The higher levels of abstraction are +simpler to use, but less flexible, and our choice of implementation +should reflect the problems we are trying to solve.

    +

    Tensorflow uses so-called graphs to represent your computation +in terms of the dependencies between individual operations, such that you first build a Tensorflow graph +to represent your model, and then create a Tensorflow session to run the graph.

    +

    In this guide we will analyze the same data as we did in our NumPy and +scikit-learn tutorial, gathered from the MNIST database of images. We +will give an introduction to the lower level Python Application +Program Interfaces (APIs), and see how we use them to build our graph. +Then we will build (effectively) the same graph in Keras, to see just +how simple solving a machine learning problem can be.

    +

    To install tensorflow on Unix/Linux systems, use pip as

    +
    +
    +
    pip3 install tensorflow
    +
    +
    +
    +
    +

    and/or if you use anaconda, just write (or install from the graphical user interface) +(current release of CPU-only TensorFlow)

    +
    +
    +
    conda create -n tf tensorflow
    +conda activate tf
    +
    +
    +
    +
    +

    To install the current release of GPU TensorFlow

    +
    +
    +
    conda create -n tf-gpu tensorflow-gpu
    +conda activate tf-gpu
    +
    +
    +
    +
    +
    +
    +

    Using Keras#

    +

Keras is a high-level neural network API that supports Tensorflow, CNTK and Theano as backends.
    +If you have Anaconda installed you may run the following command

    +
    +
    +
    conda install keras
    +
    +
    +
    +
    +

    You can look up the instructions here for more information.

    +

    We will to a large extent use keras in this course.

    +
    +
    +

    Collect and pre-process data#

    +

Let us look again at the MNIST data set.

    +
    +
    +
    %matplotlib inline
    +
    +# import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +import tensorflow as tf
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +
    +from sklearn.model_selection import train_test_split
    +
    +# one-hot representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +
    
    +epochs = 100
    +batch_size = 100
    +n_neurons_layer1 = 100
    +n_neurons_layer2 = 50
    +n_categories = 10
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):
    +    model = Sequential()
    +    model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_categories, activation='softmax'))
    +    
    +    sgd = optimizers.SGD(learning_rate=eta)
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +
    +
    +
    +
    +
    +
    DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,
    +                                         eta=eta, lmbd=lmbd)
    +        DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = DNN.evaluate(X_test, Y_test)
    +        
    +        DNN_keras[i][j] = DNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        DNN = DNN_keras[i][j]
    +
    +        train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
+ax.set_ylabel(r"$\eta$")
+ax.set_xlabel(r"$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
+ax.set_ylabel(r"$\eta$")
+ax.set_xlabel(r"$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using Pytorch with the full MNIST data set#

    +
    +
    +
    import torch
    +import torch.nn as nn
    +import torch.optim as optim
    +import torchvision
    +import torchvision.transforms as transforms
    +
    +# Device configuration: use GPU if available
    +device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    +
    +# MNIST dataset (downloads if not already present)
    +transform = transforms.Compose([
    +    transforms.ToTensor(),
    +    transforms.Normalize((0.5,), (0.5,))  # normalize to mean=0.5, std=0.5 (approx. [-1,1] pixel range)
    +])
    +train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    +test_dataset  = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    +
    +train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    +test_loader  = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
    +
    +
    +class NeuralNet(nn.Module):
    +    def __init__(self):
    +        super(NeuralNet, self).__init__()
    +        self.fc1 = nn.Linear(28*28, 100)   # first hidden layer (784 -> 100)
    +        self.fc2 = nn.Linear(100, 100)    # second hidden layer (100 -> 100)
    +        self.fc3 = nn.Linear(100, 10)     # output layer (100 -> 10 classes)
    +    def forward(self, x):
    +        x = x.view(x.size(0), -1)         # flatten images into vectors of size 784
    +        x = torch.relu(self.fc1(x))       # hidden layer 1 + ReLU activation
    +        x = torch.relu(self.fc2(x))       # hidden layer 2 + ReLU activation
    +        x = self.fc3(x)                   # output layer (logits for 10 classes)
    +        return x
    +
    +model = NeuralNet().to(device)
    +
    +
    +criterion = nn.CrossEntropyLoss()
    +optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
    +
    +num_epochs = 10
    +for epoch in range(num_epochs):
    +    model.train()  # set model to training mode
    +    running_loss = 0.0
    +    for images, labels in train_loader:
    +        # Move data to device (GPU if available, else CPU)
    +        images, labels = images.to(device), labels.to(device)
    +
    +        optimizer.zero_grad()            # reset gradients to zero
    +        outputs = model(images)          # forward pass: compute predictions
    +        loss = criterion(outputs, labels)  # compute cross-entropy loss
    +        loss.backward()                 # backpropagate to compute gradients
    +        optimizer.step()                # update weights using SGD step 
    +
    +        running_loss += loss.item()
    +    # Compute average loss over all batches in this epoch
    +    avg_loss = running_loss / len(train_loader)
    +    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}")
    +
    +#Evaluation on the Test Set
    +
    +
    +
    +model.eval()  # set model to evaluation mode 
    +correct = 0
    +total = 0
    +with torch.no_grad():  # disable gradient calculation for evaluation 
    +    for images, labels in test_loader:
    +        images, labels = images.to(device), labels.to(device)
    +        outputs = model(images)
    +        _, predicted = torch.max(outputs, dim=1)  # class with highest score
    +        total += labels.size(0)
    +        correct += (predicted == labels).sum().item()
    +
    +accuracy = 100 * correct / total
    +print(f"Test Accuracy: {accuracy:.2f}%")
    +
    +
    +
    +
    +
    +
    +

    And a similar example using Tensorflow with Keras#

    +
    +
    +
    
    +import tensorflow as tf
    +from tensorflow import keras
    +from tensorflow.keras import layers, regularizers
    +
    +# Check for GPU (TensorFlow will use it automatically if available)
    +gpus = tf.config.list_physical_devices('GPU')
    +print(f"GPUs available: {gpus}")
    +
    +# 1) Load and preprocess MNIST
    +(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    +# Normalize to [0, 1]
    +x_train = (x_train.astype("float32") / 255.0)
    +x_test  = (x_test.astype("float32") / 255.0)
    +
    +# 2) Build the model: 784 -> 100 -> 100 -> 10
    +l2_reg = 1e-4  # L2 regularization strength
    +
    +model = keras.Sequential([
    +    layers.Input(shape=(28, 28)),
    +    layers.Flatten(),
    +    layers.Dense(100, activation="relu",
    +                 kernel_regularizer=regularizers.l2(l2_reg)),
    +    layers.Dense(100, activation="relu",
    +                 kernel_regularizer=regularizers.l2(l2_reg)),
    +    layers.Dense(10, activation="softmax")  # output probabilities for 10 classes
    +])
    +
    +# 3) Compile with SGD + weight decay via L2 regularizers
    +model.compile(
    +    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    +    loss="sparse_categorical_crossentropy",
    +    metrics=["accuracy"],
    +)
    +
    +model.summary()
    +
    +# 4) Train
    +history = model.fit(
    +    x_train, y_train,
    +    epochs=10,
    +    batch_size=64,
    +    validation_split=0.1,  # optional: monitor validation during training
    +    verbose=1
    +)
    +
    +# 5) Evaluate on test set
    +test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    +print(f"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}")
    +
    +
    +
    +
    +
    +
    +

    Building our own neural network code#

    +

    Here we present a flexible object oriented codebase +for a feed forward neural network, along with a demonstration of how +to use it. Before we get into the details of the neural network, we +will first present some implementations of various schedulers, cost +functions and activation functions that can be used together with the +neural network.

    +

    The codes here were developed by Eric Reber and Gregor Kajda during spring 2023.

    +
    +

    Learning rate methods#

    +

The code below shows object oriented implementations of the Constant, Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All of the classes inherit from the shared abstract Scheduler class, and share the update_change() and reset() methods, allowing any of the schedulers to be used seamlessly during the training stage, as will later be shown in the fit() method of the neural network. update_change() only has one parameter, the gradient (\(δ^l_ja^{l−1}_k\)), and returns the change which will be subtracted from the weights. The reset() function takes no parameters, and resets the desired variables. For Constant and Momentum, reset does nothing.

    +
    +
    +
    import autograd.numpy as np
    +
    +class Scheduler:
    +    """
    +    Abstract class for Schedulers
    +    """
    +
    +    def __init__(self, eta):
    +        self.eta = eta
    +
    +    # should be overwritten
    +    def update_change(self, gradient):
    +        raise NotImplementedError
    +
    +    # overwritten if needed
    +    def reset(self):
    +        pass
    +
    +
    +class Constant(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +
    +    def update_change(self, gradient):
    +        return self.eta * gradient
    +    
    +    def reset(self):
    +        pass
    +
    +
    +class Momentum(Scheduler):
    +    def __init__(self, eta: float, momentum: float):
    +        super().__init__(eta)
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
    +        self.change = self.momentum * self.change + self.eta * gradient
    +        return self.change
    +
    +    def reset(self):
    +        pass
    +
    +
    +class Adagrad(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +        self.G_t = None
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        return self.eta * gradient * G_t_inverse
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class AdagradMomentum(Scheduler):
    +    def __init__(self, eta, momentum):
    +        super().__init__(eta)
    +        self.G_t = None
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse
    +        return self.change
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class RMS_prop(Scheduler):
    +    def __init__(self, eta, rho):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.second = 0.0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +        self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient
    +        return self.eta * gradient / (np.sqrt(self.second + delta))
    +
    +    def reset(self):
    +        self.second = 0.0
    +
    +
    +class Adam(Scheduler):
    +    def __init__(self, eta, rho, rho2):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.rho2 = rho2
    +        self.moment = 0
    +        self.second = 0
    +        self.n_epochs = 1
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        self.moment = self.rho * self.moment + (1 - self.rho) * gradient
    +        self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient
    +
    +        moment_corrected = self.moment / (1 - self.rho**self.n_epochs)
    +        second_corrected = self.second / (1 - self.rho2**self.n_epochs)
    +
    +        return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))
    +
    +    def reset(self):
    +        self.n_epochs += 1
    +        self.moment = 0
    +        self.second = 0
    +
    +
    +
    +
    +
    +
    +

    Usage of the above learning rate schedulers#

    +

To initialize a scheduler, simply create the object and pass in the necessary parameters such as the learning rate and the momentum as shown below. As the Scheduler class is an abstract class, it should not be called directly, and will raise an error upon usage.

    +
    +
    +
    momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)
    +adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +
    +
    +
    +
    +

Here is a small example of how a segment of code using schedulers could look. Switching out the schedulers is simple.

    +
    +
    +
    weights = np.ones((3,3))
    +print(f"Before scheduler:\n{weights=}")
    +
    +epochs = 10
    +for e in range(epochs):
    +    gradient = np.random.rand(3, 3)
    +    change = adam_scheduler.update_change(gradient)
    +    weights = weights - change
    +    adam_scheduler.reset()
    +
    +print(f"\nAfter scheduler:\n{weights=}")
    +
    +
    +
    +
    +
    +
    +

    Cost functions#

    +

    Here we discuss cost functions that can be used when creating the +neural network. Every cost function takes the target vector as its +parameter, and returns a function valued only at \(x\) such that it may +easily be differentiated.

    +
    +
    +
    import autograd.numpy as np
    +
    +def CostOLS(target):
    +    
    +    def func(X):
    +        return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)
    +
    +    return func
    +
    +
    +def CostLogReg(target):
    +
    +    def func(X):
    +        
    +        return -(1.0 / target.shape[0]) * np.sum(
    +            (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))
    +        )
    +
    +    return func
    +
    +
    +def CostCrossEntropy(target):
    +    
    +    def func(X):
    +        return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))
    +
    +    return func
    +
    +
    +
    +
    +

Below we give a short example of how these cost functions may be used to obtain results if you wish to test them out on your own using AutoGrad’s automatic differentiation.

    +
    +
    +
    from autograd import grad
    +
    +target = np.array([[1, 2, 3]]).T
    +a = np.array([[4, 5, 6]]).T
    +
    +cost_func = CostCrossEntropy
    +cost_func_derivative = grad(cost_func(target))
    +
    +valued_at_a = cost_func_derivative(a)
    +print(f"Derivative of cost function {cost_func.__name__} valued at a:\n{valued_at_a}")
    +
    +
    +
    +
    +
    +
    +

    Activation functions#

    +

    Finally, before we look at the neural network, we will look at the +activation functions which can be specified between the hidden layers +and as the output function. Each function can be valued for any given +vector or matrix X, and can be differentiated via derivate().

    +
    +
    +
    import autograd.numpy as np
    +from autograd import elementwise_grad
    +
    +def identity(X):
    +    return X
    +
    +
    +def sigmoid(X):
    +    try:
    +        return 1.0 / (1 + np.exp(-X))
    +    except FloatingPointError:
    +        return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))
    +
    +
    +def softmax(X):
    +    X = X - np.max(X, axis=-1, keepdims=True)
    +    delta = 10e-10
    +    return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)
    +
    +
    +def RELU(X):
    +    return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))
    +
    +
    +def LRELU(X):
    +    delta = 10e-4
    +    return np.where(X > np.zeros(X.shape), X, delta * X)
    +
    +
    +def derivate(func):
    +    if func.__name__ == "RELU":
    +
    +        def func(X):
    +            return np.where(X > 0, 1, 0)
    +
    +        return func
    +
    +    elif func.__name__ == "LRELU":
    +
    +        def func(X):
    +            delta = 10e-4
    +            return np.where(X > 0, 1, delta)
    +
    +        return func
    +
    +    else:
    +        return elementwise_grad(func)
    +
    +
    +
    +
    +

    Below follows a short demonstration of how to use an activation +function. The derivative of the activation function will be important +when calculating the output delta term during backpropagation. Note +that derivate() can also be used for cost functions for a more +generalized approach.

    +
    +
    +
    z = np.array([[4, 5, 6]]).T
    +print(f"Input to activation function:\n{z}")
    +
    +act_func = sigmoid
    +a = act_func(z)
    +print(f"\nOutput from {act_func.__name__} activation function:\n{a}")
    +
    +act_func_derivative = derivate(act_func)
    +valued_at_z = act_func_derivative(a)
    +print(f"\nDerivative of {act_func.__name__} activation function valued at z:\n{valued_at_z}")
    +
    +
    +
    +
    +
    +
    +

    The Neural Network#

    +

Now that we have gotten a good understanding of the implementation of some important components, we can take a look at an object oriented implementation of a feed forward neural network. The feed forward neural network has been implemented as a class named FFNN, which can be initiated as a regressor or classifier depending on the choice of cost function. The FFNN can have any number of input nodes, hidden layers with any number of hidden nodes, and any number of output nodes, meaning it can perform multiclass classification as well as binary classification and regression problems. Although there is a lot of code present, it makes for an easy to use and generalizable interface for creating many types of neural networks, as will be demonstrated below.

    +
    +
    +
    import math
    +import autograd.numpy as np
    +import sys
    +import warnings
    +from autograd import grad, elementwise_grad
    +from random import random, seed
    +from copy import deepcopy, copy
    +from typing import Tuple, Callable
    +from sklearn.utils import resample
    +
    +warnings.simplefilter("error")
    +
    +
    +class FFNN:
    +    """
    +    Description:
    +    ------------
    +        Feed Forward Neural Network with interface enabling flexible design of a
+        neural network's architecture and the specification of activation function
    +        in the hidden layers and output layer respectively. This model can be used
    +        for both regression and classification problems, depending on the output function.
    +
    +    Attributes:
    +    ------------
    +        I   dimensions (tuple[int]): A list of positive integers, which specifies the
    +            number of nodes in each of the networks layers. The first integer in the array
    +            defines the number of nodes in the input layer, the second integer defines number
    +            of nodes in the first hidden layer and so on until the last number, which
    +            specifies the number of nodes in the output layer.
    +        II  hidden_func (Callable): The activation function for the hidden layers
    +        III output_func (Callable): The activation function for the output layer
    +        IV  cost_func (Callable): Our cost function
    +        V   seed (int): Sets random seed, makes results reproducible
    +    """
    +
    +    def __init__(
    +        self,
    +        dimensions: tuple[int],
    +        hidden_func: Callable = sigmoid,
    +        output_func: Callable = lambda x: x,
    +        cost_func: Callable = CostOLS,
    +        seed: int = None,
    +    ):
    +        self.dimensions = dimensions
    +        self.hidden_func = hidden_func
    +        self.output_func = output_func
    +        self.cost_func = cost_func
    +        self.seed = seed
    +        self.weights = list()
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +        self.classification = None
    +
    +        self.reset_weights()
    +        self._set_classification()
    +
    +    def fit(
    +        self,
    +        X: np.ndarray,
    +        t: np.ndarray,
    +        scheduler: Scheduler,
    +        batches: int = 1,
    +        epochs: int = 100,
    +        lam: float = 0,
    +        X_val: np.ndarray = None,
    +        t_val: np.ndarray = None,
    +    ):
    +        """
    +        Description:
    +        ------------
+            This function trains the neural network by performing the feedforward and backpropagation
    +            algorithm to update the networks weights.
    +
    +        Parameters:
    +        ------------
    +            I    X (np.ndarray) : training data
    +            II   t (np.ndarray) : target data
    +            III  scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)
    +            IV   scheduler_args (list[int]) : list of all arguments necessary for scheduler
    +
    +        Optional Parameters:
    +        ------------
    +            V    batches (int) : number of batches the datasets are split into, default equal to 1
    +            VI   epochs (int) : number of iterations used to train the network, default equal to 100
    +            VII  lam (float) : regularization hyperparameter lambda
    +            VIII X_val (np.ndarray) : validation set
    +            IX   t_val (np.ndarray) : validation target set
    +
    +        Returns:
    +        ------------
    +            I   scores (dict) : A dictionary containing the performance metrics of the model.
    +                The number of the metrics depends on the parameters passed to the fit-function.
    +
    +        """
    +
    +        # setup 
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        val_set = False
    +        if X_val is not None and t_val is not None:
    +            val_set = True
    +
    +        # creating arrays for score metrics
    +        train_errors = np.empty(epochs)
    +        train_errors.fill(np.nan)
    +        val_errors = np.empty(epochs)
    +        val_errors.fill(np.nan)
    +
    +        train_accs = np.empty(epochs)
    +        train_accs.fill(np.nan)
    +        val_accs = np.empty(epochs)
    +        val_accs.fill(np.nan)
    +
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +
    +        batch_size = X.shape[0] // batches
    +
    +        X, t = resample(X, t)
    +
    +        # this function returns a function valued only at X
    +        cost_function_train = self.cost_func(t)
    +        if val_set:
    +            cost_function_val = self.cost_func(t_val)
    +
    +        # create schedulers for each weight matrix
    +        for i in range(len(self.weights)):
    +            self.schedulers_weight.append(copy(scheduler))
    +            self.schedulers_bias.append(copy(scheduler))
    +
    +        print(f"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}")
    +
    +        try:
    +            for e in range(epochs):
    +                for i in range(batches):
    +                    # allows for minibatch gradient descent
    +                    if i == batches - 1:
    +                        # If the for loop has reached the last batch, take all thats left
    +                        X_batch = X[i * batch_size :, :]
    +                        t_batch = t[i * batch_size :, :]
    +                    else:
    +                        X_batch = X[i * batch_size : (i + 1) * batch_size, :]
    +                        t_batch = t[i * batch_size : (i + 1) * batch_size, :]
    +
    +                    self._feedforward(X_batch)
    +                    self._backpropagate(X_batch, t_batch, lam)
    +
    +                # reset schedulers for each epoch (some schedulers pass in this call)
    +                for scheduler in self.schedulers_weight:
    +                    scheduler.reset()
    +
    +                for scheduler in self.schedulers_bias:
    +                    scheduler.reset()
    +
    +                # computing performance metrics
    +                pred_train = self.predict(X)
    +                train_error = cost_function_train(pred_train)
    +
    +                train_errors[e] = train_error
    +                if val_set:
    +                    
    +                    pred_val = self.predict(X_val)
    +                    val_error = cost_function_val(pred_val)
    +                    val_errors[e] = val_error
    +
    +                if self.classification:
    +                    train_acc = self._accuracy(self.predict(X), t)
    +                    train_accs[e] = train_acc
    +                    if val_set:
    +                        val_acc = self._accuracy(pred_val, t_val)
    +                        val_accs[e] = val_acc
    +
    +                # printing progress bar
    +                progression = e / epochs
    +                print_length = self._progress_bar(
    +                    progression,
    +                    train_error=train_errors[e],
    +                    train_acc=train_accs[e],
    +                    val_error=val_errors[e],
    +                    val_acc=val_accs[e],
    +                )
    +        except KeyboardInterrupt:
    +            # allows for stopping training at any point and seeing the result
    +            pass
    +
+        # visualization of training progression (similar to the tensorflow progression bar)
    +        sys.stdout.write("\r" + " " * print_length)
    +        sys.stdout.flush()
    +        self._progress_bar(
    +            1,
    +            train_error=train_errors[e],
    +            train_acc=train_accs[e],
    +            val_error=val_errors[e],
    +            val_acc=val_accs[e],
    +        )
    +        sys.stdout.write("")
    +
    +        # return performance metrics for the entire run
    +        scores = dict()
    +
    +        scores["train_errors"] = train_errors
    +
    +        if val_set:
    +            scores["val_errors"] = val_errors
    +
    +        if self.classification:
    +            scores["train_accs"] = train_accs
    +
    +            if val_set:
    +                scores["val_accs"] = val_accs
    +
    +        return scores
    +
    +    def predict(self, X: np.ndarray, *, threshold=0.5):
    +        """
    +         Description:
    +         ------------
    +             Performs prediction after training of the network has been finished.
    +
    +         Parameters:
    +        ------------
    +             I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +         Optional Parameters:
    +         ------------
    +             II  threshold (float) : sets minimal value for a prediction to be predicted as the positive class
    +                 in classification problems
    +
    +         Returns:
    +         ------------
    +             I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +                 This vector is thresholded if regression=False, meaning that classification results
    +                 in a vector of 1s and 0s, while regressions in an array of decimal numbers
    +
    +        """
    +
    +        predict = self._feedforward(X)
    +
    +        if self.classification:
    +            return np.where(predict > threshold, 1, 0)
    +        else:
    +            return predict
    +
    +    def reset_weights(self):
    +        """
    +        Description:
    +        ------------
    +            Resets/Reinitializes the weights in order to train the network for a new problem.
    +
    +        """
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        self.weights = list()
    +        for i in range(len(self.dimensions) - 1):
    +            weight_array = np.random.randn(
    +                self.dimensions[i] + 1, self.dimensions[i + 1]
    +            )
    +            weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01
    +
    +            self.weights.append(weight_array)
    +
    +    def _feedforward(self, X: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates the activation of each layer starting at the input and ending at the output.
+            Each following activation is calculated from a weighted sum of each of the preceding
    +            activations (except in the case of the input layer).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +        Returns:
    +        ------------
    +            I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +        """
    +
    +        # reset matrices
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +
    +        # if X is just a vector, make it into a matrix
    +        if len(X.shape) == 1:
    +            X = X.reshape((1, X.shape[0]))
    +
+        # Add a column of constant values (0.01) as the first column of the design matrix,
+        # in order to include the bias in our data
    +        bias = np.ones((X.shape[0], 1)) * 0.01
    +        X = np.hstack([bias, X])
    +
    +        # a^0, the nodes in the input layer (one a^0 for each row in X - where the
    +        # exponent indicates layer number).
    +        a = X
    +        self.a_matrices.append(a)
    +        self.z_matrices.append(a)
    +
    +        # The feed forward algorithm
    +        for i in range(len(self.weights)):
    +            if i < len(self.weights) - 1:
    +                z = a @ self.weights[i]
    +                self.z_matrices.append(z)
    +                a = self.hidden_func(z)
    +                # bias column again added to the data here
    +                bias = np.ones((a.shape[0], 1)) * 0.01
    +                a = np.hstack([bias, a])
    +                self.a_matrices.append(a)
    +            else:
    +                try:
    +                    # a^L, the nodes in our output layers
    +                    z = a @ self.weights[i]
    +                    a = self.output_func(z)
    +                    self.a_matrices.append(a)
    +                    self.z_matrices.append(z)
    +                except Exception as OverflowError:
    +                    print(
    +                        "OverflowError in fit() in FFNN\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling"
    +                    )
    +
    +        # this will be a^L
    +        return a
    +
    +    def _backpropagate(self, X, t, lam):
    +        """
    +        Description:
    +        ------------
    +            Performs the backpropagation algorithm. In other words, this method
    +            calculates the gradient of all the layers starting at the
    +            output layer, and moving from right to left accumulates the gradient until
    +            the input layer is reached. Each layers respective weights are updated while
    +            the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each.
    +            II  t (np.ndarray): The target vector, with n rows of p targets.
    +            III lam (float32): regularization parameter used to punish the weights in case of overfitting
    +
    +        Returns:
    +        ------------
    +            No return value.
    +
    +        """
    +        out_derivative = derivate(self.output_func)
    +        hidden_derivative = derivate(self.hidden_func)
    +
    +        for i in range(len(self.weights) - 1, -1, -1):
    +            # delta terms for output
    +            if i == len(self.weights) - 1:
    +                # for multi-class classification
    +                if (
    +                    self.output_func.__name__ == "softmax"
    +                ):
    +                    delta_matrix = self.a_matrices[i + 1] - t
    +                # for single class classification
    +                else:
    +                    cost_func_derivative = grad(self.cost_func(t))
    +                    delta_matrix = out_derivative(
    +                        self.z_matrices[i + 1]
    +                    ) * cost_func_derivative(self.a_matrices[i + 1])
    +
    +            # delta terms for hidden layer
    +            else:
    +                delta_matrix = (
    +                    self.weights[i + 1][1:, :] @ delta_matrix.T
    +                ).T * hidden_derivative(self.z_matrices[i + 1])
    +
    +            # calculate gradient
    +            gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix
    +            gradient_bias = np.sum(delta_matrix, axis=0).reshape(
    +                1, delta_matrix.shape[1]
    +            )
    +
    +            # regularization term
    +            gradient_weights += self.weights[i][1:, :] * lam
    +
    +            # use scheduler
    +            update_matrix = np.vstack(
    +                [
    +                    self.schedulers_bias[i].update_change(gradient_bias),
    +                    self.schedulers_weight[i].update_change(gradient_weights),
    +                ]
    +            )
    +
    +            # update weights and bias
    +            self.weights[i] -= update_matrix
    +
    +    def _accuracy(self, prediction: np.ndarray, target: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates accuracy of given prediction to target
    +
    +        Parameters:
    +        ------------
+            I   prediction (np.ndarray): vector of predictions output by the network
    +                (1s and 0s in case of classification, and real numbers in case of regression)
    +            II  target (np.ndarray): vector of true values (What the network ideally should predict)
    +
    +        Returns:
    +        ------------
    +            A floating point number representing the percentage of correctly classified instances.
    +        """
    +        assert prediction.size == target.size
    +        return np.average((target == prediction))
    +    def _set_classification(self):
    +        """
    +        Description:
    +        ------------
+            Decides if FFNN acts as classifier (True) or regressor (False),
    +            sets self.classification during init()
    +        """
    +        self.classification = False
    +        if (
    +            self.cost_func.__name__ == "CostLogReg"
    +            or self.cost_func.__name__ == "CostCrossEntropy"
    +        ):
    +            self.classification = True
    +
    +    def _progress_bar(self, progression, **kwargs):
    +        """
    +        Description:
    +        ------------
    +            Displays progress of training
    +        """
    +        print_length = 40
    +        num_equals = int(progression * print_length)
    +        num_not = print_length - num_equals
    +        arrow = ">" if num_equals > 0 else ""
    +        bar = "[" + "=" * (num_equals - 1) + arrow + "-" * num_not + "]"
    +        perc_print = self._format(progression * 100, decimals=5)
    +        line = f"  {bar} {perc_print}% "
    +
    +        for key in kwargs:
    +            if not np.isnan(kwargs[key]):
    +                value = self._format(kwargs[key], decimals=4)
    +                line += f"| {key}: {value} "
    +        sys.stdout.write("\r" + line)
    +        sys.stdout.flush()
    +        return len(line)
    +
    +    def _format(self, value, decimals=4):
    +        """
    +        Description:
    +        ------------
    +            Formats decimal numbers for progress bar
    +        """
    +        if value > 0:
    +            v = value
    +        elif value < 0:
    +            v = -10 * value
    +        else:
    +            v = 1
    +        n = 1 + math.floor(math.log10(v))
    +        if n >= decimals - 1:
    +            return str(round(value))
    +        return f"{value:.{decimals-n-1}f}"
    +
    +
    +
    +
    +

    Before we make a model, we will quickly generate a dataset we can use +for our linear regression problem as shown below

    +
    +
    +
    import autograd.numpy as np
    +from sklearn.model_selection import train_test_split
    +
    +def SkrankeFunction(x, y):
    +    return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)
    +
    +def create_X(x, y, n):
    +    if len(x.shape) > 1:
    +        x = np.ravel(x)
    +        y = np.ravel(y)
    +
    +    N = len(x)
    +    l = int((n + 1) * (n + 2) / 2)  # Number of elements in beta
    +    X = np.ones((N, l))
    +
    +    for i in range(1, n + 1):
    +        q = int((i) * (i + 1) / 2)
    +        for k in range(i + 1):
    +            X[:, q + k] = (x ** (i - k)) * (y**k)
    +
    +    return X
    +
    +step=0.5
    +x = np.arange(0, 1, step)
    +y = np.arange(0, 1, step)
    +x, y = np.meshgrid(x, y)
    +target = SkrankeFunction(x, y)
    +target = target.reshape(target.shape[0], 1)
    +
    +poly_degree=3
    +X = create_X(x, y, poly_degree)
    +
    +X_train, X_test, t_train, t_test = train_test_split(X, target)
    +
    +
    +
    +
    +

Now that we have our dataset ready for the regression, we can create our regressor. Note that with the seed parameter, we can make sure our results stay the same every time we run the neural network. For initialization, we simply specify the dimensions (we wish the number of input nodes to equal the number of features of each datapoint, and the output layer to predict one value).

    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)
    +
    +
    +
    +
    +

    We then fit our model with our training data using the scheduler of our choice.

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Constant(eta=1e-3)
    +scores = linear_regression.fit(X_train, t_train, scheduler)
    +
    +
    +
    +
    +

Due to the progress bar we can see the MSE (train_error) throughout the FFNN’s training. Note that the fit() function has some optional parameters with default arguments. For example, the regularization hyperparameter can be ignored if not needed, and the FFNN will by default run for 100 epochs. These can easily be changed, for example:

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)
    +
    +
    +
    +
    +

    We see that given more epochs to train on, the regressor reaches a lower MSE.

    +
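Since fit() returns the scores dictionary described in its docstring, we can also plot the training error per epoch instead of only reading the progress bar. A minimal sketch using matplotlib and the scores returned above:

import matplotlib.pyplot as plt

plt.plot(scores["train_errors"], label="training MSE")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()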

    Let us then switch to a binary classification. We use a binary +classification dataset, and follow a similar setup to the regression +case.

    +
    +
    +
    from sklearn.datasets import load_breast_cancer
    +from sklearn.preprocessing import MinMaxScaler
    +
    +wisconsin = load_breast_cancer()
    +X = wisconsin.data
    +target = wisconsin.target
    +target = target.reshape(target.shape[0], 1)
    +
    +X_train, X_val, t_train, t_val = train_test_split(X, target)
    +
    +scaler = MinMaxScaler()
    +scaler.fit(X_train)
    +X_train = scaler.transform(X_train)
    +X_val = scaler.transform(X_val)
    +
    +
    +
    +
    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +

We will now make use of our validation data by passing it into our fit function as a keyword argument.

    +
    +
    +
    logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +

    Finally, we will create a neural network with 2 hidden layers with activation functions.

    +
    +
    +
    input_nodes = X_train.shape[1]
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 1
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +
    +
    +
    neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +
    +
    +

    Multiclass classification#

    +

Finally, we demonstrate multiclass classification using our FFNN on the famous MNIST dataset, which contains images of the handwritten digits 0 to 9.

    +
    +
    +
    from sklearn.datasets import load_digits
    +
    +def onehot(target: np.ndarray):
    +    onehot = np.zeros((target.size, target.max() + 1))
    +    onehot[np.arange(target.size), target] = 1
    +    return onehot
    +
    +digits = load_digits()
    +
    +X = digits.data
    +target = digits.target
    +target = onehot(target)
    +
    +input_nodes = 64
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 10
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)
    +
    +multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = multiclass.fit(X, target, scheduler, epochs=1000)
    +
    +
    +
    +
    +
    +
    +
    +

    Testing the XOR gate and other gates#

    +

    Let us now use our code to test the XOR gate.

    +
    +
    +
    X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)
    +
    +# The XOR gate
    +yXOR = np.array( [[ 0], [1] ,[1], [0]])
    +
    +input_nodes = X.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    +
    +
    +
    +
    +

Not bad, but the results depend strongly on the learning rate. Try different learning rates; a sketch of such a sweep follows below.
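A minimal sketch of such a sweep, reusing the FFNN class, the Adam scheduler and the activation and cost functions defined earlier (the particular learning rates are only suggestions):

etas = [1e-3, 1e-2, 1e-1, 1e0]   # candidate learning rates, chosen for illustration

for eta in etas:
    # a fresh model and fresh weights for every learning rate
    xor_model = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    xor_model.reset_weights()
    scheduler = Adam(eta=eta, rho=0.9, rho2=0.999)
    scores = xor_model.fit(X, yXOR, scheduler, epochs=1000)
    # compare the reported training errors for the different values of eta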

    +
    +
    +

    Solving differential equations with Deep Learning#

    +

The Universal Approximation Theorem states that a neural network with a single hidden layer, together with one input and one output layer, can approximate any continuous function to any given precision.

    +

    Book on solving differential equations with ML methods.

    +

    An Introduction to Neural Network Methods for Differential Equations, by Yadav and Kumar.

    +

    Physics informed neural networks.

    +

    Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next, by Cuomo et al

    +

    Thanks to Kristine Baluka Hein.

    +

The lectures on differential equations were developed by Kristine Baluka Hein, now a PhD student at IFI. Many thanks to Kristine.

    +
    +
    +

    Ordinary Differential Equations first#

    +

An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.

    +

    In general, an ordinary differential equation looks like

    + +
    +
    +\[ +\begin{equation} \label{ode} \tag{1} +f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0 +\end{equation} +\]
    +

    where \(g(x)\) is the function to find, and \(g^{(n)}(x)\) is the \(n\)-th derivative of \(g(x)\).

    +

Here \(f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)\) is just a way of writing that there is an expression involving \(x\) and \(g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)\) on the left-hand side of the equality sign in (1). The highest order of derivative, that is the value of \(n\), determines the order of the equation, which is then referred to as an \(n\)-th order ODE. Along with (1), some additional conditions on the function \(g(x)\) are typically given for the solution to be unique.
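For instance, the exponential decay equation \(g'(x) = -\gamma g(x)\), which we return to below, is a first-order ODE; written in the form of (1) it reads

\[
f\left(x, \, g(x), \, g'(x)\right) = g'(x) + \gamma g(x) = 0 .
\]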

    +
    +
    +

    The trial solution#

    +

    Let the trial solution \(g_t(x)\) be

    + +
    +
    +\[ +\begin{equation} + g_t(x) = h_1(x) + h_2(x,N(x,P)) +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    where \(h_1(x)\) is a function that makes \(g_t(x)\) satisfy a given set +of conditions, \(N(x,P)\) a neural network with weights and biases +described by \(P\) and \(h_2(x, N(x,P))\) some expression involving the +neural network. The role of the function \(h_2(x, N(x,P))\), is to +ensure that the output from \(N(x,P)\) is zero when \(g_t(x)\) is +evaluated at the values of \(x\) where the given conditions must be +satisfied. The function \(h_1(x)\) should alone make \(g_t(x)\) satisfy +the conditions.

    +

    But what about the network \(N(x,P)\)?

    +

As described previously, an optimization method is used to adjust the parameters of the neural network, that is, its weights and biases, through backpropagation so that the cost function is minimized.

    +
    +
    +

    Minimization process#

    +

    For the minimization to be defined, we need to have a cost function at hand to minimize.

    +

It is given that \(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\) should be equal to zero in (1). We can choose the mean squared error as the cost function for an input \(x\). Since we are looking at a single input, the cost function is just \(f\) squared. The cost function \(C\left(x, P\right)\) can therefore be expressed as

    +
    +\[ +C\left(x, P\right) = \big(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\big)^2 +\]
    +

    If \(N\) inputs are given as a vector \(\boldsymbol{x}\) with elements \(x_i\) for \(i = 1,\dots,N\), +the cost function becomes

    + +
    +
    +\[ +\begin{equation} \label{cost} \tag{3} + C\left(\boldsymbol{x}, P\right) = \frac{1}{N} \sum_{i=1}^N \big(f\left(x_i, \, g(x_i), \, g'(x_i), \, g''(x_i), \, \dots \, , \, g^{(n)}(x_i)\right)\big)^2 +\end{equation} +\]
    +

The neural network should then find the parameters \(P\) that minimize the cost function in (3) for a set of \(N\) training samples \(x_i\).

    +
    +
    +

    Minimizing the cost function using gradient descent and automatic differentiation#

    +

To perform the minimization using gradient descent, the gradient of \(C\left(\boldsymbol{x}, P\right)\) is needed. It may happen that finding an analytical expression for the gradient of \(C(\boldsymbol{x}, P)\) from (3) becomes too messy, depending on which cost function one desires to use.

    +

Luckily, there exist libraries that do the job for us through automatic differentiation. Automatic differentiation is a method of computing derivatives numerically to essentially machine precision.
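As a minimal illustration of automatic differentiation with the Autograd library used below (the function \(h(x)=\sin(x^2)\) is chosen only for demonstration):

import autograd.numpy as np
from autograd import elementwise_grad

def h(x):
    return np.sin(x**2)

dh = elementwise_grad(h)                            # derivative of h via automatic differentiation
x = np.linspace(0, 1, 5)
print(np.max(np.abs(dh(x) - 2*x*np.cos(x**2))))     # agrees with the analytical derivative to machine precision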

    +
    +
    +

    Example: Exponential decay#

    +

    An exponential decay of a quantity \(g(x)\) is described by the equation

    + +
    +
    +\[ +\begin{equation} \label{solve_expdec} \tag{4} + g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    with \(g(0) = g_0\) for some chosen initial value \(g_0\).

    +

    The analytical solution of (4) is

    + +
    +
    +\[ +\begin{equation} + g(x) = g_0 \exp\left(-\gamma x\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    +

    Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of (4).

    +
    +
    +

    The function to solve for#

    +

    The program will use a neural network to solve

    + +
    +
    +\[ +\begin{equation} \label{solveode} \tag{6} +g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    where \(g(0) = g_0\) with \(\gamma\) and \(g_0\) being some chosen values.

    +

    In this example, \(\gamma = 2\) and \(g_0 = 10\).

    +
    +
    +

    The trial solution#

    +

To begin with, a trial solution \(g_t(x)\) must be chosen. A general trial solution for ordinary differential equations could be

    +
    +\[ +g_t(x, P) = h_1(x) + h_2(x, N(x, P)) +\]
    +

    with \(h_1(x)\) ensuring that \(g_t(x)\) satisfies some conditions and \(h_2(x,N(x, P))\) an expression involving \(x\) and the output from the neural network \(N(x,P)\) with \(P \) being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer.

    +
    +
    +

    Setup of Network#

    +

    In this network, there are no weights and bias at the input layer, so \(P = \{ P_{\text{hidden}}, P_{\text{output}} \}\). +If there are \(N_{\text{hidden} }\) neurons in the hidden layer, then \(P_{\text{hidden}}\) is a \(N_{\text{hidden} } \times (1 + N_{\text{input}})\) matrix, given that there are \(N_{\text{input}}\) neurons in the input layer.

    +

    The first column in \(P_{\text{hidden} }\) represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer. +If there are \(N_{\text{output} }\) neurons in the output layer, then \(P_{\text{output}} \) is a \(N_{\text{output} } \times (1 + N_{\text{hidden} })\) matrix.

    +

Its first column represents the bias of each neuron and the remaining columns represent the weights to each neuron.
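As a small, concrete sketch of these shapes (with \(N_{\text{input}} = 1\) and a hypothetical choice of 10 hidden and 1 output neuron, matching the program further below), the parameter arrays could be initialized as:

import autograd.numpy.random as npr

N_input, N_hidden, N_output = 1, 10, 1

# each row holds [bias, weights from the previous layer]
P_hidden = npr.randn(N_hidden, 1 + N_input)    # shape (10, 2)
P_output = npr.randn(N_output, 1 + N_hidden)   # shape (1, 11)
P = [P_hidden, P_output]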

    +

It is given that \(g(0) = g_0\). The trial solution must fulfill this condition to be a proper solution of (6). A possible way to ensure that \(g_t(0, P) = g_0\) is to let \(h_2(x, N(x,P)) = x \cdot N(x,P)\) and \(h_1(x) = g_0\). This gives the following trial solution:

    + +
    +
    +\[ +\begin{equation} \label{trial} \tag{7} +g_t(x, P) = g_0 + x \cdot N(x, P) +\end{equation} +\]
    +
    +
    +

    Reformulating the problem#

    +

    We wish that our neural network manages to minimize a given cost function.

    +

A reformulation of our equation, (6), must therefore be done, such that it describes a problem the neural network can solve.

    +

    The neural network must find the set of weights and biases \(P\) such that the trial solution in (7) satisfies (6).

    +

    The trial solution

    +
    +\[ +g_t(x, P) = g_0 + x \cdot N(x, P) +\]
    +

    has been chosen such that it already solves the condition \(g(0) = g_0\). What remains, is to find \(P\) such that

    + +
    +
    +\[ +\begin{equation} \label{nnmin} \tag{8} +g_t'(x, P) = - \gamma g_t(x, P) +\end{equation} +\]
    +

is fulfilled as well as possible.

    +
    +
    +

    More technicalities#

    +

The left-hand side and right-hand side of (8) must be computed separately, and the neural network must then choose the weights and biases contained in \(P\) such that the two sides are as equal as possible. This means that the absolute or squared difference between the sides must be as close to zero as possible, ideally equal to zero. In this case, the squared difference turns out to be an appropriate measure of how erroneous the trial solution is with respect to the parameters \(P\) of the neural network.

    +

    This gives the following cost function our neural network must solve for:

    +
+\[ +\min_{P}\Big\{ \big(g_t'(x, P) - ( -\gamma g_t(x, P)) \big)^2 \Big\} +\]
    +

    (the notation \(\min_{P}\{ f(x, P) \}\) means that we desire to find \(P\) that yields the minimum of \(f(x, P)\))

    +

    or, in terms of weights and biases for the hidden and output layer in our network:

    +
+\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }}\Big\{ \big(g_t'(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) - ( -\gamma g_t(x, \{ P_{\text{hidden} }, P_{\text{output} }\})) \big)^2 \Big\} +\]
    +

    for an input value \(x\).

    +
    +
    +

    More details#

    +

    If the neural network evaluates \(g_t(x, P)\) at more values for \(x\), say \(N\) values \(x_i\) for \(i = 1, \dots, N\), then the total error to minimize becomes

    + +
    +
+\[ +\begin{equation} \label{min} \tag{9} +\min_{P}\Big\{\frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 \Big\} +\end{equation} +\]
    +

Letting \(\boldsymbol{x}\) be a vector with elements \(x_i\) and \(C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2\) denote the cost function, the minimization problem that our network must solve becomes

    +
    +\[ +\min_{P} C(\boldsymbol{x}, P) +\]
    +

    In terms of \(P_{\text{hidden} }\) and \(P_{\text{output} }\), this could also be expressed as

    +
    +\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }} C(\boldsymbol{x}, \{P_{\text{hidden} }, P_{\text{output} }\}) +\]
    +
    +
    +

    A possible implementation of a neural network#

    +

    For simplicity, it is assumed that the input is an array \(\boldsymbol{x} = (x_1, \dots, x_N)\) with \(N\) elements. It is at these points the neural network should find \(P\) such that it fulfills (9).

    +

First, the neural network must feed the inputs forward. This means that \(\boldsymbol{x}\) must be passed through an input layer, a hidden layer and an output layer. The input layer in this case does not need to process the data any further. It consists of \(N_{\text{input} }\) neurons, passing its elements on to each neuron in the hidden layer. The number of neurons in the hidden layer will be \(N_{\text{hidden} }\).

    +
    +
    +

    Technicalities#

    +

For the \(i\)-th neuron in the hidden layer with weight \(w_i^{\text{hidden} }\) and bias \(b_i^{\text{hidden} }\), the weighted input from the \(j\)-th neuron in the input layer is:

    +
    +\[\begin{split} +\begin{aligned} +z_{i,j}^{\text{hidden}} &= b_i^{\text{hidden}} + w_i^{\text{hidden}}x_j \\ +&= +\begin{pmatrix} +b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +x_j +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities I#

    +

    The result after weighting the inputs at the \(i\)-th hidden neuron can be written as a vector:

    +
    +\[\begin{split} +\begin{aligned} +\boldsymbol{z}_{i}^{\text{hidden}} &= \Big( b_i^{\text{hidden}} + w_i^{\text{hidden}}x_1 , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_2, \ \dots \, , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_N\Big) \\ +&= +\begin{pmatrix} + b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +x_1 & x_2 & \dots & x_N +\end{pmatrix} \\ +&= \boldsymbol{p}_{i, \text{hidden}}^T X +\end{aligned} +\end{split}\]
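In code, the row of ones carrying the bias can simply be stacked on top of the inputs, so that the whole weighted sum becomes a single matrix product (a small sketch with arbitrary, illustrative values for the bias and weight):

import numpy as np

x = np.array([0.0, 0.5, 1.0])                                          # N = 3 input values
X = np.concatenate((np.ones((1, x.size)), x.reshape(1, -1)), axis=0)   # shape (2, N), first row is ones

p_i = np.array([0.1, -0.3])   # [bias, weight] for the i-th hidden neuron
z_i = p_i @ X                 # equals 0.1 - 0.3*x, one weighted sum per input value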
    +
    +
    +

    Final technicalities II#

    +

The vectors \(\boldsymbol{p}_{i, \text{hidden}}^T\) constitute the rows of \(P_{\text{hidden} }\), which contains the weights and biases the neural network must adjust in order to minimize (9).

    +

    After having found \(\boldsymbol{z}_{i}^{\text{hidden}} \) for every \(i\)-th neuron within the hidden layer, the vector will be sent to an activation function \(a_i(\boldsymbol{z})\).

    +

    In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:

    +
    +\[ +f(z) = \frac{1}{1 + \exp{(-z)}} +\]
    +

It is possible to use other activation functions for the hidden layer as well.

    +

    The output \(\boldsymbol{x}_i^{\text{hidden}}\) from each \(i\)-th hidden neuron is:

    +
    +\[ +\boldsymbol{x}_i^{\text{hidden} } = f\big( \boldsymbol{z}_{i}^{\text{hidden}} \big) +\]
    +

    The outputs \(\boldsymbol{x}_i^{\text{hidden} } \) are then sent to the output layer.

    +

The output layer consists of one neuron in this case, and combines the outputs from the neurons in the hidden layer using some weights \(w_i^{\text{output}}\) and biases \(b_i^{\text{output}}\). That is, it is assumed that the number of neurons in the output layer is one.

    +
    +
    +

    Final technicalities III#

    +

The procedure of weighting the output of neuron \(j\) in the hidden layer into the \(i\)-th neuron in the output layer is similar to the one for the hidden layer described previously.

    +
    +\[\begin{split} +\begin{aligned} +z_{1,j}^{\text{output}} & = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +\boldsymbol{x}_j^{\text{hidden}} +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities IV#

    +

    Expressing \(z_{1,j}^{\text{output}}\) as a vector gives the following way of weighting the inputs from the hidden layer:

    +
    +\[\begin{split} +\boldsymbol{z}_{1}^{\text{output}} = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +\boldsymbol{x}_1^{\text{hidden}} & \boldsymbol{x}_2^{\text{hidden}} & \dots & \boldsymbol{x}_N^{\text{hidden}} +\end{pmatrix} +\end{split}\]
    +

    In this case we seek a continuous range of values since we are approximating a function. This means that after computing \(\boldsymbol{z}_{1}^{\text{output}}\) the neural network has finished its feed forward step, and \(\boldsymbol{z}_{1}^{\text{output}}\) is the final output of the network.

    +
    +
    +

    Back propagation#

    +

    The next step is to decide how the parameters should be changed such that they minimize the cost function.

    +

    The chosen cost function for this problem is

    +
+\[ +C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 +\]
    +

    In order to minimize the cost function, an optimization method must be chosen.

    +

    Here, gradient descent with a constant step size has been chosen.

    +
    +
    +

    Gradient descent#

    +

The idea of the gradient descent algorithm is to update the parameters in the direction in which the cost function decreases, so that it approaches a minimum.

    +

    In general, the update of some parameters \(\boldsymbol{\omega}\) given a cost +function defined by some weights \(\boldsymbol{\omega}\), \(C(\boldsymbol{x}, +\boldsymbol{\omega})\), goes as follows:

    +
    +\[ +\boldsymbol{\omega}_{\text{new} } = \boldsymbol{\omega} - \lambda \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega}) +\]
    +

    for a number of iterations or until \( \big|\big| \boldsymbol{\omega}_{\text{new} } - \boldsymbol{\omega} \big|\big|\) becomes smaller than some given tolerance.

    +

The value of \(\lambda\) decides how large a step the algorithm takes in the direction of the negative gradient \(-\nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega})\). The notation \(\nabla_{\boldsymbol{\omega}}\) expresses the gradient with respect to the elements in \(\boldsymbol{\omega}\).

    +

    In our case, we have to minimize the cost function \(C(\boldsymbol{x}, P)\) with +respect to the two sets of weights and biases, that is for the hidden +layer \(P_{\text{hidden} }\) and for the output layer \(P_{\text{output} +}\) .

    +

This means that \(P_{\text{hidden} }\) and \(P_{\text{output} }\) are updated by

    +
    +\[\begin{split} +\begin{aligned} +P_{\text{hidden},\text{new}} &= P_{\text{hidden}} - \lambda \nabla_{P_{\text{hidden}}} C(\boldsymbol{x}, P) \\ +P_{\text{output},\text{new}} &= P_{\text{output}} - \lambda \nabla_{P_{\text{output}}} C(\boldsymbol{x}, P) +\end{aligned} +\end{split}\]
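The following stand-alone sketch shows the update rule and the stopping criterion on a toy one-parameter quadratic instead of the network cost (the function, step size and tolerance are only illustrative):

def C(w):
    return (w - 3.0)**2          # toy cost function with minimum at w = 3

def dC(w):
    return 2.0*(w - 3.0)         # its gradient

w = 0.0                          # initial guess
lmb = 0.1                        # step size
tol = 1e-8

for _ in range(10000):
    w_new = w - lmb*dC(w)        # gradient descent update
    if abs(w_new - w) < tol:     # stop when the update becomes negligible
        break
    w = w_new

print(w)                         # close to 3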
    +
    +
    +

    The code for solving the ODE#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Assuming one input, hidden, and output layer
    +def neural_network(params, x):
    +
+    # Find the weights (and biases) for the hidden and output layer.
+    # Assume that params is a list of parameters for each layer.
+    # The biases are the first element of each array in params,
+    # and the weights are the remaining elements in each array in params.
    +
    +    w_hidden = params[0]
    +    w_output = params[1]
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    ## Hidden layer:
    +
    +    # Add a row of ones to include bias
    +    x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)
    +
    +    z_hidden = np.matmul(w_hidden, x_input)
    +    x_hidden = sigmoid(z_hidden)
    +
    +    ## Output layer:
    +
    +    # Include bias:
    +    x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_hidden)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial(x,params, g0 = 10):
    +    return g0 + x*neural_network(params,x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The cost function:
    +def cost_function(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input, hidden, and output layer
    +def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):
    +    ## Set up initial weights and biases
    +
    +    # For the hidden layer
    +    p0 = npr.randn(num_neurons_hidden, 2 )
    +
    +    # For the output layer
    +    p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included
    +
    +    P = [p0, p1]
    +
    +    print('Initial cost: %g'%cost_function(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of two arrays;
    +        # one for the gradient w.r.t P_hidden and
    +        # one for the gradient w.r.t P_output
    +        cost_grad =  cost_function_grad(P, x)
    +
    +        P[0] = P[0] - lmb * cost_grad[0]
    +        P[1] = P[1] - lmb * cost_grad[1]
    +
    +    print('Final cost: %g'%cost_function(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
+    # Set the seed such that the weights and biases are initialized
+    # with the same values for every run.
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = 10
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    # Use the network
    +    P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    # Print the deviation from the trial solution and true solution
    +    res = g_trial(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    print('Max absolute difference: %g'%np.max(np.abs(res - res_analytical)))
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    The network with one input layer, specified number of hidden layers, and one output layer#

    +

It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layer.

    +

The number of neurons within each hidden layer is given as a list of integers in the program below.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# The neural network with one input layer and one output layer,
    +# but with number of hidden layers specified by the user.
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x,params, g0 = 10):
    +    return g0 + x*deep_neural_network(params, x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The same cost function as before, but calls deep_neural_network instead.
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(deep_neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input and one output layer,
    +# but with specified number of hidden layers from the user.
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # The number of elements in the list num_hidden_neurons thus represents
    +    # the number of hidden layers.
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = np.array([10,10])
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    res = g_trial_deep(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','dnn'])
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Population growth#

    +

    A logistic model of population growth assumes that a population converges toward an equilibrium. +The population growth can be modeled by

    + +
    +
    +\[ +\begin{equation} \label{log} \tag{10} + g'(t) = \alpha g(t)(A - g(t)) +\end{equation} +\]
    +

    where \(g(t)\) is the population density at time \(t\), \(\alpha > 0\) the growth rate and \(A > 0\) is the maximum population number in the environment. +Also, at \(t = 0\) the population has the size \(g(0) = g_0\), where \(g_0\) is some chosen constant.

    +

In this example, a network similar to the one used for the exponential decay, implemented with Autograd, is used to solve the equation. However, as such an implementation might suffer from, e.g., numerical instability and long execution times (this may be more apparent in the examples solving PDEs), using a library like TensorFlow is recommended for larger problems. Here, we stay with the simpler approach and, for comparison, also implement the forward Euler method.

    +
    +
    +

    Setting up the problem#

    +

    Here, we will model a population \(g(t)\) in an environment having carrying capacity \(A\). +The population follows the model

    + +
    +
    +\[ +\begin{equation} \label{solveode_population} \tag{11} +g'(t) = \alpha g(t)(A - g(t)) +\end{equation} +\]
    +

    where \(g(0) = g_0\).

    +

    In this example, we let \(\alpha = 2\), \(A = 1\), and \(g_0 = 1.2\).

    +
    +
    +

    The trial solution#

    +

    We will get a slightly different trial solution, as the boundary conditions are different +compared to the case for exponential decay.

    +

    A possible trial solution satisfying the condition \(g(0) = g_0\) could be

    +
+\[ +g_t(t, P) = g_0 + t \cdot N(t,P) +\]
    +

    with \(N(t,P)\) being the output from the neural network with weights and biases for each layer collected in the set \(P\).

    +

    The analytical solution is

    +
    +\[ +g(t) = \frac{Ag_0}{g_0 + (A - g_0)\exp(-\alpha A t)} +\]
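As a quick consistency check (a small Autograd sketch with the parameter values of this example, not part of the solver itself), one can verify numerically that this expression satisfies \(g'(t) = \alpha g(t)(A - g(t))\):

import autograd.numpy as np
from autograd import elementwise_grad

alpha, A, g0 = 2, 1, 1.2

def g_analytic(t):
    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))

t = np.linspace(0, 1, 11)
lhs = elementwise_grad(g_analytic)(t)             # g'(t) by automatic differentiation
rhs = alpha*g_analytic(t)*(A - g_analytic(t))     # right-hand side of the ODE
print(np.max(np.abs(lhs - rhs)))                  # close to machine precision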
    +
    +
    +

    The program using Autograd#

    +

The network will be similar to the one for the exponential decay example, but with some small modifications for our problem.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Function to get the parameters.
+# Done such that one can easily change the parameters to one's liking.
    +def get_parameters():
    +    alpha = 2
    +    A = 1
    +    g0 = 1.2
    +    return alpha, A, g0
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = f(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# The right side of the ODE:
    +def f(x, g_trial):
    +    alpha,A, g0 = get_parameters()
    +    return alpha*g_trial*(A - g_trial)
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x, params):
    +    alpha,A, g0 = get_parameters()
    +    return g0 + x*deep_neural_network(params,x)
    +
    +# The analytical solution:
    +def g_analytic(t):
    +    alpha,A, g0 = get_parameters()
    +    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100, 50, 25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
+    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using forward Euler to solve the ODE#

    +

A straightforward way of solving an ODE numerically is to use Euler’s method.

    +

Euler’s method uses a Taylor series to approximate the value of a function \(f\) at a step \(\Delta x\) away from \(x\):

    +
    +\[ +f(x + \Delta x) \approx f(x) + \Delta x f'(x) +\]
    +

    In our case, using Euler’s method to approximate the value of \(g\) at a step \(\Delta t\) from \(t\) yields

    +
    +\[\begin{split} +\begin{aligned} + g(t + \Delta t) &\approx g(t) + \Delta t g'(t) \\ + &= g(t) + \Delta t \big(\alpha g(t)(A - g(t))\big) +\end{aligned} +\end{split}\]
    +

    along with the condition that \(g(0) = g_0\).

    +

Let \(t_i = i \cdot \Delta t\) for \(i = 0, \dots, N_t-1\), where \(\Delta t = \frac{T}{N_t-1}\), \(T\) is the final time our solver must reach and \(N_t\) is the number of values of \(t \in [0, T]\).

    +

    For \(i \geq 1\), we have that

    +
    +\[\begin{split} +\begin{aligned} +t_i &= i\Delta t \\ +&= (i - 1)\Delta t + \Delta t \\ +&= t_{i-1} + \Delta t +\end{aligned} +\end{split}\]
    +

    Now, if \(g_i = g(t_i)\) then

    + +
    +
    +\[\begin{split} +\begin{equation} + \begin{aligned} + g_i &= g(t_i) \\ + &= g(t_{i-1} + \Delta t) \\ + &\approx g(t_{i-1}) + \Delta t \big(\alpha g(t_{i-1})(A - g(t_{i-1}))\big) \\ + &= g_{i-1} + \Delta t \big(\alpha g_{i-1}(A - g_{i-1})\big) + \end{aligned} +\end{equation} \label{odenum} \tag{12} +\end{split}\]
    +

for \(i \geq 1\), with \(g_0 = g(t_0) = g(0)\) given by the initial condition.

    +

Equation (12) can be implemented in the following way, extending the Autograd program for the network above:

    +
    +
    +
    # Assume that all function definitions from the example program using Autograd
    +# are located here.
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100,50,25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
+    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
+    ## Find an approximation to the function using forward Euler
    +
    +    alpha, A, g0 = get_parameters()
    +    dt = T/(Nt - 1)
    +
    +    # Perform forward Euler to solve the ODE
    +    g_euler = np.zeros(Nt)
    +    g_euler[0] = g0
    +
    +    for i in range(1,Nt):
    +        g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))
    +
    +    # Print the errors done by each method
    +    diff1 = np.max(np.abs(g_euler - g_analytical))
    +    diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))
    +
    +    print('Max absolute difference between Euler method and analytical: %g'%diff1)
    +    print('Max absolute difference between deep neural network and analytical: %g'%diff2)
    +
    +    # Plot results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(t,g_euler)
    +    plt.plot(t,g_analytical)
    +    plt.plot(t,g_dnn_ag[0,:])
    +
    +    plt.legend(['euler','analytical','dnn'])
    +    plt.xlabel('Time t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the one dimensional Poisson equation#

    +

    The Poisson equation for \(g(x)\) in one dimension is

    + +
    +
    +\[ +\begin{equation} \label{poisson} \tag{13} + -g''(x) = f(x) +\end{equation} +\]
    +

    where \(f(x)\) is a given function for \(x \in (0,1)\).

    +

The conditions that \(g(x)\) is chosen to fulfill are

    +
    +\[\begin{split} +\begin{align*} + g(0) &= 0 \\ + g(1) &= 0 +\end{align*} +\end{split}\]
    +

This equation can be solved numerically using, e.g., Autograd or TensorFlow. The results from the networks can then be compared to the analytical solution. In addition, it is interesting to see how a standard numerical scheme for second-order ODEs compares to the neural networks.

    +
    +
    +

    The specific equation to solve for#

    +

    Here, the function \(g(x)\) to solve for follows the equation

    +
    +\[ +-g''(x) = f(x),\qquad x \in (0,1) +\]
    +

    where \(f(x)\) is a given function, along with the chosen conditions

    + +
    +
    +\[ +\begin{aligned} +g(0) = g(1) = 0 +\end{aligned}\label{cond} \tag{14} +\]
    +

    In this example, we consider the case when \(f(x) = (3x + x^2)\exp(x)\).

    +

    For this case, a possible trial solution satisfying the conditions could be

    +
    +\[ +g_t(x) = x \cdot (1-x) \cdot N(P,x) +\]
    +

    The analytical solution for this problem is

    +
    +\[ +g(x) = x(1 - x)\exp(x) +\]
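Before setting up the network, it can be reassuring to check with a few lines of Autograd (a small sketch, not part of the solver itself) that this analytical solution indeed satisfies \(-g''(x) = f(x)\):

import autograd.numpy as np
from autograd import elementwise_grad

def f(x):
    return (3*x + x**2)*np.exp(x)

def g_analytic(x):
    return x*(1 - x)*np.exp(x)

x = np.linspace(0, 1, 11)
d2g = elementwise_grad(elementwise_grad(g_analytic))(x)   # second derivative via nested automatic differentiation
print(np.max(np.abs(-d2g - f(x))))                        # close to machine precision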
    +
    +
    +

    Solving the equation using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
+    # Find the maximum absolute difference between the solutions:
    +    max_diff = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%max_diff)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Comparing with a numerical scheme#

    +

The Poisson equation can also be solved numerically by using a Taylor expansion to approximate the second derivative.

    +

    Using Taylor series, the second derivative can be expressed as

    +
    +\[ +g''(x) = \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} + E_{\Delta x}(x) +\]
    +

where \(\Delta x\) is a small step size and \(E_{\Delta x}(x)\) is the error term.

    +

Neglecting the error term gives an approximation to the second derivative:

    + +
    +
    +\[ +\begin{equation} \label{approx} \tag{15} +g''(x) \approx \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} +\end{equation} +\]
    +

    If \(x_i = i \Delta x = x_{i-1} + \Delta x\) and \(g_i = g(x_i)\) for \(i = 1,\dots N_x - 2\) with \(N_x\) being the number of values for \(x\), (15) becomes

    +
    +\[\begin{split} +\begin{aligned} +g''(x_i) &\approx \frac{g(x_i + \Delta x) - 2g(x_i) + g(x_i -\Delta x)}{\Delta x^2} \\ +&= \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} +\end{aligned} +\end{split}\]
    +

    Since we know from our problem that

    +
    +\[\begin{split} +\begin{aligned} +-g''(x) &= f(x) \\ +&= (3x + x^2)\exp(x) +\end{aligned} +\end{split}\]
    +

    along with the conditions \(g(0) = g(1) = 0\), +the following scheme can be used to find an approximate solution for \(g(x)\) numerically:

    + +
    +
    +\[\begin{split} +\begin{equation} + \begin{aligned} + -\Big( \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} \Big) &= f(x_i) \\ + -g_{i+1} + 2g_i - g_{i-1} &= \Delta x^2 f(x_i) + \end{aligned} +\end{equation} \label{odesys} \tag{16} +\end{split}\]
    +

    for \(i = 1, \dots, N_x - 2\) where \(g_0 = g_{N_x - 1} = 0\) and \(f(x_i) = (3x_i + x_i^2)\exp(x_i)\), which is given for our specific problem.

    +

    The equation can be rewritten into a matrix equation:

    +
    +\[\begin{split} +\begin{aligned} +\begin{pmatrix} +2 & -1 & 0 & \dots & 0 \\ +-1 & 2 & -1 & \dots & 0 \\ +\vdots & & \ddots & & \vdots \\ +0 & \dots & -1 & 2 & -1 \\ +0 & \dots & 0 & -1 & 2\\ +\end{pmatrix} +\begin{pmatrix} +g_1 \\ +g_2 \\ +\vdots \\ +g_{N_x - 3} \\ +g_{N_x - 2} +\end{pmatrix} +&= +\Delta x^2 +\begin{pmatrix} +f(x_1) \\ +f(x_2) \\ +\vdots \\ +f(x_{N_x - 3}) \\ +f(x_{N_x - 2}) +\end{pmatrix} \\ +\boldsymbol{A}\boldsymbol{g} &= \boldsymbol{f}, +\end{aligned} +\end{split}\]
    +

    which makes it possible to solve for the vector \(\boldsymbol{g}\).
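The program below fills the matrix with an explicit loop; an equivalent and somewhat more compact construction of the same tridiagonal system (a sketch, using shifted identity matrices) is:

import numpy as np

Nx = 10
dx = 1/(Nx - 1)
x = np.linspace(0, 1, Nx)

def f(x):
    return (3*x + x**2)*np.exp(x)

n = Nx - 2                                            # number of interior points
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 2 on the diagonal, -1 on the off-diagonals

g_interior = np.linalg.solve(A, dx**2 * f(x[1:-1]))

g_vec = np.zeros(Nx)                                  # add back the boundary values g(0) = g(1) = 0
g_vec[1:-1] = g_interior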

    +
    +
    +

    Setting up the code#

    +

    We can then compare the result from this numerical scheme with the output from our network using Autograd:

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
+    # Plot the solution from the network together with the analytical solution
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +
    +    ## Perform the computation using the numerical scheme
    +
    +    dx = 1/(Nx - 1)
    +
    +    # Set up the matrix A
    +    A = np.zeros((Nx-2,Nx-2))
    +
    +    A[0,0] = 2
    +    A[0,1] = -1
    +
    +    for i in range(1,Nx-3):
    +        A[i,i-1] = -1
    +        A[i,i] = 2
    +        A[i,i+1] = -1
    +
    +    A[Nx - 3, Nx - 4] = -1
    +    A[Nx - 3, Nx - 3] = 2
    +
    +    # Set up the vector f
    +    f_vec = dx**2 * f(x[1:-1])
    +
    +    # Solve the equation
    +    g_res = np.linalg.solve(A,f_vec)
    +
    +    g_vec = np.zeros(Nx)
    +    g_vec[1:-1] = g_res
    +
    +    # Print the differences between each method
    +    max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))
    +    max_diff2 = np.max(np.abs(g_vec - g_analytical))
    +    print("The max absolute difference between the analytical solution and DNN Autograd: %g"%max_diff1)
    +    print("The max absolute difference between the analytical solution and numerical scheme: %g"%max_diff2)
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(x,g_vec)
    +    plt.plot(x,g_analytical)
    +    plt.plot(x,g_dnn_ag[0,:])
    +
    +    plt.legend(['numerical scheme','analytical','dnn'])
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Partial Differential Equations#

    +

A partial differential equation (PDE) has a solution where the function is defined by multiple variables. The equation may involve all kinds of combinations of the variables the function is differentiated with respect to.

    +

    In general, a partial differential equation for a function \(g(x_1,\dots,x_N)\) with \(N\) variables may be expressed as

    + +
    +
    +\[ +\begin{equation} \label{PDE} \tag{17} + f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) = 0 +\end{equation} +\]
    +

    where \(f\) is an expression involving all kinds of possible mixed derivatives of \(g(x_1,\dots,x_N)\) up to an order \(n\). In order for the solution to be unique, some additional conditions must also be given.

    +
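As a concrete illustration of this notation (anticipating the diffusion equation treated later in these notes), the one-dimensional diffusion equation corresponds to the choice

\[
f\left(x, t, \frac{\partial g(x,t)}{\partial t}, \frac{\partial^2 g(x,t)}{\partial x^2}\right) = \frac{\partial g(x,t)}{\partial t} - \frac{\partial^2 g(x,t)}{\partial x^2} = 0 .
\]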
    +
    +

    Type of problem#

    +

The problem our network must solve is similar to the ODE case. We must have a trial solution \(g_t\) at hand.

    +

    For instance, the trial solution could be expressed as

    +
    +\[ +\begin{align*} + g_t(x_1,\dots,x_N) = h_1(x_1,\dots,x_N) + h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P)) +\end{align*} +\]
    +

    where \(h_1(x_1,\dots,x_N)\) is a function that ensures \(g_t(x_1,\dots,x_N)\) satisfies some given conditions. +The neural network \(N(x_1,\dots,x_N,P)\) has weights and biases described by \(P\) and \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is an expression using the output from the neural network in some way.

    +

    The role of the function \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\), is to ensure that the output of \(N(x_1,\dots,x_N,P)\) is zero when \(g_t(x_1,\dots,x_N)\) is evaluated at the values of \(x_1,\dots,x_N\) where the given conditions must be satisfied. The function \(h_1(x_1,\dots,x_N)\) should alone make \(g_t(x_1,\dots,x_N)\) satisfy the conditions.

    +
    +
    +

    Network requirements#

    +

The network then tries to minimize the cost function following the same ideas as described for the ODE case, but now with more than one variable to consider. The concept remains the same: find a set of parameters \(P\) such that the expression \(f\) in (17) is as close to zero as possible.

    +

    As for the ODE case, the cost function is the mean squared error that +the network must try to minimize. The cost function for the network to +minimize is

    +
    +\[ +C\left(x_1, \dots, x_N, P\right) = \left( f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) \right)^2 +\]
    +
    +
    +

    More details#

    +

    If we let \(\boldsymbol{x} = \big( x_1, \dots, x_N \big)\) be an array containing the values for \(x_1, \dots, x_N\) respectively, the cost function can be reformulated into the following:

    +
    +\[ +C\left(\boldsymbol{x}, P\right) = f\left( \left( \boldsymbol{x}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}) }{\partial x_N}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}) }{\partial x_N^n} \right) \right)^2 +\]
    +

    If we also have \(M\) different sets of values for \(x_1, \dots, x_N\), that is \(\boldsymbol{x}_i = \big(x_1^{(i)}, \dots, x_N^{(i)}\big)\) for \(i = 1,\dots,M\) being the rows in matrix \(X\), the cost function can be generalized into

    +
    +\[ +C\left(X, P \right) = \sum_{i=1}^M f\left( \left( \boldsymbol{x}_i, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}_i) }{\partial x_N}, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}_i) }{\partial x_N^n} \right) \right)^2. +\]
    +
    +
    +

    Example: The diffusion equation#

    +

    In one spatial dimension, the equation reads

    +
    +\[ +\frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} +\]
    +

where a possible choice of conditions is

    +
    +\[\begin{split} +\begin{align*} +g(0,t) &= 0 ,\qquad t \geq 0 \\ +g(1,t) &= 0, \qquad t \geq 0 \\ +g(x,0) &= u(x),\qquad x\in [0,1] +\end{align*} +\end{split}\]
    +

    with \(u(x)\) being some given function.

    +
    +
    +

    Defining the problem#

    +

    For this case, we want to find \(g(x,t)\) such that

    + +
    +
    +\[ +\begin{equation} + \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} +\end{equation} \label{diffonedim} \tag{18} +\]
    +

    and

    +
    +\[\begin{split} +\begin{align*} +g(0,t) &= 0 ,\qquad t \geq 0 \\ +g(1,t) &= 0, \qquad t \geq 0 \\ +g(x,0) &= u(x),\qquad x\in [0,1] +\end{align*} +\end{split}\]
    +

    with \(u(x) = \sin(\pi x)\).

    +

First, let us set up the deep neural network, which follows the same structure as discussed in the examples solving the ODEs. We then look into how Autograd can be used in a network tailored to solving for bivariate functions.

    +
    +
    +

    Setting up the network using Autograd#

    +

The only change needed here is to extend our network such that functions of multiple variables are handled correctly. In this case the function we solve for has two variables, time \(t\) and position \(x\). Each point \((x,t)\) will be represented by a one-dimensional array in the program. The program will evaluate the network at each possible pair \((x,t)\), given arrays of the desired \(x\)-values and \(t\)-values to approximate the solution at.

    +
    +
    +
    def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The trial solution#

    +

The cost function must then iterate through the given arrays containing values for \(x\) and \(t\), define a point \((x,t)\) at which the deep neural network and the trial solution are evaluated, and then find the Jacobian of the trial solution.

    +

    A possible trial solution for this PDE is

    +
    +\[ +g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P) +\]
    +

with \(h_1(x,t)\) being a function ensuring that \(g_t(x,t)\) satisfies our given conditions, and \(N(x,t,P)\) being the output from the deep neural network using weights and biases for each layer from \(P\).

    +

To fulfill the conditions, \(h_1(x,t)\) could be:

    +
    +\[ +h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x) +\]
    +

since \(u(0) = u(1) = 0\) and \(u(x) = \sin(\pi x)\).

    +
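As a quick check, added here since it follows directly from the expressions above, inserting the trial solution into the three conditions gives

\[
\begin{align*}
g_t(x,0) &= (1-0)u(x) + x(1-x)\cdot 0 \cdot N(x,0,P) = u(x), \\
g_t(0,t) &= (1-t)u(0) + 0\cdot(1-0)\,t\,N(0,t,P) = 0, \\
g_t(1,t) &= (1-t)u(1) + 1\cdot(1-1)\,t\,N(1,t,P) = 0,
\end{align*}
\]

using that \(u(0) = u(1) = 0\).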
    +
    +

Why the Jacobian?#

    +

    The Jacobian is used because the program must find the derivative of +the trial solution with respect to \(x\) and \(t\).

    +

    This gives the necessity of computing the Jacobian matrix, as we want +to evaluate the gradient with respect to \(x\) and \(t\) (note that the +Jacobian of a scalar-valued multivariate function is simply its +gradient).

    +

In Autograd, the differentiation is by default done with respect to the first input argument of your Python function. Since the point is an array representing \(x\) and \(t\), the Jacobian is calculated using the values of \(x\) and \(t\).

    +

To find the second derivatives with respect to \(x\) and \(t\), the Jacobian can be computed a second time. The result is the Hessian matrix, which contains all the possible second-order mixed derivatives of \(g(x,t)\).

    +
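Before the lecture code below, here is a minimal sketch (an added illustration, not part of the original program) that checks this derivative bookkeeping with Autograd on a simple, hypothetical test function of a point \((x,t)\):

import autograd.numpy as np
from autograd import jacobian, hessian

def g_simple(point):
    # Hypothetical scalar test function of point = (x, t)
    x, t = point
    return np.sin(np.pi*x)*np.exp(-t)

point = np.array([0.3, 0.5])
grad_g = jacobian(g_simple)(point)   # array [dg/dx, dg/dt]
hess_g = hessian(g_simple)(point)    # 2 x 2 matrix of second derivatives

# Compare with the hand-computed derivatives of the test function
print(grad_g[1] - (-np.sin(np.pi*0.3)*np.exp(-0.5)))              # dg/dt, approximately zero
print(hess_g[0][0] - (-np.pi**2*np.sin(np.pi*0.3)*np.exp(-0.5)))  # d2g/dx2, approximately zero

The entries grad_g[1] and hess_g[0][0] correspond to the quantities g_t_dt and g_t_d2x used in the cost function below.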
    +
    +
    # Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
+# The right side of the PDE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The full program#

    +

    Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.

    +

    The analytical solution of our problem is

    +
    +\[ +g(x,t) = \exp(-\pi^2 t)\sin(\pi x) +\]
    +
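As a quick consistency check (added here for completeness), differentiating this expression gives

\[
\frac{\partial g(x,t)}{\partial t} = -\pi^2\exp(-\pi^2 t)\sin(\pi x) = \frac{\partial^2 g(x,t)}{\partial x^2},
\]

so the diffusion equation is satisfied, while \(g(x,0) = \sin(\pi x) = u(x)\) and \(g(0,t) = g(1,t) = 0\), as required by the conditions.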

A possible way to implement a neural network solving the PDE is given below. Be aware, though, that it is fairly slow for the parameters used. A better result is possible, but requires more iterations, and thus a longer time to complete.

    +

Indeed, the program below is not optimal in its implementation, but rather serves as an example of how to implement and use a neural network to solve a PDE. Using TensorFlow results in a much better execution time. Try it!

    +
    +
    +
    import autograd.numpy as np
    +from autograd import jacobian,hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the network
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## Define the trial solution and cost function
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
+# The right side of the PDE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum /( np.size(x)*np.size(t) )
    +
    +## For comparison, define the analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)
    +
    +## Set up a function for training the network to solve for the equation
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
+    # Find the number of hidden layers:
+    N_hidden = np.size(num_neurons)
+
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
+    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since the input is a point (x,t) with two coordinates, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [100, 25]
    +    num_iter = 250
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    g_dnn_ag = np.zeros((Nx, Nt))
    +    G_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            g_dnn_ag[i,j] = g_trial(point,P)
    +
    +            G_analytical[i,j] = g_analytic(point)
    +
+    # Find the max absolute difference between the analytical and the computed solution
    +    diff_ag = np.abs(g_dnn_ag - G_analytical)
    +    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
+    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = g_dnn_ag[:,indx1]
    +    res2 = g_dnn_ag[:,indx2]
    +    res3 = g_dnn_ag[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = G_analytical[:,indx1]
    +    res_analytical2 = G_analytical[:,indx2]
    +    res_analytical3 = G_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the wave equation with Neural Networks#

    +

    The wave equation is

    +
    +\[ +\frac{\partial^2 g(x,t)}{\partial t^2} = c^2\frac{\partial^2 g(x,t)}{\partial x^2} +\]
    +

    with \(c\) being the specified wave speed.

    +

    Here, the chosen conditions are

    +
    +\[\begin{split} +\begin{align*} + g(0,t) &= 0 \\ + g(1,t) &= 0 \\ + g(x,0) &= u(x) \\ + \frac{\partial g(x,t)}{\partial t} \Big |_{t = 0} &= v(x) +\end{align*} +\end{split}\]
    +

where \(\frac{\partial g(x,t)}{\partial t} \Big |_{t = 0}\) means the derivative of \(g(x,t)\) with respect to \(t\) evaluated at \(t = 0\), and \(u(x)\) and \(v(x)\) are given functions.

    +
    +
    +

    The problem to solve for#

    +

The wave equation to solve is

    + +
    +
    +\[ +\begin{equation} \label{wave} \tag{19} +\frac{\partial^2 g(x,t)}{\partial t^2} = c^2 \frac{\partial^2 g(x,t)}{\partial x^2} +\end{equation} +\]
    +

    where \(c\) is the given wave speed. +The chosen conditions for this equation are

    + +
    +
    +\[\begin{split} +\begin{aligned} +g(0,t) &= 0, &t \geq 0 \\ +g(1,t) &= 0, &t \geq 0 \\ +g(x,0) &= u(x), &x\in[0,1] \\ +\frac{\partial g(x,t)}{\partial t}\Big |_{t = 0} &= v(x), &x \in [0,1] +\end{aligned} \label{condwave} \tag{20} +\end{split}\]
    +

    In this example, let \(c = 1\) and \(u(x) = \sin(\pi x)\) and \(v(x) = -\pi\sin(\pi x)\).

    +
    +
    +

    The trial solution#

    +

Setting up the network is done in a similar manner as for the example of solving the diffusion equation. The only things we have to change are the trial solution, such that it satisfies the conditions from (20), and the cost function.

    +

    The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution \(g_t(x,t)\) is

    +
    +\[ +g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P) +\]
    +

    where

    +
    +\[ +h_1(x,t) = (1-t^2)u(x) + tv(x) +\]
    +

    Note that this trial solution satisfies the conditions only if \(u(0) = v(0) = u(1) = v(1) = 0\), which is the case in this example.

    +
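A short verification, following directly from the expressions above: at \(t = 0\) the trial solution and its time derivative reduce to

\[
g_t(x,0) = h_1(x,0) = u(x), \qquad
\frac{\partial g_t(x,t)}{\partial t}\Big|_{t=0} = \Big[-2t\,u(x) + v(x) + x(1-x)\big(2t\,N + t^2\,\partial_t N\big)\Big]_{t=0} = v(x),
\]

while the boundary conditions at \(x = 0\) and \(x = 1\) hold because both \(h_1\) and the factor \(x(1-x)\) vanish there, given that \(u(0) = v(0) = u(1) = v(1) = 0\).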
    +
    +

    The analytical solution#

    +

    The analytical solution for our specific problem, is

    +
    +\[ +g(x,t) = \sin(\pi x)\cos(\pi t) - \sin(\pi x)\sin(\pi t) +\]
    +
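As a quick check added for completeness: both terms are products of \(\sin(\pi x)\) with \(\cos(\pi t)\) or \(\sin(\pi t)\), so

\[
\frac{\partial^2 g(x,t)}{\partial t^2} = -\pi^2 g(x,t) = \frac{\partial^2 g(x,t)}{\partial x^2},
\]

which is the wave equation with \(c = 1\). Moreover, \(g(x,0) = \sin(\pi x) = u(x)\) and \(\partial g/\partial t \big|_{t=0} = -\pi\sin(\pi x) = v(x)\), so the conditions are satisfied as well.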
    +
    +

    Solving the wave equation - the full program using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def v(x):
    +    return -np.pi*np.sin(np.pi*x)
    +
    +def h1(point):
    +    x,t = point
    +    return (1 - t**2)*u(x) + t*v(x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)
    +
    +## Define the cost function
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_d2x = g_t_hessian[0][0]
    +            g_t_d2t = g_t_hessian[1][1]
    +
    +            err_sqr = ( (g_t_d2t - g_t_d2x) )**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum / (np.size(t) * np.size(x))
    +
    +## The neural network
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## The analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)
    +
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
+    # Find the number of hidden layers:
+    N_hidden = np.size(num_neurons)
+
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
+    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since the input is a point (x,t) with two coordinates, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [50,20]
    +    num_iter = 1000
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    res = np.zeros((Nx, Nt))
    +    res_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            res[i,j] = g_trial(point,P)
    +
    +            res_analytical[i,j] = g_analytic(point)
    +
    +    diff = np.abs(res - res_analytical)
    +    print("Max difference between analytical and solution from nn: %g"%np.max(diff))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
+    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = res[:,indx1]
    +    res2 = res[:,indx2]
    +    res3 = res[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = res_analytical[:,indx1]
    +    res_analytical2 = res_analytical[:,indx2]
    +    res_analytical3 = res_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Resources on differential equations and deep learning#

    +
      +
1. Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al.

2. Neural networks for solving differential equations by A. Honchar

3. Solving differential equations using neural networks by M.M. Chiaramonte and M. Kiener

4. Introduction to Partial Differential Equations by A. Tveito and R. Winther
    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week44.html b/doc/LectureNotes/_build/html/week44.html new file mode 100644 index 000000000..f55e16b0d --- /dev/null +++ b/doc/LectureNotes/_build/html/week44.html @@ -0,0 +1,3493 @@ Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN) — Applied Data Analysis and Machine Learning
    Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: Week 44

    +
    +

    Plan for week 44#

    +

    Material for the lecture Monday October 27, 2025.

    +
      +
1. Solving differential equations, continuation from last week, first lecture

2. Convolutional Neural Networks, second lecture

3. Readings and Videos:
    + +
    +
    +

    Lab sessions on Tuesday and Wednesday#

    +
      +
• Main focus is discussion of and work on project 2

• If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand them in this coming Friday
    +
    +
    +

    Material for Lecture Monday October 27#

    +
    +
    +

    Solving differential equations with Deep Learning#

    +

The Universal Approximation Theorem states that a neural network with a single hidden layer, together with an input and an output layer, can approximate any continuous function to any given precision.

    +

    Book on solving differential equations with ML methods.

    +

    An Introduction to Neural Network Methods for Differential Equations, by Yadav and Kumar.

    +

    Physics informed neural networks.

    +

    Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next, by Cuomo et al

    +

    Thanks to Kristine Baluka Hein.

    +

    The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI. +A great thanks to Kristine.

    +
    +
    +

    Ordinary Differential Equations first#

    +

An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.

    +

    In general, an ordinary differential equation looks like

    + +
    +
    +\[ +\begin{equation} \label{ode} \tag{1} +f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0 +\end{equation} +\]
    +

    where \(g(x)\) is the function to find, and \(g^{(n)}(x)\) is the \(n\)-th derivative of \(g(x)\).

    +

The \(f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)\) is just a way to write that there is an expression involving \(x\) and \(g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)\) on the left side of the equality sign in (1). The highest order of derivative, that is the value of \(n\), determines the order of the equation. The equation is referred to as an \(n\)-th order ODE. Along with (1), some additional conditions on the function \(g(x)\) are typically given for the solution to be unique.

    +
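For example, the exponential decay equation treated later in these notes, \(g'(x) = -\gamma g(x)\), fits this form with

\[
f\left(x, \, g(x), \, g'(x)\right) = g'(x) + \gamma g(x) = 0,
\]

which makes it a first-order (\(n = 1\)) ODE, supplemented by the condition \(g(0) = g_0\) to make the solution unique.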
    +
    +

    The trial solution#

    +

    Let the trial solution \(g_t(x)\) be

    + +
    +
    +\[ +\begin{equation} + g_t(x) = h_1(x) + h_2(x,N(x,P)) +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    where \(h_1(x)\) is a function that makes \(g_t(x)\) satisfy a given set +of conditions, \(N(x,P)\) a neural network with weights and biases +described by \(P\) and \(h_2(x, N(x,P))\) some expression involving the +neural network. The role of the function \(h_2(x, N(x,P))\), is to +ensure that the output from \(N(x,P)\) is zero when \(g_t(x)\) is +evaluated at the values of \(x\) where the given conditions must be +satisfied. The function \(h_1(x)\) should alone make \(g_t(x)\) satisfy +the conditions.

    +

    But what about the network \(N(x,P)\)?

    +

As described previously, an optimization method can be used to adjust the parameters of a neural network, that is, its weights and biases, through backward propagation.

    +
    +
    +

    Minimization process#

    +

    For the minimization to be defined, we need to have a cost function at hand to minimize.

    +

It is given that \(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\) should be equal to zero in (1). We can choose to consider the mean squared error as the cost function for an input \(x\). Since we are looking at one input, the cost function is just \(f\) squared. The cost function \(C\left(x, P \right)\) can therefore be expressed as

    +
    +\[ +C\left(x, P\right) = \big(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\big)^2 +\]
    +

    If \(N\) inputs are given as a vector \(\boldsymbol{x}\) with elements \(x_i\) for \(i = 1,\dots,N\), +the cost function becomes

    + +
    +
    +\[ +\begin{equation} \label{cost} \tag{3} + C\left(\boldsymbol{x}, P\right) = \frac{1}{N} \sum_{i=1}^N \big(f\left(x_i, \, g(x_i), \, g'(x_i), \, g''(x_i), \, \dots \, , \, g^{(n)}(x_i)\right)\big)^2 +\end{equation} +\]
    +

The neural net should then find the parameters \(P\) that minimize the cost function in (3) for a set of \(N\) training samples \(x_i\).

    +
    +
    +

    Minimizing the cost function using gradient descent and automatic differentiation#

    +

To perform the minimization using gradient descent, the gradient of \(C\left(\boldsymbol{x}, P\right)\) is needed. It might happen that finding an analytical expression of the gradient of \(C(\boldsymbol{x}, P)\) from (3) becomes too messy, depending on which cost function one desires to use.

    +

Luckily, there exist libraries that do the job for us through automatic differentiation. Automatic differentiation is a method of evaluating the derivatives numerically, to very high precision.

    +
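As a minimal illustration (an added sketch, not part of the lecture program), Autograd can differentiate a simple function with a known derivative:

import autograd.numpy as np
from autograd import elementwise_grad

def h(x):
    return np.exp(-2*x)

# elementwise_grad differentiates h element by element; d/dx exp(-2x) = -2 exp(-2x)
dh = elementwise_grad(h)
x = np.linspace(0, 1, 5)
print(np.max(np.abs(dh(x) + 2*np.exp(-2*x))))   # close to zero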
    +
    +

    Example: Exponential decay#

    +

    An exponential decay of a quantity \(g(x)\) is described by the equation

    + +
    +
    +\[ +\begin{equation} \label{solve_expdec} \tag{4} + g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    with \(g(0) = g_0\) for some chosen initial value \(g_0\).

    +

    The analytical solution of (4) is

    + +
    +
    +\[ +\begin{equation} + g(x) = g_0 \exp\left(-\gamma x\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    +

    Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of (4).

    +
    +
    +

    The function to solve for#

    +

    The program will use a neural network to solve

    + +
    +
    +\[ +\begin{equation} \label{solveode} \tag{6} +g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    where \(g(0) = g_0\) with \(\gamma\) and \(g_0\) being some chosen values.

    +

    In this example, \(\gamma = 2\) and \(g_0 = 10\).

    +
    +
    +

    The trial solution#

    +

To begin with, a trial solution \(g_t(x)\) must be chosen. A general trial solution for ordinary differential equations could be

    +
    +\[ +g_t(x, P) = h_1(x) + h_2(x, N(x, P)) +\]
    +

    with \(h_1(x)\) ensuring that \(g_t(x)\) satisfies some conditions and \(h_2(x,N(x, P))\) an expression involving \(x\) and the output from the neural network \(N(x,P)\) with \(P \) being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer.

    +
    +
    +

    Setup of Network#

    +

    In this network, there are no weights and bias at the input layer, so \(P = \{ P_{\text{hidden}}, P_{\text{output}} \}\). +If there are \(N_{\text{hidden} }\) neurons in the hidden layer, then \(P_{\text{hidden}}\) is a \(N_{\text{hidden} } \times (1 + N_{\text{input}})\) matrix, given that there are \(N_{\text{input}}\) neurons in the input layer.

    +

    The first column in \(P_{\text{hidden} }\) represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer. +If there are \(N_{\text{output} }\) neurons in the output layer, then \(P_{\text{output}} \) is a \(N_{\text{output} } \times (1 + N_{\text{hidden} })\) matrix.

    +

Its first column represents the bias of each neuron and the remaining columns represent the weights to each neuron.

    +
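A small sketch (added illustration; the variable names here are our own) of how these parameter matrices could be set up and inspected:

import autograd.numpy.random as npr

N_input, N_hidden, N_output = 1, 10, 1
# Column 0 holds the biases, the remaining columns hold the weights
P_hidden = npr.randn(N_hidden, 1 + N_input)
P_output = npr.randn(N_output, 1 + N_hidden)
print(P_hidden.shape, P_output.shape)   # (10, 2) (1, 11)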

It is given that \(g(0) = g_0\). The trial solution must fulfill this condition to be a proper solution of (6). A possible way to ensure that \(g_t(0, P) = g_0\) is to let \(h_2(x, N(x,P)) = x \cdot N(x,P)\) and \(h_1(x) = g_0\). This gives the following trial solution:

    + +
    +
    +\[ +\begin{equation} \label{trial} \tag{7} +g_t(x, P) = g_0 + x \cdot N(x, P) +\end{equation} +\]
    +
    +
    +

    Reformulating the problem#

    +

    We wish that our neural network manages to minimize a given cost function.

    +

A reformulation of our equation, (6), must therefore be done, such that it describes a problem a neural network can solve.

    +

    The neural network must find the set of weights and biases \(P\) such that the trial solution in (7) satisfies (6).

    +

    The trial solution

    +
    +\[ +g_t(x, P) = g_0 + x \cdot N(x, P) +\]
    +

    has been chosen such that it already solves the condition \(g(0) = g_0\). What remains, is to find \(P\) such that

    + +
    +
    +\[ +\begin{equation} \label{nnmin} \tag{8} +g_t'(x, P) = - \gamma g_t(x, P) +\end{equation} +\]
    +

    is fulfilled as best as possible.

    +
    +
    +

    More technicalities#

    +

The left hand side and right hand side of (8) must be computed separately, and then the neural network must choose weights and biases, contained in \(P\), such that the sides are equal as best as possible. This means that the absolute or squared difference between the sides must be as close to zero as possible, ideally equal to zero. In this case, the squared difference proves to be an appropriate measure of how erroneous the trial solution is with respect to the parameters \(P\) of the neural network.

    +

    This gives the following cost function our neural network must solve for:

    +
\[
\min_{P}\Big\{ \big(g_t'(x, P) - ( -\gamma g_t(x, P)) \big)^2 \Big\}
\]
    +

    (the notation \(\min_{P}\{ f(x, P) \}\) means that we desire to find \(P\) that yields the minimum of \(f(x, P)\))

    +

    or, in terms of weights and biases for the hidden and output layer in our network:

    +
\[
\min_{P_{\text{hidden} }, \ P_{\text{output} }}\Big\{ \big(g_t'(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) - ( -\gamma g_t(x, \{ P_{\text{hidden} }, P_{\text{output} }\})) \big)^2 \Big\}
\]
    +

    for an input value \(x\).

    +
    +
    +

    More details#

    +

    If the neural network evaluates \(g_t(x, P)\) at more values for \(x\), say \(N\) values \(x_i\) for \(i = 1, \dots, N\), then the total error to minimize becomes

    + +
    +
\[
\begin{equation} \label{min} \tag{9}
\min_{P}\Big\{\frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 \Big\}
\end{equation}
\]
    +

Letting \(\boldsymbol{x}\) be a vector with elements \(x_i\) and \(C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2\) denote the cost function, the minimization problem that our network must solve becomes

    +
    +\[ +\min_{P} C(\boldsymbol{x}, P) +\]
    +

    In terms of \(P_{\text{hidden} }\) and \(P_{\text{output} }\), this could also be expressed as

    +
    +\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }} C(\boldsymbol{x}, \{P_{\text{hidden} }, P_{\text{output} }\}) +\]
    +
    +
    +

    A possible implementation of a neural network#

    +

    For simplicity, it is assumed that the input is an array \(\boldsymbol{x} = (x_1, \dots, x_N)\) with \(N\) elements. It is at these points the neural network should find \(P\) such that it fulfills (9).

    +

First, the neural network must feed forward the inputs. This means that \(\boldsymbol{x}\) must be passed through an input layer, a hidden layer and an output layer. The input layer in this case does not need to process the data any further. The input layer will consist of \(N_{\text{input}}\) neurons, each passing its element to every neuron in the hidden layer. The number of neurons in the hidden layer will be \(N_{\text{hidden}}\).

    +
    +
    +

    Technicalities#

    +

For the \(i\)-th neuron in the hidden layer, with weight \(w_i^{\text{hidden}}\) and bias \(b_i^{\text{hidden}}\), the weighting of the input from the \(j\)-th neuron at the input layer is:

    +
    +\[\begin{split} +\begin{aligned} +z_{i,j}^{\text{hidden}} &= b_i^{\text{hidden}} + w_i^{\text{hidden}}x_j \\ +&= +\begin{pmatrix} +b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +x_j +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities I#

    +

    The result after weighting the inputs at the \(i\)-th hidden neuron can be written as a vector:

    +
    +\[\begin{split} +\begin{aligned} +\boldsymbol{z}_{i}^{\text{hidden}} &= \Big( b_i^{\text{hidden}} + w_i^{\text{hidden}}x_1 , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_2, \ \dots \, , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_N\Big) \\ +&= +\begin{pmatrix} + b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +x_1 & x_2 & \dots & x_N +\end{pmatrix} \\ +&= \boldsymbol{p}_{i, \text{hidden}}^T X +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities II#

    +

    The vector \(\boldsymbol{p}_{i, \text{hidden}}^T\) constitutes each row in \(P_{\text{hidden} }\), which contains the weights for the neural network to minimize according to (9).

    +

    After having found \(\boldsymbol{z}_{i}^{\text{hidden}} \) for every \(i\)-th neuron within the hidden layer, the vector will be sent to an activation function \(a_i(\boldsymbol{z})\).

    +

    In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:

    +
    +\[ +f(z) = \frac{1}{1 + \exp{(-z)}} +\]
    +

It is possible to use other activation functions for the hidden layer as well.

    +

    The output \(\boldsymbol{x}_i^{\text{hidden}}\) from each \(i\)-th hidden neuron is:

    +
    +\[ +\boldsymbol{x}_i^{\text{hidden} } = f\big( \boldsymbol{z}_{i}^{\text{hidden}} \big) +\]
    +

    The outputs \(\boldsymbol{x}_i^{\text{hidden} } \) are then sent to the output layer.

    +

The output layer consists of one neuron in this case, and combines the output from each of the neurons in the hidden layer using some weights \(w_i^{\text{output}}\) and biases \(b_i^{\text{output}}\). That is, it is assumed that the number of neurons in the output layer is one.

    +
    +
    +

    Final technicalities III#

    +

The procedure of weighting the output from neuron \(j\) in the hidden layer into the \(i\)-th neuron in the output layer is similar to the procedure for the hidden layer described previously.

    +
    +\[\begin{split} +\begin{aligned} +z_{1,j}^{\text{output}} & = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +\boldsymbol{x}_j^{\text{hidden}} +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities IV#

    +

    Expressing \(z_{1,j}^{\text{output}}\) as a vector gives the following way of weighting the inputs from the hidden layer:

    +
    +\[\begin{split} +\boldsymbol{z}_{1}^{\text{output}} = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +\boldsymbol{x}_1^{\text{hidden}} & \boldsymbol{x}_2^{\text{hidden}} & \dots & \boldsymbol{x}_N^{\text{hidden}} +\end{pmatrix} +\end{split}\]
    +

    In this case we seek a continuous range of values since we are approximating a function. This means that after computing \(\boldsymbol{z}_{1}^{\text{output}}\) the neural network has finished its feed forward step, and \(\boldsymbol{z}_{1}^{\text{output}}\) is the final output of the network.

    +
    +
    +

    Back propagation#

    +

    The next step is to decide how the parameters should be changed such that they minimize the cost function.

    +

    The chosen cost function for this problem is

    +
\[
C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2
\]
    +

    In order to minimize the cost function, an optimization method must be chosen.

    +

    Here, gradient descent with a constant step size has been chosen.

    +
    +
    +

    Gradient descent#

    +

The idea of the gradient descent algorithm is to update the parameters in a direction in which the cost function decreases towards a minimum.

    +

    In general, the update of some parameters \(\boldsymbol{\omega}\) given a cost +function defined by some weights \(\boldsymbol{\omega}\), \(C(\boldsymbol{x}, +\boldsymbol{\omega})\), goes as follows:

    +
    +\[ +\boldsymbol{\omega}_{\text{new} } = \boldsymbol{\omega} - \lambda \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega}) +\]
    +

    for a number of iterations or until \( \big|\big| \boldsymbol{\omega}_{\text{new} } - \boldsymbol{\omega} \big|\big|\) becomes smaller than some given tolerance.

    +

The value of \(\lambda\) decides how large a step the algorithm takes in the direction of \( \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega})\). The notation \(\nabla_{\boldsymbol{\omega}}\) expresses the gradient with respect to the elements in \(\boldsymbol{\omega}\).

    +

    In our case, we have to minimize the cost function \(C(\boldsymbol{x}, P)\) with +respect to the two sets of weights and biases, that is for the hidden +layer \(P_{\text{hidden} }\) and for the output layer \(P_{\text{output} +}\) .

    +

This means that \(P_{\text{hidden} }\) and \(P_{\text{output} }\) are updated by

    +
    +\[\begin{split} +\begin{aligned} +P_{\text{hidden},\text{new}} &= P_{\text{hidden}} - \lambda \nabla_{P_{\text{hidden}}} C(\boldsymbol{x}, P) \\ +P_{\text{output},\text{new}} &= P_{\text{output}} - \lambda \nabla_{P_{\text{output}}} C(\boldsymbol{x}, P) +\end{aligned} +\end{split}\]
    +
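To see the update rule in isolation, here is a minimal, generic sketch (with a hypothetical tolerance-based stopping criterion; the full program below instead uses a fixed number of iterations), minimizing a simple one-parameter cost function:

import autograd.numpy as np
from autograd import grad

def C(w):
    # Simple cost function with minimum at w = 3
    return (w - 3.0)**2

dC = grad(C)
w, lmb, tol = 0.0, 0.1, 1e-8
for _ in range(10000):
    w_new = w - lmb*dC(w)
    if np.abs(w_new - w) < tol:
        break
    w = w_new
print(w)   # close to 3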
    +
    +

    The code for solving the ODE#

    +
    +
    +
    %matplotlib inline
    +
    +import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Assuming one input, hidden, and output layer
    +def neural_network(params, x):
    +
+    # Find the weights (and biases) for the hidden and output layer.
+    # Assume that params is a list of parameters for each layer.
+    # The biases are the first element for each array in params,
+    # and the weights are the remaining elements in each array in params.
    +
    +    w_hidden = params[0]
    +    w_output = params[1]
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    ## Hidden layer:
    +
    +    # Add a row of ones to include bias
    +    x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)
    +
    +    z_hidden = np.matmul(w_hidden, x_input)
    +    x_hidden = sigmoid(z_hidden)
    +
    +    ## Output layer:
    +
    +    # Include bias:
    +    x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_hidden)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial(x,params, g0 = 10):
    +    return g0 + x*neural_network(params,x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The cost function:
    +def cost_function(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input, hidden, and output layer
    +def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):
    +    ## Set up initial weights and biases
    +
    +    # For the hidden layer
    +    p0 = npr.randn(num_neurons_hidden, 2 )
    +
    +    # For the output layer
    +    p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included
    +
    +    P = [p0, p1]
    +
    +    print('Initial cost: %g'%cost_function(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of two arrays;
    +        # one for the gradient w.r.t P_hidden and
    +        # one for the gradient w.r.t P_output
    +        cost_grad =  cost_function_grad(P, x)
    +
    +        P[0] = P[0] - lmb * cost_grad[0]
    +        P[1] = P[1] - lmb * cost_grad[1]
    +
    +    print('Final cost: %g'%cost_function(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
+    # Set the seed such that the weights and biases are initialized
+    # to the same values for every run.
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = 10
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    # Use the network
    +    P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    # Print the deviation from the trial solution and true solution
    +    res = g_trial(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    print('Max absolute difference: %g'%np.max(np.abs(res - res_analytical)))
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    The network with one input layer, specified number of hidden layers, and one output layer#

    +

It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layer.

    +

The number of neurons within each hidden layer is given as a list of integers in the program below.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# The neural network with one input layer and one output layer,
    +# but with number of hidden layers specified by the user.
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x,params, g0 = 10):
    +    return g0 + x*deep_neural_network(params, x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The same cost function as before, but calls deep_neural_network instead.
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(deep_neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input and one output layer,
    +# but with specified number of hidden layers from the user.
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # The number of elements in the list num_hidden_neurons thus represents
    +    # the number of hidden layers.
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
    +    npr.seed(15)
    +
    +    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = np.array([10,10])
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    res = g_trial_deep(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','dnn'])
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Population growth#

    +

    A logistic model of population growth assumes that a population converges toward an equilibrium. +The population growth can be modeled by

    + +
    +
    +\[ \begin{equation} \label{log} \tag{10} g'(t) = \alpha g(t)(A - g(t)) \end{equation} \]
    +

    where \(g(t)\) is the population density at time \(t\), \(\alpha > 0\) is the growth rate and \(A > 0\) is the maximum population number the environment can sustain. Also, at \(t = 0\) the population has the size \(g(0) = g_0\), where \(g_0\) is some chosen constant.

    +

    In this example, a network similar to the one used for the exponential decay with Autograd is used to solve the equation. However, since the implementation may suffer from numerical instability and long execution times (this becomes more apparent in the examples solving PDEs), using a library like TensorFlow is recommended. Here we stay with the simpler approach and, for comparison, also implement the forward Euler method.

    +
    +
    +

    Setting up the problem#

    +

    Here, we will model a population \(g(t)\) in an environment having carrying capacity \(A\). +The population follows the model

    + +
    +
    +\[ \begin{equation} \label{solveode_population} \tag{11} g'(t) = \alpha g(t)(A - g(t)) \end{equation} \]
    +

    where \(g(0) = g_0\).

    +

    In this example, we let \(\alpha = 2\), \(A = 1\), and \(g_0 = 1.2\).

    +
    +
    +

    The trial solution#

    +

    We will get a slightly different trial solution, as the boundary conditions are different +compared to the case for exponential decay.

    +

    A possible trial solution satisfying the condition \(g(0) = g_0\) could be

    +
    +\[ h_1(t) = g_0 + t \cdot N(t,P) \]
    +

    with \(N(t,P)\) being the output from the neural network with weights and biases for each layer collected in the set \(P\).

    +

    The analytical solution is

    +
    +\[ g(t) = \frac{Ag_0}{g_0 + (A - g_0)\exp(-\alpha A t)} \]
    +
    +
    +

    The program using Autograd#

    +

    The network will be similar to the one used in the exponential decay example, but with some small modifications for our problem.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Function to get the parameters.
    +# Done such that one can easily change the parameters to one's liking.
    +def get_parameters():
    +    alpha = 2
    +    A = 1
    +    g0 = 1.2
    +    return alpha, A, g0
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = f(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# The right side of the ODE:
    +def f(x, g_trial):
    +    alpha,A, g0 = get_parameters()
    +    return alpha*g_trial*(A - g_trial)
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x, params):
    +    alpha,A, g0 = get_parameters()
    +    return g0 + x*deep_neural_network(params,x)
    +
    +# The analytical solution:
    +def g_analytic(t):
    +    alpha,A, g0 = get_parameters()
    +    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100, 50, 25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using forward Euler to solve the ODE#

    +

    A straightforward way of solving an ODE numerically is to use Euler’s method.

    +

    Euler’s method uses a Taylor series to approximate the value of a function \(f\) at a step \(\Delta x\) away from \(x\):

    +
    +\[ f(x + \Delta x) \approx f(x) + \Delta x f'(x) \]
    +

    In our case, using Euler’s method to approximate the value of \(g\) at a step \(\Delta t\) from \(t\) yields

    +
    +\[\begin{split}\begin{aligned}
    g(t + \Delta t) &\approx g(t) + \Delta t g'(t) \\
    &= g(t) + \Delta t \big(\alpha g(t)(A - g(t))\big)
    \end{aligned}\end{split}\]
    +

    along with the condition that \(g(0) = g_0\).

    +

    Let \(t_i = i \cdot \Delta t\) for \(i = 0, \dots, N_t-1\), where \(\Delta t = \frac{T}{N_t-1}\), \(T\) is the final time our solver must reach, and \(N_t\) is the number of values for \(t \in [0, T]\).

    +

    For \(i \geq 1\), we have that

    +
    +\[\begin{split}\begin{aligned}
    t_i &= i\Delta t \\
    &= (i - 1)\Delta t + \Delta t \\
    &= t_{i-1} + \Delta t
    \end{aligned}\end{split}\]
    +

    Now, if \(g_i = g(t_i)\) then

    + +
    +
    +\[\begin{split}\begin{equation} \begin{aligned}
    g_i &= g(t_i) \\
    &= g(t_{i-1} + \Delta t) \\
    &\approx g(t_{i-1}) + \Delta t \big(\alpha g(t_{i-1})(A - g(t_{i-1}))\big) \\
    &= g_{i-1} + \Delta t \big(\alpha g_{i-1}(A - g_{i-1})\big)
    \end{aligned} \end{equation} \label{odenum} \tag{12} \end{split}\]
    +

    for \(i \geq 1\), with the initial value \(g_0 = g(t_0) = g(0)\).

    +

    Equation (12) could be implemented in the following way, extending the Autograd-based program above:

    +
    +
    +
    # Assume that all function definitions from the example program using Autograd
    +# are located here.
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100,50,25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    ## Find an approximation to the function using forward Euler
    +
    +    alpha, A, g0 = get_parameters()
    +    dt = T/(Nt - 1)
    +
    +    # Perform forward Euler to solve the ODE
    +    g_euler = np.zeros(Nt)
    +    g_euler[0] = g0
    +
    +    for i in range(1,Nt):
    +        g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))
    +
    +    # Print the errors done by each method
    +    diff1 = np.max(np.abs(g_euler - g_analytical))
    +    diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))
    +
    +    print('Max absolute difference between Euler method and analytical: %g'%diff1)
    +    print('Max absolute difference between deep neural network and analytical: %g'%diff2)
    +
    +    # Plot results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(t,g_euler)
    +    plt.plot(t,g_analytical)
    +    plt.plot(t,g_dnn_ag[0,:])
    +
    +    plt.legend(['euler','analytical','dnn'])
    +    plt.xlabel('Time t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the one dimensional Poisson equation#

    +

    The Poisson equation for \(g(x)\) in one dimension is

    + +
    +
    +\[ \begin{equation} \label{poisson} \tag{13} -g''(x) = f(x) \end{equation} \]
    +

    where \(f(x)\) is a given function for \(x \in (0,1)\).

    +

    The conditions that \(g(x)\) is chosen to fulfill, are

    +
    +\[\begin{split}\begin{align*}
    g(0) &= 0 \\
    g(1) &= 0
    \end{align*}\end{split}\]
    +

    This equation can be solved numerically using neural networks implemented with e.g. Autograd or TensorFlow. The results from the networks can then be compared to the analytical solution. In addition, it is interesting to see how a standard numerical method for second-order ODEs compares to the neural networks.

    +
    +
    +

    The specific equation to solve for#

    +

    Here, the function \(g(x)\) to solve for follows the equation

    +
    +\[ -g''(x) = f(x),\qquad x \in (0,1) \]
    +

    where \(f(x)\) is a given function, along with the chosen conditions

    + +
    +
    +\[ \begin{aligned} g(0) = g(1) = 0 \end{aligned} \label{cond} \tag{14} \]
    +

    In this example, we consider the case when \(f(x) = (3x + x^2)\exp(x)\).

    +

    For this case, a possible trial solution satisfying the conditions could be

    +
    +\[ g_t(x) = x \cdot (1-x) \cdot N(P,x) \]
    +

    The analytical solution for this problem is

    +
    +\[ g(x) = x(1 - x)\exp(x) \]
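    As a quick sanity check, we can verify with Autograd (the same machinery used in the program below) that this analytical solution indeed satisfies \(-g''(x) = f(x)\). This is a minimal sketch; the grid of test points is an arbitrary choice.

    import autograd.numpy as np
    from autograd import elementwise_grad

    def f(x):
        return (3*x + x**2)*np.exp(x)

    def g_analytic(x):
        return x*(1 - x)*np.exp(x)

    # Differentiate the analytical solution twice with respect to x
    d2_g_analytic = elementwise_grad(elementwise_grad(g_analytic))

    x = np.linspace(0, 1, 11)
    # -g''(x) - f(x) should vanish (up to round-off) at every test point
    print(np.max(np.abs(-d2_g_analytic(x) - f(x))))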
    +
    +
    +

    Solving the equation using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    max_diff = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%max_diff)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Comparing with a numerical scheme#

    +

    The Poisson equation can also be solved numerically by using a Taylor expansion to approximate the second derivative.

    +

    Using Taylor series, the second derivative can be expressed as

    +
    +\[ g''(x) = \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} + E_{\Delta x}(x) \]
    +

    where \(\Delta x\) is a small step size and \(E_{\Delta x}(x)\) is the error term.

    +

    Neglecting the error term gives an approximation to the second derivative:

    + +
    +
    +\[ \begin{equation} \label{approx} \tag{15} g''(x) \approx \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} \end{equation} \]
    +

    If \(x_i = i \Delta x = x_{i-1} + \Delta x\) and \(g_i = g(x_i)\) for \(i = 1,\dots N_x - 2\) with \(N_x\) being the number of values for \(x\), (15) becomes

    +
    +\[\begin{split}\begin{aligned}
    g''(x_i) &\approx \frac{g(x_i + \Delta x) - 2g(x_i) + g(x_i -\Delta x)}{\Delta x^2} \\
    &= \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2}
    \end{aligned}\end{split}\]
    +

    Since we know from our problem that

    +
    +\[\begin{split}\begin{aligned}
    -g''(x) &= f(x) \\
    &= (3x + x^2)\exp(x)
    \end{aligned}\end{split}\]
    +

    along with the conditions \(g(0) = g(1) = 0\), +the following scheme can be used to find an approximate solution for \(g(x)\) numerically:

    + +
    +
    +\[\begin{split}\begin{equation} \begin{aligned}
    -\Big( \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} \Big) &= f(x_i) \\
    -g_{i+1} + 2g_i - g_{i-1} &= \Delta x^2 f(x_i)
    \end{aligned} \end{equation} \label{odesys} \tag{16} \end{split}\]
    +

    for \(i = 1, \dots, N_x - 2\) where \(g_0 = g_{N_x - 1} = 0\) and \(f(x_i) = (3x_i + x_i^2)\exp(x_i)\), which is given for our specific problem.

    +

    The equation can be rewritten into a matrix equation:

    +
    +\[\begin{split}\begin{aligned}
    \begin{pmatrix}
    2 & -1 & 0 & \dots & 0 \\
    -1 & 2 & -1 & \dots & 0 \\
    \vdots & & \ddots & & \vdots \\
    0 & \dots & -1 & 2 & -1 \\
    0 & \dots & 0 & -1 & 2
    \end{pmatrix}
    \begin{pmatrix}
    g_1 \\ g_2 \\ \vdots \\ g_{N_x - 3} \\ g_{N_x - 2}
    \end{pmatrix}
    &=
    \Delta x^2
    \begin{pmatrix}
    f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{N_x - 3}) \\ f(x_{N_x - 2})
    \end{pmatrix} \\
    \boldsymbol{A}\boldsymbol{g} &= \boldsymbol{f},
    \end{aligned}\end{split}\]
    +

    which makes it possible to solve for the vector \(\boldsymbol{g}\).

    +
    +
    +

    Setting up the code#

    +

    We can then compare the result from this numerical scheme with the output from our network using Autograd:

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
    +    # Find the maximum absolute difference between the solutions:
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +
    +    ## Perform the computation using the numerical scheme
    +
    +    dx = 1/(Nx - 1)
    +
    +    # Set up the matrix A
    +    A = np.zeros((Nx-2,Nx-2))
    +
    +    A[0,0] = 2
    +    A[0,1] = -1
    +
    +    for i in range(1,Nx-3):
    +        A[i,i-1] = -1
    +        A[i,i] = 2
    +        A[i,i+1] = -1
    +
    +    A[Nx - 3, Nx - 4] = -1
    +    A[Nx - 3, Nx - 3] = 2
    +
    +    # Set up the vector f
    +    f_vec = dx**2 * f(x[1:-1])
    +
    +    # Solve the equation
    +    g_res = np.linalg.solve(A,f_vec)
    +
    +    g_vec = np.zeros(Nx)
    +    g_vec[1:-1] = g_res
    +
    +    # Print the differences between each method
    +    max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))
    +    max_diff2 = np.max(np.abs(g_vec - g_analytical))
    +    print("The max absolute difference between the analytical solution and DNN Autograd: %g"%max_diff1)
    +    print("The max absolute difference between the analytical solution and numerical scheme: %g"%max_diff2)
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(x,g_vec)
    +    plt.plot(x,g_analytical)
    +    plt.plot(x,g_dnn_ag[0,:])
    +
    +    plt.legend(['numerical scheme','analytical','dnn'])
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Partial Differential Equations#

    +

    A partial differential equation (PDE) is an equation whose solution is a function of several variables. The equation may involve derivatives of the function with respect to any combination of these variables.

    +

    In general, a partial differential equation for a function \(g(x_1,\dots,x_N)\) with \(N\) variables may be expressed as

    + +
    +
    +\[ \begin{equation} \label{PDE} \tag{17} f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial^2 g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) = 0 \end{equation} \]
    +

    where \(f\) is an expression involving all kinds of possible mixed derivatives of \(g(x_1,\dots,x_N)\) up to an order \(n\). In order for the solution to be unique, some additional conditions must also be given.

    +
    +
    +

    Type of problem#

    +

    The problem our network must solve is similar to the ODE case: we must have a trial solution \(g_t\) at hand.

    +

    For instance, the trial solution could be expressed as

    +
    +\[ \begin{align*} g_t(x_1,\dots,x_N) = h_1(x_1,\dots,x_N) + h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P)) \end{align*} \]
    +

    where \(h_1(x_1,\dots,x_N)\) is a function that ensures \(g_t(x_1,\dots,x_N)\) satisfies some given conditions. +The neural network \(N(x_1,\dots,x_N,P)\) has weights and biases described by \(P\) and \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is an expression using the output from the neural network in some way.

    +

    The role of the function \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is to ensure that the contribution from the neural network vanishes when \(g_t(x_1,\dots,x_N)\) is evaluated at the values of \(x_1,\dots,x_N\) where the given conditions must be satisfied. The function \(h_1(x_1,\dots,x_N)\) should alone make \(g_t(x_1,\dots,x_N)\) satisfy the conditions.

    +
    +
    +

    Network requirements#

    +

    The network then tries to minimize the cost function following the same ideas as described for the ODE case, but now with more than one variable to consider. The concept remains the same: find a set of parameters \(P\) such that the expression \(f\) in (17) is as close to zero as possible.

    +

    As in the ODE case, the cost function is the mean squared error that the network must try to minimize. The cost function for the network to minimize is

    +
    +\[ C\left(x_1, \dots, x_N, P\right) = \left( f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial^2 g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) \right)^2 \]
    +
    +
    +

    More details#

    +

    If we let \(\boldsymbol{x} = \big( x_1, \dots, x_N \big)\) be an array containing the values for \(x_1, \dots, x_N\) respectively, the cost function can be reformulated into the following:

    +
    +\[ C\left(\boldsymbol{x}, P\right) = \left( f\left( \boldsymbol{x}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}) }{\partial x_N}, \frac{\partial^2 g(\boldsymbol{x}) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}) }{\partial x_N^n} \right) \right)^2 \]
    +

    If we also have \(M\) different sets of values for \(x_1, \dots, x_N\), that is \(\boldsymbol{x}_i = \big(x_1^{(i)}, \dots, x_N^{(i)}\big)\) for \(i = 1,\dots,M\) being the rows in matrix \(X\), the cost function can be generalized into

    +
    +\[ C\left(X, P \right) = \sum_{i=1}^M \left( f\left( \boldsymbol{x}_i, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}_i) }{\partial x_N}, \frac{\partial^2 g(\boldsymbol{x}_i) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}_i) }{\partial x_N^n} \right) \right)^2. \]
    +
    +
    +

    Example: The diffusion equation#

    +

    In one spatial dimension, the equation reads

    +
    +\[ \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} \]
    +

    where a possible choice of conditions are

    +
    +\[\begin{split}\begin{align*}
    g(0,t) &= 0 ,\qquad t \geq 0 \\
    g(1,t) &= 0, \qquad t \geq 0 \\
    g(x,0) &= u(x),\qquad x\in [0,1]
    \end{align*}\end{split}\]
    +

    with \(u(x)\) being some given function.

    +
    +
    +

    Defining the problem#

    +

    For this case, we want to find \(g(x,t)\) such that

    + +
    +
    +\[ \begin{equation} \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} \end{equation} \label{diffonedim} \tag{18} \]
    +

    and

    +
    +\[\begin{split}\begin{align*}
    g(0,t) &= 0 ,\qquad t \geq 0 \\
    g(1,t) &= 0, \qquad t \geq 0 \\
    g(x,0) &= u(x),\qquad x\in [0,1]
    \end{align*}\end{split}\]
    +

    with \(u(x) = \sin(\pi x)\).

    +

    Let us first set up the deep neural network. It will follow the same structure as discussed in the examples solving the ODEs. We first look into how Autograd can be used in a network tailored to solve for bivariate functions.

    +
    +
    +

    Setting up the network using Autograd#

    +

    The only change needed here is to extend our network so that functions of multiple variables are handled correctly. In this case the function we solve for has two variables, time \(t\) and position \(x\). Each point will be represented by a one-dimensional array in the program. The program will evaluate the network at each possible pair \((x,t)\), given arrays of the desired \(x\)-values and \(t\)-values at which to approximate the solution.

    +
    +
    +
    def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The trial solution#

    +

    The cost function must then iterate through the given arrays of \(x\)- and \(t\)-values, form a point \((x,t)\) at which the deep neural network and the trial solution are evaluated, and then compute the Jacobian of the trial solution.

    +

    A possible trial solution for this PDE is

    +
    +\[ g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P) \]
    +

    with \(h_1(x,t)\) being a function ensuring that \(g_t(x,t)\) satisfies our given conditions, and \(N(x,t,P)\) being the output from the deep neural network using weights and biases for each layer from \(P\).

    +

    To fulfill the conditions, \(h_1(x,t)\) could be:

    +
    +\[ h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x) \]
    +

    since \(u(0) = u(1) = 0\) and \(u(x) = \sin(\pi x)\).

    +
    +
    +

    Why the Jacobian?#

    +

    The Jacobian is used because the program must find the derivative of +the trial solution with respect to \(x\) and \(t\).

    +

    This makes it necessary to compute the Jacobian matrix, as we want to evaluate the gradient with respect to \(x\) and \(t\) (note that the Jacobian of a scalar-valued multivariate function is simply its gradient).

    +

    In Autograd, the differentiation is by default done with respect to the first input argument of your Python function. Since the point is an array containing the values of \(x\) and \(t\), the Jacobian is calculated with respect to both \(x\) and \(t\).

    +

    To find the second derivatives with respect to \(x\) and \(t\), the Jacobian can be applied a second time. The result is the Hessian matrix, which contains all possible second-order (mixed) derivatives of \(g(x,t)\).
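    To make the bookkeeping concrete, here is a minimal sketch of how jacobian and hessian from Autograd return the quantities we need, \(\partial g/\partial t\) and \(\partial^2 g/\partial x^2\), at a single point \((x,t)\). The test function used here is just an arbitrary smooth function of the point, not the trial solution itself.

    import autograd.numpy as np
    from autograd import jacobian, hessian

    # An arbitrary scalar-valued test function of a point (x, t)
    def g(point):
        x, t = point
        return np.sin(np.pi*x)*t**2

    point = np.array([0.3, 0.1])

    g_jac = jacobian(g)(point)    # gradient: [dg/dx, dg/dt]
    g_hes = hessian(g)(point)     # 2 x 2 matrix of second derivatives

    dg_dt = g_jac[1]              # first derivative with respect to t
    d2g_dx2 = g_hes[0][0]         # second derivative with respect to x
    print(dg_dt, d2g_dx2)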

    +
    +
    +
    # Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
    +# The right side of the ODE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The full program#

    +

    Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.

    +

    The analytical solution of our problem is

    +
    +\[ g(x,t) = \exp(-\pi^2 t)\sin(\pi x) \]
    +

    A possible way to implement a neural network solving the PDE is given below. Be aware, though, that it is fairly slow for the parameters used. A better result is possible, but requires more iterations, and thus a longer time to complete.

    +

    Indeed, the program below is not optimal in its implementation, but rather serves as an example of how to implement and use a neural network to solve a PDE. Using TensorFlow results in a much better execution time. Try it!

    +
    +
    +
    import autograd.numpy as np
    +from autograd import jacobian,hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the network
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## Define the trial solution and cost function
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
    +# The right side of the ODE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum /( np.size(x)*np.size(t) )
    +
    +## For comparison, define the analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)
    +
    +## Set up a function for training the network to solve for the equation
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
    +    ## Set up initial weights and biases
    +    N_hidden = np.size(num_neurons)
    +
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [100, 25]
    +    num_iter = 250
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    g_dnn_ag = np.zeros((Nx, Nt))
    +    G_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            g_dnn_ag[i,j] = g_trial(point,P)
    +
    +            G_analytical[i,j] = g_analytic(point)
    +
    +    # Find the max difference between the analytical and the computed solution
    +    diff_ag = np.abs(g_dnn_ag - G_analytical)
    +    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = g_dnn_ag[:,indx1]
    +    res2 = g_dnn_ag[:,indx2]
    +    res3 = g_dnn_ag[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = G_analytical[:,indx1]
    +    res_analytical2 = G_analytical[:,indx2]
    +    res_analytical3 = G_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Resources on differential equations and deep learning#

    +
    1. Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al.
    2. Neural networks for solving differential equations by A. Honchar
    3. Solving differential equations using neural networks by M.M. Chiaramonte and M. Kiener
    4. Introduction to Partial Differential Equations by A. Tveito and R. Winther
    +
    +
    +

    Convolutional Neural Networks (recognizing images)#

    +

    Convolutional neural networks (CNNs) were developed during the last decade of the previous century, with a focus on character recognition tasks. Nowadays, CNNs are a central element in the spectacular success of deep learning methods. Their success in, for example, image classification has made them a central tool for most machine learning practitioners.

    +

    CNNs are very similar to ordinary Neural Networks. +They are made up of neurons that have learnable weights and +biases. Each neuron receives some inputs, performs a dot product and +optionally follows it with a non-linearity. The whole network still +expresses a single differentiable score function: from the raw image +pixels on one end to class scores at the other. And they still have a +loss function (for example Softmax) on the last (fully-connected) layer +and all the tips/tricks we developed for learning regular Neural +Networks still apply (back propagation, gradient descent etc etc).

    +
    +
    +

    What is the Difference#

    +

    CNN architectures make the explicit assumption that +the inputs are images, which allows us to encode certain properties +into the architecture. These then make the forward function more +efficient to implement and vastly reduce the amount of parameters in +the network.

    +
    +
    +

    Neural Networks vs CNNs#

    +

    Neural networks are defined as affine transformations, that is, a vector is received as input and is multiplied with a matrix of so-called weights (our unknown parameters) to produce an output (to which a bias vector is usually added before passing the result through a nonlinear activation function). This is applicable to any type of input, be it an image, a sound clip or an unordered collection of features: whatever their dimensionality, their representation can always be flattened into a vector before the transformation.
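    As an illustration of this statement, here is a minimal sketch (with made-up dimensions and random numbers) of flattening an image and applying a single affine layer followed by a nonlinearity:

    import numpy as np

    # A small "image": 32 x 32 pixels with 3 color channels
    image = np.random.rand(32, 32, 3)

    # Flatten the image into a vector of length 32*32*3 = 3072
    x = image.reshape(-1)

    # One affine (fully-connected) layer with 10 outputs
    W = np.random.randn(10, x.size)    # weight matrix (the unknown parameters)
    b = np.random.randn(10)            # bias vector

    z = W @ x + b                      # affine transformation
    a = 1/(1 + np.exp(-z))             # nonlinear activation (sigmoid)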

    +
    +
    +

    Why CNNs for images, sound files, medical images from CT scans etc?#

    +

    However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic +structure. More formally, they share these important properties:

    +
    • They are stored as multi-dimensional arrays (think of the pixels of a figure).
    • They feature one or more axes for which ordering matters (e.g., width and height axes for an image, the time axis for a sound clip).
    • One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).
    +

    These properties are not exploited when an affine transformation is applied; in +fact, all the axes are treated in the same way and the topological information +is not taken into account. Still, taking advantage of the implicit structure of +the data may prove very handy in solving some tasks, like computer vision and +speech recognition, and in these cases it would be best to preserve it. This is +where discrete convolutions come into play.

    +

    A discrete convolution is a linear transformation that preserves this notion of +ordering. It is sparse (only a few input units contribute to a given output +unit) and reuses parameters (the same weights are applied to multiple locations +in the input).

    +
    +
    +

    Regular NNs don’t scale well to full images#

    +

    As an example, consider +an image of size \(32\times 32\times 3\) (32 wide, 32 high, 3 color channels), so a +single fully-connected neuron in a first hidden layer of a regular +Neural Network would have \(32\times 32\times 3 = 3072\) weights. This amount still +seems manageable, but clearly this fully-connected structure does not +scale to larger images. For example, an image of more respectable +size, say \(200\times 200\times 3\), would lead to neurons that have +\(200\times 200\times 3 = 120,000\) weights.

    +

    We could have +several such neurons, and the parameters would add up quickly! Clearly, +this full connectivity is wasteful and the huge number of parameters +would quickly lead to possible overfitting.

    + + +

    Figure 1: A regular 3-layer Neural Network.

    +
    +
    +

    3D volumes of neurons#

    +

    Convolutional Neural Networks take advantage of the fact that the +input consists of images and they constrain the architecture in a more +sensible way.

    +

    In particular, unlike a regular Neural Network, the +layers of a CNN have neurons arranged in 3 dimensions: width, +height, depth. (Note that the word depth here refers to the third +dimension of an activation volume, not to the depth of a full Neural +Network, which can refer to the total number of layers in a network.)

    +

    To understand it better, the above example of an image +with an input volume of +activations has dimensions \(32\times 32\times 3\) (width, height, +depth respectively).

    +

    The neurons in a layer will +only be connected to a small region of the layer before it, instead of +all of the neurons in a fully-connected manner. Moreover, the final +output layer could for this specific image have dimensions \(1\times 1 \times 10\), +because by the +end of the CNN architecture we will reduce the full image into a +single vector of class scores, arranged along the depth +dimension.

    + + +

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    +
    +
    +

    More on Dimensionalities#

    +

    In fields like signal processing (and imaging as well), one designs +so-called filters. These filters are defined by the convolutions and +are often hand-crafted. One may specify filters for smoothing, edge +detection, frequency reshaping, and similar operations. However with +neural networks the idea is to automatically learn the filters and use +many of them in conjunction with non-linear operations (activation +functions).

    +

As an example, consider a neural network operating on sound sequence data. Assume that we have an input vector \(\boldsymbol{x}\) of length \(d=10^6\). We then construct a neural network with one hidden layer with \(10^4\) nodes. This means that we will have a weight matrix with \(10^4\times 10^6=10^{10}\) weights to be determined, together with \(10^4\) biases.

    +

Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false). This means that we have only one output node. But since this output node connects to the \(10^4\) nodes in the hidden layer, there are in total \(10^4\) weights to be determined for the output layer, plus one bias. In total we have

    +
    +\[ +\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \approx 10^{10}, +\]
    +

    that is ten billion parameters to determine.
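A quick numerical check of this count (a minimal sketch; the layer sizes are exactly those assumed above):

# Rough parameter count for the fully connected network sketched above:
# input of length 10**6, one hidden layer with 10**4 nodes, one output node.
d, hidden, output = 10**6, 10**4, 1
weights_hidden = hidden * d          # weight matrix input -> hidden
biases_hidden = hidden               # one bias per hidden node
weights_output = output * hidden     # weights hidden -> output
biases_output = output               # one bias for the output node
total = weights_hidden + biases_hidden + weights_output + biases_output
print(f"{total:.2e} parameters")     # approximately 10**10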

    +
    +
    +

    Further remarks#

    +

The main principles that justify convolutions are locality of information and repetition of patterns within the signal. Sound samples of the input in adjacent spots are much more likely to affect each other than those that are very far away. Similarly, sounds are repeated multiple times in the signal. While slightly simplistic, reasoning about such a sound example demonstrates this. The same principles then apply to images and other similar data.

    +
    +
    +

    Layers used to build CNNs#

    +

    A simple CNN is a sequence of layers, and every layer of a CNN +transforms one volume of activations to another through a +differentiable function. We use three main types of layers to build +CNN architectures: Convolutional Layer, Pooling Layer, and +Fully-Connected Layer (exactly as seen in regular Neural Networks). We +will stack these layers to form a full CNN architecture.

    +

A simple CNN for image classification could have the following architecture (a minimal code sketch is given right after this list):

• INPUT (\(32\times 32 \times 3\)) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

• CONV (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as \([32\times 32\times 12]\) if we decide to use 12 filters.

• RELU layer will apply an elementwise activation function, such as the \(max(0,x)\) thresholding at zero. This leaves the size of the volume unchanged (\([32\times 32\times 12]\)).

• POOL (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as \([16\times 16\times 12]\).

• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size \([1\times 1\times 10]\), where each of the 10 numbers corresponds to a class score, such as among the 10 categories of the MNIST images we considered above. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
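As a concrete illustration of the list above, here is a minimal PyTorch sketch of such an INPUT-CONV-RELU-POOL-FC stack. The \(5\times 5\) kernel size and the padding are choices made for this illustration only (the list fixes the number of filters and the volume sizes, not the kernel size); building CNNs with TensorFlow/Keras and PyTorch is discussed in more detail next week.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, padding=2),  # CONV: 32x32x3 -> 32x32x12
    nn.ReLU(),                                                            # RELU: elementwise, size unchanged
    nn.MaxPool2d(kernel_size=2),                                          # POOL: 32x32x12 -> 16x16x12
    nn.Flatten(),                                                         # flatten to a vector of length 16*16*12
    nn.Linear(16 * 16 * 12, 10),                                          # FC: class scores for 10 categories
)

x = torch.randn(1, 3, 32, 32)   # one fake RGB image of size 32x32
print(model(x).shape)           # torch.Size([1, 10])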
    +
    +

    Transforming images#

    +

    CNNs transform the original image layer by layer from the original +pixel values to the final class scores.

    +

    Observe that some layers contain +parameters and other don’t. In particular, the CNN layers perform +transformations that are a function of not only the activations in the +input volume, but also of the parameters (the weights and biases of +the neurons). On the other hand, the RELU/POOL layers will implement a +fixed function. The parameters in the CONV/FC layers will be trained +with gradient descent so that the class scores that the CNN computes +are consistent with the labels in the training set for each image.

    +
    +
    +

    CNNs in brief#

    +

    In summary:

    +
      +
• A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)

• There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)

• Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function

• Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)

• Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
    +
    +

    A deep CNN model (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Key Idea#

    +

A dense neural network is represented by an affine operation (like matrix-matrix multiplication) where all parameters are included.

    +

The key idea in CNNs for, say, imaging is that in images neighboring pixels tend to be related! So we connect each neuron in the first hidden layer only to a small neighborhood of input neurons, instead of connecting it to all of the inputs.

    +

    We say we perform a filtering (convolution is the mathematical operation).

    +
    +
    +

    How to do image compression before the era of deep learning#

    +

    The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images. +The lectures on the SVD give many of the essential details concerning the SVD.

    +

The orthogonal vectors obtained from the SVD can be used to project down the dimensionality of a given image. In the example here we gray-scale an image and downsize it.

    +

This recipe relies on us being able to actually perform the SVD. For large images, and in particular with many images to reconstruct, using the SVD may quickly become an overwhelming task. With the advent of efficient deep learning methods like CNNs and, later, generative methods, these have in recent years become the premier way of performing image analysis, in particular for classification problems with labelled images.

    +
    +
    +

    The SVD example#

    +
    +
    +
from matplotlib.image import imread
import matplotlib.pyplot as plt
import numpy as np
import os
from math import log10, sqrt
plt.rcParams['figure.figsize'] = [16, 8]
# Import image and convert RGB to grayscale
A = imread(os.path.join("figslides/photo1.jpg"))
X = A.dot([0.299, 0.5870, 0.114])
img = plt.imshow(X)
img.set_cmap('gray')
plt.axis('off')
plt.show()
# Print the image size
print('Image size: %s' % str(X.shape))

# Split the matrix into U, S, VT
U, S, VT = np.linalg.svd(X, full_matrices=False)
S = np.diag(S)
m, n = X.shape  # image height and width
j = 0
# Try compression with different numbers k of singular vectors (these represent projections):
for k in (5, 10, 20, 100, 200, 400, 500):
    # Original size of the image
    originalSize = m * n
    # Size after compression: k singular values plus k columns of U and k rows of VT
    compressedSize = k * (1 + m + n)
    # The rank-k approximation of the original image
    Xapprox = U[:, :k] @ S[:k, :k] @ VT[:k, :]
    plt.figure(j + 1)
    j += 1
    img = plt.imshow(Xapprox)
    img.set_cmap('gray')
    plt.axis('off')
    plt.title('k = ' + str(k))
    plt.show()
    print('Original size of image:')
    print(originalSize)
    ratio = compressedSize * 1.0 / originalSize
    print('Compression ratio as compressed size / original size:')
    print(ratio)
    print('Compression ratio is ' + str(round(ratio * 100, 2)) + '%')
    # Estimate the mean-square error (MSE)
    x = X.astype("float")
    y = Xapprox.astype("float")
    err = np.sum((x - y) ** 2) / float(m * n)
    print('The mean-square error is ' + str(round(err)))
    max_pixel = 255.0
    # Estimate the peak signal-to-noise ratio (PSNR)
    psnr = 20 * log10(max_pixel / sqrt(err))
    print('Signal-to-noise ratio: ' + str(round(psnr)) + ' dB')
    +
    +
    +
    +
    +
    +
    +

    Mathematics of CNNs#

    +

The mathematics of CNNs is based on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is represented by mathematical operations (integration, summation etc.) on two functions in order to produce a third function that expresses how the shape of one gets modified by the other. Convolution has a plethora of applications in a variety of disciplines, spanning from statistics to signal processing, computer vision, solutions of differential equations, linear algebra, engineering, and yes, machine learning.

    +

    Mathematically, convolution is defined as follows (one-dimensional example): +Let us define a continuous function \(y(t)\) given by

    +
    +\[ +y(t) = \int x(a) w(t-a) da, +\]
    +

    where \(x(a)\) represents a so-called input and \(w(t-a)\) is normally called the weight function or kernel.

    +

    The above integral is written in a more compact form as

    +
    +\[ +y(t) = \left(x * w\right)(t). +\]
    +

    The discretized version reads

    +
    +\[ +y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). +\]
    +

    Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.

    +

    How can we use this? And what does it mean? Let us study some familiar examples first.

    +
    +
    +

    Convolution Examples: Polynomial multiplication#

    +

Our first example is that of a multiplication between two polynomials, which we will rewrite in terms of the mathematics of convolution. In the final stage, since the problem here is a discrete one, we will recast the final expression in terms of a matrix-vector multiplication, where the matrix is a so-called Toeplitz matrix.

    +

Let us look at the following polynomials of second and third order, respectively:

    +
    +\[ +p(t) = \alpha_0+\alpha_1 t+\alpha_2 t^2, +\]
    +

    and

    +
    +\[ +s(t) = \beta_0+\beta_1 t+\beta_2 t^2+\beta_3 t^3. +\]
    +

    The polynomial multiplication gives us a new polynomial of degree \(5\)

    +
    +\[ +z(t) = \delta_0+\delta_1 t+\delta_2 t^2+\delta_3 t^3+\delta_4 t^4+\delta_5 t^5. +\]
    +
    +
    +

    Efficient Polynomial Multiplication#

    +

    Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution. +We note first that the new coefficients are given as

    +
    +\[\begin{split} +\begin{split} +\delta_0=&\alpha_0\beta_0\\ +\delta_1=&\alpha_1\beta_0+\alpha_0\beta_1\\ +\delta_2=&\alpha_0\beta_2+\alpha_1\beta_1+\alpha_2\beta_0\\ +\delta_3=&\alpha_1\beta_2+\alpha_2\beta_1+\alpha_0\beta_3\\ +\delta_4=&\alpha_2\beta_2+\alpha_1\beta_3\\ +\delta_5=&\alpha_2\beta_3.\\ +\end{split} +\end{split}\]
    +

    We note that \(\alpha_i=0\) except for \(i\in \left\{0,1,2\right\}\) and \(\beta_i=0\) except for \(i\in\left\{0,1,2,3\right\}\).

    +

    We can then rewrite the coefficients \(\delta_j\) using a discrete convolution as

    +
    +\[ +\delta_j = \sum_{i=-\infty}^{i=\infty}\alpha_i\beta_{j-i}=(\alpha * \beta)_j, +\]
    +

    or as a double sum with restriction \(l=i+j\)

    +
    +\[ +\delta_l = \sum_{ij}\alpha_i\beta_{j}. +\]
    +
    +
    +

    Further simplification#

    +

Although we may have some redundant operations due to the few zeros among the \(\beta_i\), we can rewrite the above sum in a more compact way as

    +
    +\[ +\delta_i = \sum_{k=0}^{k=m-1}\alpha_k\beta_{i-k}, +\]
    +

    where \(m=3\) in our case, the maximum length of +the vector \(\alpha\). Note that the vector \(\boldsymbol{\beta}\) has length \(n=4\). Below we will find an even more efficient representation.

    +
    +
    +
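As a quick numerical check of the coefficient relations above, the \(\delta_j\) are exactly what numpy.convolve returns for the coefficient vectors \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\); the numbers below are arbitrary example values.

import numpy as np
# Coefficients of p(t) = a0 + a1 t + a2 t^2 and s(t) = b0 + b1 t + b2 t^2 + b3 t^3
alpha = np.array([1.0, 2.0, 3.0])        # arbitrary example values
beta = np.array([4.0, 5.0, 6.0, 7.0])    # arbitrary example values
# The discrete convolution gives the coefficients delta_j of the product polynomial
delta = np.convolve(alpha, beta)
print(delta)
# Check against numpy's own polynomial multiplication
print(np.polynomial.polynomial.polymul(alpha, beta))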

    A more efficient way of coding the above Convolution#

    +

    Since we only have a finite number of \(\alpha\) and \(\beta\) values +which are non-zero, we can rewrite the above convolution expressions +as a matrix-vector multiplication

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\alpha_0 & 0 & 0 & 0 \\ + \alpha_1 & \alpha_0 & 0 & 0 \\ + \alpha_2 & \alpha_1 & \alpha_0 & 0 \\ + 0 & \alpha_2 & \alpha_1 & \alpha_0 \\ + 0 & 0 & \alpha_2 & \alpha_1 \\ + 0 & 0 & 0 & \alpha_2 + \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}. +\end{split}\]
    +
    +
    +

    Commutative process#

    +

    The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding \(\beta\) and a vector holding \(\alpha\). +In this case we have

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\beta_0 & 0 & 0 \\ + \beta_1 & \beta_0 & 0 \\ + \beta_2 & \beta_1 & \beta_0 \\ + \beta_3 & \beta_2 & \beta_1 \\ + 0 & \beta_3 & \beta_2 \\ + 0 & 0 & \beta_3 + \end{bmatrix}\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2\end{bmatrix}. +\end{split}\]
    +

Note that the use of these matrices is for mathematical purposes only and not for implementation purposes. When implementing the above equation we do not encode (and allocate memory for) the matrices explicitly. We rather code the convolutions with the minimal memory footprint that they require.

    +
    +
    +

    Toeplitz matrices#

    +

    The above matrices are examples of so-called Toeplitz +matrices. A +Toeplitz matrix is a matrix in which each descending diagonal from +left to right is constant. For instance the last matrix, which we +rewrite as

    +
    +\[\begin{split} +\boldsymbol{A}=\begin{bmatrix}a_0 & 0 & 0 \\ + a_1 & a_0 & 0 \\ + a_2 & a_1 & a_0 \\ + a_3 & a_2 & a_1 \\ + 0 & a_3 & a_2 \\ + 0 & 0 & a_3 + \end{bmatrix}, +\end{split}\]
    +

with elements \(a_{ij}=a_{i+1,j+1}=a_{i-j}\) is an example of a Toeplitz matrix. Such a matrix does not need to be a square matrix. Toeplitz matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric polynomial, compressed to a finite-dimensional space, can be represented by such a matrix. The example above shows that we can represent linear convolution as multiplication of a Toeplitz matrix by a vector.

    +
    +
    +
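A small sketch verifying that the Toeplitz matrix above reproduces the convolution, here built with scipy.linalg.toeplitz from its first column and first row (again with arbitrary example numbers):

import numpy as np
from scipy.linalg import toeplitz

alpha = np.array([1.0, 2.0, 3.0])        # length m = 3
beta = np.array([4.0, 5.0, 6.0, 7.0])    # length n = 4
# Toeplitz matrix holding beta: first column [b0, b1, b2, b3, 0, 0], first row [b0, 0, 0]
B = toeplitz(np.concatenate([beta, np.zeros(2)]), np.array([beta[0], 0.0, 0.0]))
delta = B @ alpha
print(delta)                  # same as the direct convolution
print(np.convolve(alpha, beta))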

    Fourier series and Toeplitz matrices#

    +

This is an active and ongoing research area concerning CNNs. The following articles may be of interest:

    +
      +
1. Read more about the convolution theorem and Fourier series

2. Fourier Transform Layer
    +
    +

    Generalizing the above one-dimensional case#

    +

    In order to align the above simple case with the more general +convolution cases, we rename \(\boldsymbol{\alpha}\), whose length is \(m=3\), +with \(\boldsymbol{w}\). We will interpret \(\boldsymbol{w}\) as a weight/filter function +with which we want to perform the convolution with an input variable +\(\boldsymbol{x}\) of length \(n\). We will assume always that the filter +\(\boldsymbol{w}\) has dimensionality \(m \le n\).

    +

    We replace thus \(\boldsymbol{\beta}\) with \(\boldsymbol{x}\) and \(\boldsymbol{\delta}\) with \(\boldsymbol{y}\) and have

    +
    +\[ +y(i)= \left(x*w\right)(i)= \sum_{k=0}^{k=m-1}w(k)x(i-k), +\]
    +

    where \(m=3\) in our case, the maximum length of the vector \(\boldsymbol{w}\). +Here the symbol \(*\) represents the mathematical operation of convolution.

    +
    +
    +

    Memory considerations#

    +

    This expression leaves us however with some terms with negative +indices, for example \(x(-1)\) and \(x(-2)\) which may not be defined. Our +vector \(\boldsymbol{x}\) has components \(x(0)\), \(x(1)\), \(x(2)\) and \(x(3)\).

    +

    The index \(j\) for \(\boldsymbol{x}\) runs from \(j=0\) to \(j=3\) since \(\boldsymbol{x}\) is meant to +represent a third-order polynomial.

    +

    Furthermore, the index \(i\) runs from \(i=0\) to \(i=5\) since \(\boldsymbol{y}\) +contains the coefficients of a fifth-order polynomial. When \(i=5\) we +may also have values of \(x(4)\) and \(x(5)\) which are not defined.

    +
    +
    +

    Padding#

    +

    The solution to this is what is called padding! We simply define a +new vector \(x\) with two added elements set to zero before \(x(0)\) and +two new elements after \(x(3)\) set to zero. That is, we augment the +length of \(\boldsymbol{x}\) from \(n=4\) to \(n+2P=8\), where \(P=2\) is the padding +constant (a new hyperparameter), see discussions below as well.

    +
    +
    +

    New vector#

    +

    We have a new vector defined as \(x(0)=0\), \(x(1)=0\), +\(x(2)=\beta_0\), \(x(3)=\beta_1\), \(x(4)=\beta_2\), \(x(5)=\beta_3\), +\(x(6)=0\), and \(x(7)=0\).

    +

    We have added four new elements, which +are all zero. The benefit is that we can rewrite the equation for +\(\boldsymbol{y}\), with \(i=0,1,\dots,5\),

    +
    +\[ +y(i) = \sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k). +\]
    +

    As an example, we have

    +
    +\[ +y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\times \alpha_0+\beta_3\alpha_1+\beta_2\alpha_2, +\]
    +

    as before except that we have an additional term \(x(6)w(0)\), which is zero.

    +

    Similarly, for the fifth-order term we have

    +
    +\[ +y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\times \alpha_0+0\times\alpha_1+\beta_3\alpha_2. +\]
    +

    The zeroth-order term is

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\beta_0 \alpha_0+0\times\alpha_1+0\times\alpha_2=\alpha_0\beta_0. +\]
    +
    +
    +
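A minimal sketch of the padded sum above, with the same lengths as in the example (\(m=3\), \(n=4\), \(P=2\)), compared with numpy's built-in full convolution:

import numpy as np

w = np.array([1.0, 2.0, 3.0])            # filter, length m = 3 (the alpha above)
x = np.array([4.0, 5.0, 6.0, 7.0])       # input, length n = 4 (the beta above)
m, n, P = len(w), len(x), len(w) - 1     # padding P = m - 1 = 2
xp = np.concatenate([np.zeros(P), x, np.zeros(P)])   # zero-padded input, length n + 2P
y = np.zeros(n + m - 1)
for i in range(len(y)):
    # y(i) = sum_k w(k) * x(i + (m-1) - k), using the padded vector
    for k in range(m):
        y[i] += w[k] * xp[i + (m - 1) - k]
print(y)
print(np.convolve(x, w))                  # same result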

    Rewriting as dot products#

    +

    If we now flip the filter/weight vector, with the following term as a typical example

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\tilde{w}(2)+x(1)\tilde{w}(1)+x(0)\tilde{w}(0), +\]
    +

with \(\tilde{w}(0)=w(2)\), \(\tilde{w}(1)=w(1)\), and \(\tilde{w}(2)=w(0)\), we can then rewrite the above sum as a dot product \(x(i:i+(m-1))\tilde{w}\) for element \(y(i)\), where \(x(i:i+(m-1))\) is simply a patch of \(\boldsymbol{x}\) of size \(m\), that is, the elements \(x(i)\) through \(x(i+m-1)\).

    +

    The padding \(P\) we have introduced for the convolution stage is just +another hyperparameter which is introduced as part of the +architecture. Similarly, below we will also introduce another +hyperparameter called Stride \(S\).

    +
    +
    +

    Cross correlation#

    +

    In essentially all applications one uses what is called cross correlation instead of the standard convolution described above. +This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i-k), +\]
    +

    we have now

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i+k). +\]
    +

    Both TensorFlow and PyTorch (as well as our own code example below), +implement the last equation, although it is normally referred to as +convolution. The same padding rules and stride rules discussed below +apply to this expression as well.

    +

    We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression.

    +
    +
    +
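A minimal sketch illustrating this point with PyTorch (which, as stated above, implements the cross-correlation form under the name convolution): applying a filter with torch.nn.functional.conv1d gives the same numbers as a true convolution with the flipped filter. The numbers are arbitrary example values.

import numpy as np
import torch
import torch.nn.functional as F

x = torch.tensor([[[4.0, 5.0, 6.0, 7.0]]])   # shape (batch, channels, length)
w = torch.tensor([[[1.0, 2.0, 3.0]]])        # shape (out_channels, in_channels, kernel length)

# What PyTorch calls convolution is the cross-correlation sum_k w(k) x(i+k)
print(F.conv1d(x, w, padding=2).numpy().ravel())
# The same numbers follow from an ordinary convolution with the flipped filter
print(np.convolve([4.0, 5.0, 6.0, 7.0], [3.0, 2.0, 1.0]))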

    Two-dimensional objects#

    +

    We are now ready to start studying the discrete convolutions relevant for convolutional neural networks. +We often use convolutions over more than one dimension at a time. If +we have a two-dimensional image \(X\) as input, we can have a filter +defined by a two-dimensional kernel/weight/filter \(W\). This leads to an output \(Y\)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(m,n)W(i-m,j-n). +\]
    +

    Convolution is a commutative process, which means we can rewrite this equation as

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i-m,j-n)W(m,n). +\]
    +

Normally the latter is more straightforward to implement in a machine learning library since there is less variation in the range of values of \(m\) and \(n\).

    +

    As mentioned above, most deep learning libraries implement +cross-correlation instead of convolution (although it is referred to as +convolution)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n). +\]
    +
    +
    +

    CNNs in more detail, simple example#

    +

Let us assume we have an input matrix \(X\) of dimensionality \(3\times 3\) and a \(2\times 2\) filter \(W\) given by the following matrices

    +
    +\[\begin{split} +\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ + x_{10} & x_{11} & x_{12} \\ + x_{20} & x_{21} & x_{22} \end{bmatrix}, +\end{split}\]
    +

    and

    +
    +\[\begin{split} +\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ + w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

We now introduce the stride hyperparameter \(S\). The stride determines how the filter \(W\) is moved across the matrix \(X\) during the convolution. We strongly recommend the repository on Arithmetic of deep learning by Dumoulin and Visin.

    +

    Here we set the stride equal to \(S=1\), which means that, starting with the element \(x_{00}\), the filter will act on \(2\times 2\) submatrices each time, starting with the upper corner and moving according to the stride value column by column.

    +

Here we perform the operation (written in the cross-correlation form that, as discussed above, is what the libraries actually implement)

    +
\[ Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n), \]
    +

    and obtain

    +
    +\[\begin{split} +\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ + x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}. +\end{split}\]
    +

    We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector \(\boldsymbol{X}'\) of length \(9\) and +a matrix \(\boldsymbol{W}'\) with dimension \(4\times 9\) as

    +
    +\[\begin{split} +\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix}, +\end{split}\]
    +

    and the new matrix

    +
    +\[\begin{split} +\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ + 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ + 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ + 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

    We see easily that performing the matrix-vector multiplication \(\boldsymbol{W}'\boldsymbol{X}'\) is the same as the above convolution with stride \(S=1\), that is

    +
    +\[ +Y=(\boldsymbol{W}*\boldsymbol{X}), +\]
    +

    is now given by \(\boldsymbol{W}'\boldsymbol{X}'\) which is a vector of length \(4\) instead of the originally resulting \(2\times 2\) output matrix.

    +
    +
    +
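A short numerical sketch of this example: sliding the \(2\times 2\) filter over the \(3\times 3\) input (in the cross-correlation form above) gives the same four numbers as the flattened product \(\boldsymbol{W}'\boldsymbol{X}'\). The entries of \(X\) and \(W\) are arbitrary example values.

import numpy as np

X = np.arange(1.0, 10.0).reshape(3, 3)     # x_00 ... x_22, example values
W = np.array([[1.0, 2.0], [3.0, 4.0]])     # w_00 ... w_11, example values

# Slide the 2x2 filter over X with stride S = 1
Y = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        Y[i, j] = np.sum(X[i:i+2, j:j+2] * W)
print(Y)

# The same operation as a matrix-vector product W' X' with the flattened input
Xflat = X.flatten()                        # length 9
Wprime = np.zeros((4, 9))
for i in range(2):
    for j in range(2):
        row = np.zeros((3, 3))
        row[i:i+2, j:j+2] = W              # place the filter at position (i, j)
        Wprime[2*i + j] = row.flatten()
print(Wprime @ Xflat)                      # same four numbers, as a vector of length 4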

    The convolution stage#

    +

    The convolution stage, where we apply different filters \(\boldsymbol{W}\) in +order to reduce the dimensionality of an image, adds, in addition to +the weights and biases (to be trained by the back propagation +algorithm) that define the filters, two new hyperparameters, the so-called +padding \(P\) and the stride \(S\).

    +
    +
    +

    Finding the number of parameters#

    +

In the above example we have an input matrix of dimension \(3\times 3\). In general we call the input an input volume, and it is defined by its width \(W_1\), height \(H_1\) and depth \(D_1\). If we have the standard three color channels, then \(D_1=3\).

    +

    The above example has \(W_1=H_1=3\) and \(D_1=1\).

    +

    When we introduce the filter we have the following additional hyperparameters

    +
      +
1. \(K\), the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well.

2. \(F\), the filter’s spatial extent.

3. \(S\), the stride parameter.

4. \(P\), the padding parameter.

    These parameters are defined by the architecture of the network and are not included in the training.

    +
    +
    +

    New image (or volume)#

    +

    Acting with the filter on the input volume produces an output volume +which is defined by its width \(W_2\), its height \(H_2\) and its depth +\(D_2\).

    +

    These are defined by the following relations

    +
    +\[ +W_2 = \frac{(W_1-F+2P)}{S}+1, +\]
    +
    +\[ +H_2 = \frac{(H_1-F+2P)}{S}+1, +\]
    +

    and \(D_2=K\).

    +
    +
    +
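These relations are straightforward to encode; a tiny helper function, checked on the \(3\times 3\) input and \(2\times 2\) filter from the earlier example (with \(S=1\) and \(P=0\)), might look as follows.

def output_size(W1, F, P, S):
    """Spatial output size (W1 - F + 2P)/S + 1 of a convolutional layer."""
    size = (W1 - F + 2 * P) / S + 1
    assert size == int(size), "hyperparameters do not fit the input size"
    return int(size)

print(output_size(W1=3, F=2, P=0, S=1))   # the 3x3 input with a 2x2 filter gives a 2x2 output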

    Parameters to train, common settings#

    +

    With parameter sharing, the convolution involves thus for each filter \(F\times F\times D_1\) weights plus one bias parameter.

    +

    In total we have

    +
\[ \left(F\times F\times D_1\right) \times K + K \quad (\mathrm{biases}), \]
    +

    parameters to train by back propagation.

    +

    It is common to let \(K\) come in powers of \(2\), that is \(32\), \(64\), \(128\) etc.

    +

    Common settings.

    +
      +
1. \(F=3\), \(S=1\), \(P=1\)

2. \(F=5\), \(S=1\), \(P=2\)

3. \(F=5\), \(S=2\), \(P=\mathrm{open}\)

4. \(F=1\), \(S=1\), \(P=0\)
    +
    +

    Examples of CNN setups#

    +

    Let us assume we have an input volume \(V\) given by an image of dimensionality +\(32\times 32 \times 3\), that is three color channels and \(32\times 32\) pixels.

    +

    We apply a filter of dimension \(5\times 5\) ten times with stride \(S=1\) and padding \(P=0\).

    +

The spatial size of the output is given by \((32-5)/1+1=28\), resulting in an output volume of dimensionality \(28\times 28\times 10\), that is, ten feature maps of size \(28\times 28\).

    +

    The total number of parameters to train for each filter is then +\(5\times 5\times 3+1\), where the last parameter is the bias. This +gives us \(76\) parameters for each filter, leading to a total of \(760\) +parameters for the ten filters.

    +
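A quick check of the numbers quoted in this example:

# Input volume 32x32x3, K = 10 filters of size F = 5, stride S = 1, padding P = 0
W1, D1, F, K, S, P = 32, 3, 5, 10, 1, 0
W2 = (W1 - F + 2 * P) // S + 1
params_per_filter = F * F * D1 + 1        # weights plus one bias
total_params = params_per_filter * K
print(W2, params_per_filter, total_params)   # 28 76 760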

    How many parameters will a filter of dimensionality \(3\times 3\) +(adding color channels) result in if we produce \(32\) new images? Use \(S=1\) and \(P=0\).

    +

    Note that strides constitute a form of subsampling. As an alternative to +being interpreted as a measure of how much the kernel/filter is translated, strides +can also be viewed as how much of the output is retained. For instance, moving +the kernel by hops of two is equivalent to moving the kernel by hops of one but +retaining only odd output elements.

    +
    +
    +

    Summarizing: Performing a general discrete convolution (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Pooling#

    +

    In addition to discrete convolutions themselves, pooling operations +make up another important building block in CNNs. Pooling operations reduce +the size of feature maps by using some function to summarize subregions, such +as taking the average or the maximum value.

    +

    Pooling works by sliding a window across the input and feeding the content of +the window to a pooling function. In some sense, pooling works very much +like a discrete convolution, but replaces the linear combination described by +the kernel with some other function.

    +
    +
    +

    Pooling arithmetic#

    +

    In a neural network, pooling layers provide invariance to small translations of +the input. The most common kind of pooling is max pooling, which +consists in splitting the input in (usually non-overlapping) patches and +outputting the maximum value of each patch. Other kinds of pooling exist, e.g., +mean or average pooling, which all share the same idea of aggregating the input +locally by applying a non-linearity to the content of some patches.

    +
    +
    +
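A minimal numpy sketch of non-overlapping \(2\times 2\) max pooling on a single feature map, the operation that takes a \(32\times 32\) map down to \(16\times 16\) in the architecture listed earlier:

import numpy as np

def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling of a 2D array with even side lengths."""
    H, W = feature_map.shape
    patches = feature_map.reshape(H // 2, 2, W // 2, 2)
    return patches.max(axis=(1, 3))

fmap = np.random.rand(32, 32)
print(max_pool2x2(fmap).shape)    # (16, 16)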

    Pooling types (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Building convolutional neural networks in Tensorflow/Keras and PyTorch#

    +

As discussed above, CNNs are neural networks built from the assumption that the inputs to the network are 2D images. This is important because the number of features or pixels in images grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. Next week we will discuss in more detail how we can build a CNN using either TensorFlow with Keras or PyTorch.

    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week45.html b/doc/LectureNotes/_build/html/week45.html new file mode 100644 index 000000000..463600da6 --- /dev/null +++ b/doc/LectureNotes/_build/html/week45.html @@ -0,0 +1,1800 @@ Week 45, Convolutional Neural Networks (CNNs) — Applied Data Analysis and Machine Learning


Week 45, Convolutional Neural Networks (CNNs)#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo

    +

    Date: November 3-7, 2025

    +
    +

    Plans for week 45#

    +

    Material for the lecture on Monday November 3, 2025.

    +
      +
1. Convolutional Neural Networks, codes and examples (TensorFlow and PyTorch implementations)

2. Readings and Videos:

3. These lecture notes at CompPhysics/MachineLearning

4. Video of lecture at https://youtu.be/dZt6Vm1wjhs

5. Whiteboard notes at CompPhysics/MachineLearning

6. For a more in-depth discussion of CNNs we recommend Goodfellow et al., chapter 9; see also chapters 11 and 12 on practicalities and applications

7. Reading suggestions for the implementation of CNNs: see Raschka et al., chapters 14-15 at rasbt/machine-learning-book

a. Video on Deep Learning at https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

    +
    +
    +

    Material for the lab sessions#

    +

    Discussion of and work on project 2, no exercises this week, only project work

    +
    +
    +

    Material for Lecture Monday November 3#

    +
    +
    +

    Convolutional Neural Networks (recognizing images), reminder from last week#

    +

Convolutional neural networks (CNNs) were developed during the last decade of the previous century, with a focus on character recognition tasks. Nowadays, CNNs are a central element in the spectacular success of deep learning methods. Their success in, for example, image classification has made them a central tool for most machine learning practitioners.

    +

    CNNs are very similar to ordinary Neural Networks. +They are made up of neurons that have learnable weights and +biases. Each neuron receives some inputs, performs a dot product and +optionally follows it with a non-linearity. The whole network still +expresses a single differentiable score function: from the raw image +pixels on one end to class scores at the other. And they still have a +loss function (for example Softmax) on the last (fully-connected) layer +and all the tips/tricks we developed for learning regular Neural +Networks still apply (back propagation, gradient descent etc etc).

    +
    +
    +

    What is the Difference#

    +

    CNN architectures make the explicit assumption that +the inputs are images, which allows us to encode certain properties +into the architecture. These then make the forward function more +efficient to implement and vastly reduce the amount of parameters in +the network.

    +
    +
    +

    Neural Networks vs CNNs#

    +

Neural networks are defined as affine transformations, that is, a vector is received as input and is multiplied with a matrix of so-called weights (our unknown parameters) to produce an output (to which a bias vector is usually added before passing the result through a nonlinear activation function). This is applicable to any type of input, be it an image, a sound clip or an unordered collection of features: whatever their dimensionality, their representation can always be flattened into a vector before the transformation.

    +
    +
    +

Why CNNs for images, sound files, medical images from CT scans etc.?#

    +

    However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic +structure. More formally, they share these important properties:

    +
      +
• They are stored as multi-dimensional arrays (think of the pixels of a figure).

• They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).

• One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).

    These properties are not exploited when an affine transformation is applied; in +fact, all the axes are treated in the same way and the topological information +is not taken into account. Still, taking advantage of the implicit structure of +the data may prove very handy in solving some tasks, like computer vision and +speech recognition, and in these cases it would be best to preserve it. This is +where discrete convolutions come into play.

    +

    A discrete convolution is a linear transformation that preserves this notion of +ordering. It is sparse (only a few input units contribute to a given output +unit) and reuses parameters (the same weights are applied to multiple locations +in the input).

    +
    +
    +

    Regular NNs don’t scale well to full images#

    +

    As an example, consider +an image of size \(32\times 32\times 3\) (32 wide, 32 high, 3 color channels), so a +single fully-connected neuron in a first hidden layer of a regular +Neural Network would have \(32\times 32\times 3 = 3072\) weights. This amount still +seems manageable, but clearly this fully-connected structure does not +scale to larger images. For example, an image of more respectable +size, say \(200\times 200\times 3\), would lead to neurons that have +\(200\times 200\times 3 = 120,000\) weights.

    +

    We could have +several such neurons, and the parameters would add up quickly! Clearly, +this full connectivity is wasteful and the huge number of parameters +would quickly lead to possible overfitting.

    + + +

    Figure 1: A regular 3-layer Neural Network.

    +
    +
    +

    3D volumes of neurons#

    +

    Convolutional Neural Networks take advantage of the fact that the +input consists of images and they constrain the architecture in a more +sensible way.

    +

    In particular, unlike a regular Neural Network, the +layers of a CNN have neurons arranged in 3 dimensions: width, +height, depth. (Note that the word depth here refers to the third +dimension of an activation volume, not to the depth of a full Neural +Network, which can refer to the total number of layers in a network.)

    +

    To understand it better, the above example of an image +with an input volume of +activations has dimensions \(32\times 32\times 3\) (width, height, +depth respectively).

    +

    The neurons in a layer will +only be connected to a small region of the layer before it, instead of +all of the neurons in a fully-connected manner. Moreover, the final +output layer could for this specific image have dimensions \(1\times 1 \times 10\), +because by the +end of the CNN architecture we will reduce the full image into a +single vector of class scores, arranged along the depth +dimension.

    + + +

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    +
    +
    +

    More on Dimensionalities#

    +

    In fields like signal processing (and imaging as well), one designs +so-called filters. These filters are defined by the convolutions and +are often hand-crafted. One may specify filters for smoothing, edge +detection, frequency reshaping, and similar operations. However with +neural networks the idea is to automatically learn the filters and use +many of them in conjunction with non-linear operations (activation +functions).

    +

As an example, consider a neural network operating on sound sequence data. Assume that we have an input vector \(\boldsymbol{x}\) of length \(d=10^6\). We then construct a neural network with one hidden layer with \(10^4\) nodes. This means that we will have a weight matrix with \(10^4\times 10^6=10^{10}\) weights to be determined, together with \(10^4\) biases.

    +

Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false). This means that we have only one output node. But since this output node connects to the \(10^4\) nodes in the hidden layer, there are in total \(10^4\) weights to be determined for the output layer, plus one bias. In total we have

    +
    +\[ +\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \approx 10^{10}, +\]
    +

    that is ten billion parameters to determine.

    +
    +
    +

    Further remarks#

    +

The main principles that justify convolutions are locality of information and repetition of patterns within the signal. Sound samples of the input in adjacent spots are much more likely to affect each other than those that are very far away. Similarly, sounds are repeated multiple times in the signal. While slightly simplistic, reasoning about such a sound example demonstrates this. The same principles then apply to images and other similar data.

    +
    +
    +

    Layers used to build CNNs#

    +

    A simple CNN is a sequence of layers, and every layer of a CNN +transforms one volume of activations to another through a +differentiable function. We use three main types of layers to build +CNN architectures: Convolutional Layer, Pooling Layer, and +Fully-Connected Layer (exactly as seen in regular Neural Networks). We +will stack these layers to form a full CNN architecture.

    +

    A simple CNN for image classification could have the architecture:

    +
      +
• INPUT (\(32\times 32 \times 3\)) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

• CONV (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as \([32\times 32\times 12]\) if we decide to use 12 filters.

• RELU layer will apply an elementwise activation function, such as the \(max(0,x)\) thresholding at zero. This leaves the size of the volume unchanged (\([32\times 32\times 12]\)).

• POOL (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as \([16\times 16\times 12]\).

• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size \([1\times 1\times 10]\), where each of the 10 numbers corresponds to a class score, such as among the 10 categories of the MNIST images we considered above. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
    +
    +

    Transforming images#

    +

    CNNs transform the original image layer by layer from the original +pixel values to the final class scores.

    +

    Observe that some layers contain +parameters and other don’t. In particular, the CNN layers perform +transformations that are a function of not only the activations in the +input volume, but also of the parameters (the weights and biases of +the neurons). On the other hand, the RELU/POOL layers will implement a +fixed function. The parameters in the CONV/FC layers will be trained +with gradient descent so that the class scores that the CNN computes +are consistent with the labels in the training set for each image.

    +
    +
    +

    CNNs in brief#

    +

    In summary:

    +
      +
• A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)

• There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)

• Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function

• Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)

• Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
    +
    +

    A deep CNN model (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Key Idea#

    +

A dense neural network is represented by an affine operation (like matrix-matrix multiplication) where all parameters are included.

    +

The key idea in CNNs for, say, imaging is that in images neighboring pixels tend to be related! So we connect each neuron in the first hidden layer only to a small neighborhood of input neurons, instead of connecting it to all of the inputs.

    +

    We say we perform a filtering (convolution is the mathematical operation).

    +
    +
    +

    Mathematics of CNNs#

    +

The mathematics of CNNs is based on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is represented by mathematical operations (integration, summation etc.) on two functions in order to produce a third function that expresses how the shape of one gets modified by the other. Convolution has a plethora of applications in a variety of disciplines, spanning from statistics to signal processing, computer vision, solutions of differential equations, linear algebra, engineering, and yes, machine learning.

    +

    Mathematically, convolution is defined as follows (one-dimensional example): +Let us define a continuous function \(y(t)\) given by

    +
    +\[ +y(t) = \int x(a) w(t-a) da, +\]
    +

    where \(x(a)\) represents a so-called input and \(w(t-a)\) is normally called the weight function or kernel.

    +

    The above integral is written in a more compact form as

    +
    +\[ +y(t) = \left(x * w\right)(t). +\]
    +

    The discretized version reads

    +
    +\[ +y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). +\]
    +

    Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.

    +

    How can we use this? And what does it mean? Let us study some familiar examples first.

    +
    +
    +

    Convolution Examples: Polynomial multiplication#

    +

Our first example is that of a multiplication between two polynomials, which we will rewrite in terms of the mathematics of convolution. In the final stage, since the problem here is a discrete one, we will recast the final expression in terms of a matrix-vector multiplication, where the matrix is a so-called Toeplitz matrix.

    +

Let us look at the following polynomials of second and third order, respectively:

    +
    +\[ +p(t) = \alpha_0+\alpha_1 t+\alpha_2 t^2, +\]
    +

    and

    +
    +\[ +s(t) = \beta_0+\beta_1 t+\beta_2 t^2+\beta_3 t^3. +\]
    +

    The polynomial multiplication gives us a new polynomial of degree \(5\)

    +
    +\[ +z(t) = \delta_0+\delta_1 t+\delta_2 t^2+\delta_3 t^3+\delta_4 t^4+\delta_5 t^5. +\]
    +
    +
    +

    Efficient Polynomial Multiplication#

    +

    Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution. +We note first that the new coefficients are given as

    +
    +\[\begin{split} +\begin{split} +\delta_0=&\alpha_0\beta_0\\ +\delta_1=&\alpha_1\beta_0+\alpha_0\beta_1\\ +\delta_2=&\alpha_0\beta_2+\alpha_1\beta_1+\alpha_2\beta_0\\ +\delta_3=&\alpha_1\beta_2+\alpha_2\beta_1+\alpha_0\beta_3\\ +\delta_4=&\alpha_2\beta_2+\alpha_1\beta_3\\ +\delta_5=&\alpha_2\beta_3.\\ +\end{split} +\end{split}\]
    +

    We note that \(\alpha_i=0\) except for \(i\in \left\{0,1,2\right\}\) and \(\beta_i=0\) except for \(i\in\left\{0,1,2,3\right\}\).

    +

    We can then rewrite the coefficients \(\delta_j\) using a discrete convolution as

    +
    +\[ +\delta_j = \sum_{i=-\infty}^{i=\infty}\alpha_i\beta_{j-i}=(\alpha * \beta)_j, +\]
    +

    or as a double sum with restriction \(l=i+j\)

    +
    +\[ +\delta_l = \sum_{ij}\alpha_i\beta_{j}. +\]
    +
    +
    +

    Further simplification#

    +

Although we may have some redundant operations due to the few zeros among the \(\beta_i\), we can rewrite the above sum in a more compact way as

    +
    +\[ +\delta_i = \sum_{k=0}^{k=m-1}\alpha_k\beta_{i-k}, +\]
    +

    where \(m=3\) in our case, the maximum length of +the vector \(\alpha\). Note that the vector \(\boldsymbol{\beta}\) has length \(n=4\). Below we will find an even more efficient representation.

    +
    +
    +

    A more efficient way of coding the above Convolution#

    +

    Since we only have a finite number of \(\alpha\) and \(\beta\) values +which are non-zero, we can rewrite the above convolution expressions +as a matrix-vector multiplication

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\alpha_0 & 0 & 0 & 0 \\ + \alpha_1 & \alpha_0 & 0 & 0 \\ + \alpha_2 & \alpha_1 & \alpha_0 & 0 \\ + 0 & \alpha_2 & \alpha_1 & \alpha_0 \\ + 0 & 0 & \alpha_2 & \alpha_1 \\ + 0 & 0 & 0 & \alpha_2 + \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}. +\end{split}\]
    +
    +
    +

    Commutative process#

    +

    The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding \(\beta\) and a vector holding \(\alpha\). +In this case we have

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\beta_0 & 0 & 0 \\ + \beta_1 & \beta_0 & 0 \\ + \beta_2 & \beta_1 & \beta_0 \\ + \beta_3 & \beta_2 & \beta_1 \\ + 0 & \beta_3 & \beta_2 \\ + 0 & 0 & \beta_3 + \end{bmatrix}\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2\end{bmatrix}. +\end{split}\]
    +

Note that the use of these matrices is for mathematical purposes only and not for implementation purposes. When implementing the above equation we do not encode (and allocate memory for) the matrices explicitly. We rather code the convolutions with the minimal memory footprint that they require.

    +
    +
    +

    Toeplitz matrices#

    +

    The above matrices are examples of so-called Toeplitz +matrices. A +Toeplitz matrix is a matrix in which each descending diagonal from +left to right is constant. For instance the last matrix, which we +rewrite as

    +
    +\[\begin{split} +\boldsymbol{A}=\begin{bmatrix}a_0 & 0 & 0 \\ + a_1 & a_0 & 0 \\ + a_2 & a_1 & a_0 \\ + a_3 & a_2 & a_1 \\ + 0 & a_3 & a_2 \\ + 0 & 0 & a_3 + \end{bmatrix}, +\end{split}\]
    +

with elements \(a_{ij}=a_{i+1,j+1}=a_{i-j}\) is an example of a Toeplitz matrix. Such a matrix does not need to be a square matrix. Toeplitz matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric polynomial, compressed to a finite-dimensional space, can be represented by such a matrix. The example above shows that we can represent linear convolution as multiplication of a Toeplitz matrix by a vector.

    +
    +
    +

    Fourier series and Toeplitz matrices#

    +

This is an active and ongoing research area concerning CNNs. The following articles may be of interest:

    +
      +
1. Read more about the convolution theorem and Fourier series

2. Fourier Transform Layer
    +
    +

    Generalizing the above one-dimensional case#

    +

    In order to align the above simple case with the more general +convolution cases, we rename \(\boldsymbol{\alpha}\), whose length is \(m=3\), +with \(\boldsymbol{w}\). We will interpret \(\boldsymbol{w}\) as a weight/filter function +with which we want to perform the convolution with an input variable +\(\boldsymbol{x}\) of length \(n\). We will assume always that the filter +\(\boldsymbol{w}\) has dimensionality \(m \le n\).

    +

    We replace thus \(\boldsymbol{\beta}\) with \(\boldsymbol{x}\) and \(\boldsymbol{\delta}\) with \(\boldsymbol{y}\) and have

    +
    +\[ +y(i)= \left(x*w\right)(i)= \sum_{k=0}^{k=m-1}w(k)x(i-k), +\]
    +

    where \(m=3\) in our case, the maximum length of the vector \(\boldsymbol{w}\). +Here the symbol \(*\) represents the mathematical operation of convolution.

    +
    +
    +

    Memory considerations#

    +

    This expression leaves us however with some terms with negative +indices, for example \(x(-1)\) and \(x(-2)\) which may not be defined. Our +vector \(\boldsymbol{x}\) has components \(x(0)\), \(x(1)\), \(x(2)\) and \(x(3)\).

    +

    The index \(j\) for \(\boldsymbol{x}\) runs from \(j=0\) to \(j=3\) since \(\boldsymbol{x}\) is meant to +represent a third-order polynomial.

    +

    Furthermore, the index \(i\) runs from \(i=0\) to \(i=5\) since \(\boldsymbol{y}\) +contains the coefficients of a fifth-order polynomial. When \(i=5\) we +may also have values of \(x(4)\) and \(x(5)\) which are not defined.

    +
    +
    +

    Padding#

    +

    The solution to this is what is called padding! We simply define a +new vector \(x\) with two added elements set to zero before \(x(0)\) and +two new elements after \(x(3)\) set to zero. That is, we augment the +length of \(\boldsymbol{x}\) from \(n=4\) to \(n+2P=8\), where \(P=2\) is the padding +constant (a new hyperparameter), see discussions below as well.

    +
    +
    +

    New vector#

    +

    We have a new vector defined as \(x(0)=0\), \(x(1)=0\), +\(x(2)=\beta_0\), \(x(3)=\beta_1\), \(x(4)=\beta_2\), \(x(5)=\beta_3\), +\(x(6)=0\), and \(x(7)=0\).

    +

    We have added four new elements, which +are all zero. The benefit is that we can rewrite the equation for +\(\boldsymbol{y}\), with \(i=0,1,\dots,5\),

    +
    +\[ +y(i) = \sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k). +\]
    +

    As an example, we have

    +
    +\[ +y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\times \alpha_0+\beta_3\alpha_1+\beta_2\alpha_2, +\]
    +

    as before except that we have an additional term \(x(6)w(0)\), which is zero.

    +

    Similarly, for the fifth-order term we have

    +
    +\[ +y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\times \alpha_0+0\times\alpha_1+\beta_3\alpha_2. +\]
    +

    The zeroth-order term is

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\beta_0 \alpha_0+0\times\alpha_1+0\times\alpha_2=\alpha_0\beta_0. +\]
    +
    +
    +

    Rewriting as dot products#

    +

    If we now flip the filter/weight vector, with the following term as a typical example

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\tilde{w}(2)+x(1)\tilde{w}(1)+x(0)\tilde{w}(0), +\]
    +

with \(\tilde{w}(0)=w(2)\), \(\tilde{w}(1)=w(1)\), and \(\tilde{w}(2)=w(0)\), we can then rewrite the above sum as a dot product \(x(i:i+(m-1))\tilde{w}\) for element \(y(i)\), where \(x(i:i+(m-1))\) is simply a patch of \(\boldsymbol{x}\) of size \(m\), that is, the elements \(x(i)\) through \(x(i+m-1)\).

    +

    The padding \(P\) we have introduced for the convolution stage is just +another hyperparameter which is introduced as part of the +architecture. Similarly, below we will also introduce another +hyperparameter called Stride \(S\).

    +
    +
    +

    Cross correlation#

    +

    In essentially all applications one uses what is called cross correlation instead of the standard convolution described above. +This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i-k), +\]
    +

    we have now

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i+k). +\]
    +

    Both TensorFlow and PyTorch (as well as our own code example below), +implement the last equation, although it is normally referred to as +convolution. The same padding rules and stride rules discussed below +apply to this expression as well.

    +

    We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression.

    +
    +
    +

    Two-dimensional objects#

    +

    We are now ready to start studying the discrete convolutions relevant for convolutional neural networks. +We often use convolutions over more than one dimension at a time. If +we have a two-dimensional image \(X\) as input, we can have a filter +defined by a two-dimensional kernel/weight/filter \(W\). This leads to an output \(Y\)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(m,n)W(i-m,j-n). +\]
    +

    Convolution is a commutative process, which means we can rewrite this equation as

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i-m,j-n)W(m,n). +\]
    +

Normally the latter is more straightforward to implement in a machine learning library since there is less variation in the range of values of \(m\) and \(n\).

    +

    As mentioned above, most deep learning libraries implement +cross-correlation instead of convolution (although it is referred to as +convolution)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n). +\]
    +
    +
    +
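A small check of the relation between the two operations in two dimensions (a sketch, assuming SciPy is available; the input and filter values are arbitrary): cross-correlating with a kernel gives the same result as convolving with the kernel flipped along both axes.

import numpy as np
from scipy import signal

X = np.arange(9.0).reshape(3, 3)            # a small example "image"
W = np.array([[1.0, -1.0],
              [2.0, 0.5]])                  # a 2x2 filter

# What deep-learning libraries call convolution: cross-correlation
Y_corr = signal.correlate2d(X, W, mode='valid')

# A true convolution with the doubly flipped kernel gives the same result
Y_conv = signal.convolve2d(X, W[::-1, ::-1], mode='valid')

print(np.allclose(Y_corr, Y_conv))          # True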

    CNNs in more detail, simple example#

    +

Let us assume we have an input matrix \(X\) of dimensionality \(3\times 3\) and a \(2\times 2\) filter \(W\) given by the following matrices

    +
    +\[\begin{split} +\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ + x_{10} & x_{11} & x_{12} \\ + x_{20} & x_{21} & x_{22} \end{bmatrix}, +\end{split}\]
    +

    and

    +
    +\[\begin{split} +\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ + w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

We now introduce the stride hyperparameter \(S\). The stride determines how the filter \(W\) moves across the matrix \(X\) during the convolution. We strongly recommend the guide and repository on convolution arithmetic for deep learning by Dumoulin and Visin.

    +

Here we set the stride equal to \(S=1\), which means that the filter acts on one \(2\times 2\) submatrix at a time, starting with the upper left corner at \(x_{00}\) and moving column by column (and then row by row) according to the stride value.

    +

Here we perform the operation (in the cross-correlation form which, as noted above, is what most libraries implement)

\[
Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n),
\]
    +

    and obtain

    +
    +\[\begin{split} +\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ + x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}. +\end{split}\]
    +

    We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector \(\boldsymbol{X}'\) of length \(9\) and +a matrix \(\boldsymbol{W}'\) with dimension \(4\times 9\) as

    +
    +\[\begin{split} +\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix}, +\end{split}\]
    +

    and the new matrix

    +
    +\[\begin{split} +\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ + 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ + 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ + 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

    We see easily that performing the matrix-vector multiplication \(\boldsymbol{W}'\boldsymbol{X}'\) is the same as the above convolution with stride \(S=1\), that is

    +
    +\[ +Y=(\boldsymbol{W}*\boldsymbol{X}), +\]
    +

    is now given by \(\boldsymbol{W}'\boldsymbol{X}'\) which is a vector of length \(4\) instead of the originally resulting \(2\times 2\) output matrix.
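The equivalence can be verified with a short NumPy sketch; the input values are random and the index bookkeeping follows the \(3\times 3\) input and \(2\times 2\) filter of the example above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))             # input matrix
W = rng.standard_normal((2, 2))             # 2x2 filter

# Direct (cross-correlation) computation with stride S=1: a 2x2 output
Y = np.array([[np.sum(X[i:i + 2, j:j + 2] * W) for j in range(2)]
              for i in range(2)])

# The same operation as a matrix-vector product W' X'
X_flat = X.reshape(-1)                      # the length-9 vector X'
W_prime = np.zeros((4, 9))
for row, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for a in range(2):
        for b in range(2):
            # Output element (i, j) receives X(i+a, j+b) * W(a, b)
            W_prime[row, (i + a) * 3 + (j + b)] = W[a, b]

print(np.allclose(W_prime @ X_flat, Y.reshape(-1)))   # True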

    +
    +
    +

    The convolution stage#

    +

    The convolution stage, where we apply different filters \(\boldsymbol{W}\) in +order to reduce the dimensionality of an image, adds, in addition to +the weights and biases (to be trained by the back propagation +algorithm) that define the filters, two new hyperparameters, the so-called +padding \(P\) and the stride \(S\).

    +
    +
    +

    Finding the number of parameters#

    +

In the above example we have an input matrix of dimension \(3\times 3\). In general we call the input an input volume, defined by its width \(W_1\), height \(H_1\) and depth \(D_1\). If we have the standard three color channels, then \(D_1=3\).

    +

    The above example has \(W_1=H_1=3\) and \(D_1=1\).

    +

    When we introduce the filter we have the following additional hyperparameters

    +
1. \(K\), the number of filters. It is common to perform the convolution of the input several times since, by experience, shrinking the input too fast does not work well.

2. \(F\), the filter’s spatial extent.

3. \(S\), the stride parameter.

4. \(P\), the padding parameter.

    These parameters are defined by the architecture of the network and are not included in the training.

    +
    +
    +

    New image (or volume)#

    +

    Acting with the filter on the input volume produces an output volume +which is defined by its width \(W_2\), its height \(H_2\) and its depth +\(D_2\).

    +

    These are defined by the following relations

    +
    +\[ +W_2 = \frac{(W_1-F+2P)}{S}+1, +\]
    +
    +\[ +H_2 = \frac{(H_1-F+2P)}{S}+1, +\]
    +

    and \(D_2=K\).
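These relations are easy to put into a small helper function (a sketch; the function name is ours and the code simply evaluates the formulas above):

def conv_output_size(W1, F, P, S):
    """Output width (or height) from the relation W2 = (W1 - F + 2P)/S + 1."""
    out = (W1 - F + 2 * P) / S + 1
    if not float(out).is_integer():
        raise ValueError("F, P and S do not tile the input evenly")
    return int(out)

# 32x32 input, 5x5 filter, no padding, stride 1  ->  28
print(conv_output_size(32, F=5, P=0, S=1))
# The same filter with padding P=2 preserves the width  ->  32
print(conv_output_size(32, F=5, P=2, S=1))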

    +
    +
    +

    Parameters to train, common settings#

    +

With parameter sharing, each filter thus involves \(F\times F\times D_1\) weights plus one bias parameter.

    +

    In total we have

    +
\[
\left(F\times F\times D_1\right) \times K + K\ (\mathrm{biases}),
\]
    +

    parameters to train by back propagation.

    +

    It is common to let \(K\) come in powers of \(2\), that is \(32\), \(64\), \(128\) etc.

    +

    Common settings.

    +
1. \(F=3\), \(S=1\), \(P=1\)

2. \(F=5\), \(S=1\), \(P=2\)

3. \(F=5\), \(S=2\), \(P\) open

4. \(F=1\), \(S=1\), \(P=0\)
    +
    +

    Examples of CNN setups#

    +

    Let us assume we have an input volume \(V\) given by an image of dimensionality +\(32\times 32 \times 3\), that is three color channels and \(32\times 32\) pixels.

    +

    We apply a filter of dimension \(5\times 5\) ten times with stride \(S=1\) and padding \(P=0\).

    +

The output width is given by \((32-5)/1+1=28\), resulting in an output volume of dimensionality \(28\times 28\times 10\), that is ten feature maps of dimensionality \(28\times 28\) each.

    +

    The total number of parameters to train for each filter is then +\(5\times 5\times 3+1\), where the last parameter is the bias. This +gives us \(76\) parameters for each filter, leading to a total of \(760\) +parameters for the ten filters.
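A small helper function (a sketch; the function name is ours) reproduces this counting:

def conv_parameters(F, D1, K):
    """Trainable parameters of a convolutional layer with parameter sharing:
    F*F*D1 weights per filter plus one bias, for K filters."""
    return (F * F * D1 + 1) * K

# The example above: 5x5 filters, depth D1=3, K=10 filters  ->  760
print(conv_parameters(F=5, D1=3, K=10))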

    +

    How many parameters will a filter of dimensionality \(3\times 3\) +(adding color channels) result in if we produce \(32\) new images? Use \(S=1\) and \(P=0\).

    +

    Note that strides constitute a form of subsampling. As an alternative to +being interpreted as a measure of how much the kernel/filter is translated, strides +can also be viewed as how much of the output is retained. For instance, moving +the kernel by hops of two is equivalent to moving the kernel by hops of one but +retaining only odd output elements.
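The following small NumPy sketch (arbitrary one-dimensional input and filter) illustrates this equivalence between striding and subsampling the stride-one output:

import numpy as np

x = np.arange(10.0)
w = np.array([1.0, 2.0, 3.0])
m = len(w)

# Cross-correlation with stride 1 ...
y_stride1 = np.array([x[i:i + m] @ w for i in range(0, len(x) - m + 1)])
# ... and directly with stride 2
y_stride2 = np.array([x[i:i + m] @ w for i in range(0, len(x) - m + 1, 2)])

# Keeping every second element of the stride-1 output gives the stride-2 result
print(np.allclose(y_stride1[::2], y_stride2))   # True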

    +
    +
    +

    Summarizing: Performing a general discrete convolution (From Raschka et al)#


    Figure 1: A deep CNN

    +
    +
    +

    Pooling#

    +

    In addition to discrete convolutions themselves, pooling operations +make up another important building block in CNNs. Pooling operations reduce +the size of feature maps by using some function to summarize subregions, such +as taking the average or the maximum value.

    +

    Pooling works by sliding a window across the input and feeding the content of +the window to a pooling function. In some sense, pooling works very much +like a discrete convolution, but replaces the linear combination described by +the kernel with some other function.

    +
    +
    +

    Pooling arithmetic#

    +

    In a neural network, pooling layers provide invariance to small translations of +the input. The most common kind of pooling is max pooling, which +consists in splitting the input in (usually non-overlapping) patches and +outputting the maximum value of each patch. Other kinds of pooling exist, e.g., +mean or average pooling, which all share the same idea of aggregating the input +locally by applying a non-linearity to the content of some patches.
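As an illustration, here is a minimal NumPy sketch of \(2\times 2\) max pooling (and average pooling) with stride 2 on a small, arbitrary input, using a simple reshape into non-overlapping patches:

import numpy as np

X = np.arange(16.0).reshape(4, 4)

# Split X into non-overlapping 2x2 patches and take the maximum of each patch
max_pooled = X.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(max_pooled)

# Average pooling only changes the summary function
mean_pooled = X.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(mean_pooled)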

    +
    +
    +

    Pooling types (From Raschka et al)#


    Figure 1: A deep CNN

    +
    +
    +

    Building convolutional neural networks using Tensorflow and Keras#

    +

    As discussed above, CNNs are neural networks built from the assumption that the inputs +to the network are 2D images. This is important because the number of features or pixels in images +grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network.

    +

As before, we still have our input, a hidden layer and an output. What is novel about convolutional networks are the convolutional and pooling layers stacked in pairs between the input and the hidden layer. In addition, the data is no longer represented as a 2D feature matrix; instead, each input is a set of 2D matrices, typically one for each color channel (Red, Green, Blue).

    +
    +
    +

    Setting it up#

    +

    It means that to represent the entire +dataset of images, we require a 4D matrix or tensor. This tensor has the dimensions:

    +
    +\[ +(n_{inputs},\, n_{pixels, width},\, n_{pixels, height},\, depth) . +\]
    +
    +
    +

    The MNIST dataset again#

    +

The MNIST dataset consists of grayscale images with a pixel size of \(28\times 28\), meaning we require \(28 \times 28 = 784\) weights for each neuron in the first hidden layer.

    +

If we were to analyze images of size \(128\times 128\) we would require \(128 \times 128 = 16384\) weights for each neuron. Even worse, if we were dealing with color images, as most images are, we would have an image matrix of size \(128\times 128\) for each color channel (Red, Green, Blue), meaning \(3\times 16384 = 49152\) weights for every single neuron in the first hidden layer.

    +
    +
    +

    Strong correlations#

    +

    Images typically have strong local correlations, meaning that a small +part of the image varies little from its neighboring regions. If for +example we have an image of a blue car, we can roughly assume that a +small blue part of the image is surrounded by other blue regions.

    +

Therefore, instead of connecting every single pixel to a neuron in the first hidden layer, as we have previously done with deep neural networks, we can instead connect each neuron to a small part of the image (in all 3 RGB depth dimensions). The size of each small area is fixed and is known as the receptive field.

    +
    +
    +

    Layers of a CNN#

    +

    The layers of a convolutional neural network arrange neurons in 3D: width, height and depth.
    +The input image is typically a square matrix of depth 3.

    +

    A convolution is performed on the image which outputs +a 3D volume of neurons. The weights to the input are arranged in a number of 2D matrices, known as filters.

    +

Each filter slides along the input image, taking the dot product between each small part of the image and the filter, in all depth dimensions. The result is then passed through a non-linear function, typically the Rectified Linear Unit (ReLU), which serves as the activation of the neurons in the first convolutional layer. This is further passed through a pooling layer, which reduces the size of the convolutional layer, e.g. by taking the maximum or average across some small regions, and this serves as input to the next convolutional layer.

    +
    +
    +

    Systematic reduction#

    +

    By systematically reducing the size of the input volume, through +convolution and pooling, the network should create representations of +small parts of the input, and then from them assemble representations +of larger areas. The final pooling layer is flattened to serve as +input to a hidden layer, such that each neuron in the final pooling +layer is connected to every single neuron in the hidden layer. This +then serves as input to the output layer, e.g. a softmax output for +classification.

    +
    +
    +

    Prerequisites: Collect and pre-process data#

    +
    +
    +
    %matplotlib inline
    +
    +# import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
+# load the digits dataset from scikit-learn (8x8 grayscale images, a small MNIST-like dataset)
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +# RGB images have a depth of 3
    +# our images are grayscale so they should have a depth of 1
    +inputs = inputs[:,:,:,np.newaxis]
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height, depth) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# choose some random images to display
    +n_inputs = len(inputs)
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Importing Keras and Tensorflow#

    +
    +
    +
    from tensorflow.keras import datasets, layers, models
    +from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +#from tensorflow.keras import Conv2D
    +#from tensorflow.keras import MaxPooling2D
    +#from tensorflow.keras import Flatten
    +
    +from sklearn.model_selection import train_test_split
    +
    +# representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +# one-liner from scikit-learn library
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +

    Running with Keras#

    +
    +
    +
    def create_convolutional_neural_network_keras(input_shape, receptive_field,
    +                                              n_filters, n_neurons_connected, n_categories,
    +                                              eta, lmbd):
    +    model = Sequential()
    +    model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',
    +              activation='relu', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    +    model.add(layers.Flatten())
    +    model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))
    +    
+    sgd = optimizers.SGD(learning_rate=eta)   # 'lr' was renamed to 'learning_rate' in newer Keras versions
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +epochs = 100
    +batch_size = 100
    +input_shape = X_train.shape[1:4]
    +receptive_field = 3
    +n_filters = 10
    +n_neurons_connected = 50
    +n_categories = 10
    +
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +
    +
    +
    +
    +
    +
    +

    Final part#

    +
    +
    +
    CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,
    +                                              n_filters, n_neurons_connected, n_categories,
    +                                              eta, lmbd)
    +        CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = CNN.evaluate(X_test, Y_test)
    +        
    +        CNN_keras[i][j] = CNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Final visualization#

    +
    +
    +
    # visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        CNN = CNN_keras[i][j]
    +
    +        train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

The CIFAR10 data set#

    +

    The CIFAR10 dataset contains 60,000 color images in 10 classes, with +6,000 images in each class. The dataset is divided into 50,000 +training images and 10,000 testing images. The classes are mutually +exclusive and there is no overlap between them.

    +
    +
    +
    import tensorflow as tf
    +
    +from tensorflow.keras import datasets, layers, models
    +import matplotlib.pyplot as plt
    +
    +# We import the data set
    +(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
    +
    +# Normalize pixel values to be between 0 and 1 by dividing by 255. 
    +train_images, test_images = train_images / 255.0, test_images / 255.0
    +
    +
    +
    +
    +
    +
    +

    Verifying the data set#

    +

    To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image.

    +
    +
    +
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
    +               'dog', 'frog', 'horse', 'ship', 'truck']
    +plt.figure(figsize=(10,10))
    +for i in range(25):
    +    plt.subplot(5,5,i+1)
    +    plt.xticks([])
    +    plt.yticks([])
    +    plt.grid(False)
    +    plt.imshow(train_images[i], cmap=plt.cm.binary)
    +    # The CIFAR labels happen to be arrays, 
    +    # which is why you need the extra index
    +    plt.xlabel(class_names[train_labels[i][0]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Set up the model#

    +

    The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.

    +

    As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer.

    +
    +
    +
    model = models.Sequential()
    +model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
    +model.add(layers.MaxPooling2D((2, 2)))
    +model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    +model.add(layers.MaxPooling2D((2, 2)))
    +model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    +
    +# Let's display the architecture of our model so far.
    +
    +model.summary()
    +
    +
    +
    +
    +

    You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.

    +
    +
    +

    Add Dense layers on top#

    +

To complete our model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs. Note that in the code below this layer outputs raw logits (no softmax activation), which is why the loss function is used with from_logits=True.

    +
    +
    +
    model.add(layers.Flatten())
    +model.add(layers.Dense(64, activation='relu'))
    +model.add(layers.Dense(10))
+# Here's the complete architecture of our model.
    +
    +model.summary()
    +
    +
    +
    +
    +

    As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.

    +
    +
    +

    Compile and train the model#

    +
    +
    +
    model.compile(optimizer='adam',
    +              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    +              metrics=['accuracy'])
    +
    +history = model.fit(train_images, train_labels, epochs=10, 
    +                    validation_data=(test_images, test_labels))
    +
    +
    +
    +
    +
    +
    +

    Finally, evaluate the model#

    +
    +
    +
    plt.plot(history.history['accuracy'], label='accuracy')
    +plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
    +plt.xlabel('Epoch')
    +plt.ylabel('Accuracy')
    +plt.ylim([0.5, 1])
    +plt.legend(loc='lower right')
    +
    +test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    +
    +print(test_acc)
    +
    +
    +
    +
    +
    +
    +

    Building code using Pytorch#

    +

    This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:

    +
1. Two convolutional layers

2. Max pooling

3. Dropout for regularization

4. Two fully connected layers

It uses the Adam optimizer and the cross-entropy cost function, and it trains for 10 epochs. You can modify the architecture (number of layers, channels, dropout rate) or the training parameters (learning rate, batch size, epochs) to experiment with different configurations.

    +
    +
    +
    import torch
    +import torch.nn as nn
    +import torch.nn.functional as F
    +import torch.optim as optim
    +from torchvision import datasets, transforms
    +
    +# Set device
    +device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    +
    +# Define transforms
    +transform = transforms.Compose([
    +   transforms.ToTensor(),
    +   transforms.Normalize((0.1307,), (0.3081,))
    +])
    +
    +# Load datasets
    +train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    +test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    +
    +# Create data loaders
    +train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    +test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
    +
    +# Define CNN model
    +class CNN(nn.Module):
    +   def __init__(self):
    +       super(CNN, self).__init__()
    +       self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
    +       self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
    +       self.pool = nn.MaxPool2d(2, 2)
    +       self.fc1 = nn.Linear(64*7*7, 1024)
    +       self.fc2 = nn.Linear(1024, 10)
    +       self.dropout = nn.Dropout(0.5)
    +
    +   def forward(self, x):
    +       x = self.pool(F.relu(self.conv1(x)))
    +       x = self.pool(F.relu(self.conv2(x)))
    +       x = x.view(-1, 64*7*7)
    +       x = self.dropout(F.relu(self.fc1(x)))
    +       x = self.fc2(x)
    +       return x
    +
    +# Initialize model, loss function, and optimizer
    +model = CNN().to(device)
    +criterion = nn.CrossEntropyLoss()
    +optimizer = optim.Adam(model.parameters(), lr=0.001)
    +
    +# Training loop
    +num_epochs = 10
    +for epoch in range(num_epochs):
    +   model.train()
    +   running_loss = 0.0
    +   for batch_idx, (data, target) in enumerate(train_loader):
    +       data, target = data.to(device), target.to(device)
    +       optimizer.zero_grad()
    +       outputs = model(data)
    +       loss = criterion(outputs, target)
    +       loss.backward()
    +       optimizer.step()
    +       running_loss += loss.item()
    +
    +   print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')
    +
    +# Testing the model
    +model.eval()
    +correct = 0
    +total = 0
    +with torch.no_grad():
    +   for data, target in test_loader:
    +       data, target = data.to(device), target.to(device)
    +       outputs = model(data)
    +       _, predicted = torch.max(outputs.data, 1)
    +       total += target.size(0)
    +       correct += (predicted == target).sum().item()
    +
    +print(f'Test Accuracy: {100 * correct / total:.2f}%')
    +
    +
    +
    +
    +
    +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb new file mode 100644 index 000000000..ff55b6fa1 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb @@ -0,0 +1,85 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1232311e", + "metadata": {}, + "source": [ + "# Notebooks with MyST Markdown\n", + "\n", + "Jupyter Book also lets you write text-based notebooks using MyST Markdown.\n", + "See [the Notebooks with MyST Markdown documentation](https://jupyterbook.org/file-types/myst-notebooks.html) for more detailed instructions.\n", + "This page shows off a notebook written in MyST Markdown.\n", + "\n", + "## An example cell\n", + "\n", + "With MyST Markdown, you can define code cells with a directive like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f961e284", + "metadata": {}, + "outputs": [], + "source": [ + "print(2 + 2)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5f5a93", + "metadata": {}, + "source": [ + "When your book is built, the contents of any `{code-cell}` blocks will be\n", + "executed with your default Jupyter kernel, and their outputs will be displayed\n", + "in-line with the rest of your content.\n", + "\n", + "```{seealso}\n", + "Jupyter Book uses [Jupytext](https://jupytext.readthedocs.io/en/latest/) to convert text-based files to notebooks, and can support [many other text-based notebook files](https://jupyterbook.org/file-types/jupytext.html).\n", + "```\n", + "\n", + "## Create a notebook with MyST Markdown\n", + "\n", + "MyST Markdown notebooks are defined by two things:\n", + "\n", + "1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed).\n", + " See the YAML at the top of this page for example.\n", + "2. 
The presence of `{code-cell}` directives, which will be executed with your book.\n", + "\n", + "That's all that is needed to get started!\n", + "\n", + "## Quickly add YAML metadata for MyST Notebooks\n", + "\n", + "If you have a markdown file and you'd like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command:\n", + "\n", + "```\n", + "jupyter-book myst init path/to/markdownfile.md\n", + "```" + ] + } + ], + "metadata": { + "jupytext": { + "formats": "md:myst", + "text_representation": { + "extension": ".md", + "format_name": "myst", + "format_version": 0.13, + "jupytext_version": "1.11.5" + } + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "source_map": [ + 13, + 25, + 27 + ] + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb new file mode 100644 index 000000000..1e007e192 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb @@ -0,0 +1,122 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Content with notebooks\n", + "\n", + "You can also create content with Jupyter Notebooks. This means that you can include\n", + "code blocks and their outputs in your book.\n", + "\n", + "## Markdown + notebooks\n", + "\n", + "As it is markdown, you can embed images, HTML, etc into your posts!\n", + "\n", + "![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)\n", + "\n", + "You can also $add_{math}$ and\n", + "\n", + "$$\n", + "math^{blocks}\n", + "$$\n", + "\n", + "or\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "\\mbox{mean} la_{tex} \\\\ \\\\\n", + "math blocks\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "But make sure you \\$Escape \\$your \\$dollar signs \\$you want to keep!\n", + "\n", + "## MyST markdown\n", + "\n", + "MyST markdown works in Jupyter Notebooks as well. 
For more information about MyST markdown, check\n", + "out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),\n", + "or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).\n", + "\n", + "## Code blocks and outputs\n", + "\n", + "Jupyter Book will also embed your code blocks and output in your book.\n", + "For example, here's some sample Matplotlib code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from matplotlib import rcParams, cycler\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "plt.ion()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Fixing random state for reproducibility\n", + "np.random.seed(19680801)\n", + "\n", + "N = 10\n", + "data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]\n", + "data = np.array(data).T\n", + "cmap = plt.cm.coolwarm\n", + "rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))\n", + "\n", + "\n", + "from matplotlib.lines import Line2D\n", + "custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),\n", + " Line2D([0], [0], color=cmap(.5), lw=4),\n", + " Line2D([0], [0], color=cmap(1.), lw=4)]\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "lines = ax.plot(data)\n", + "ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a lot more that you can do with outputs (such as including interactive outputs)\n", + "with your book. For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb index e88352077..4c74687f4 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb index 820e0a768..2b233c4e1 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb index 
23a5a9d27..b7689e250 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." + "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
+ "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." 
+ "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." + "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). 
We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb new file mode 100644 index 000000000..93d8969c2 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. 
Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter 
${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + 
"\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. 
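As a small illustration of what is meant by scaling here, the following sketch (with hypothetical variable names `X_train` and `X_test`, assumed to come from a train-test split) standardizes each column of a feature matrix using the mean and standard deviation of the training data only, and then reuses those training statistics on the test data:

```python
import numpy as np

rng = np.random.default_rng(2024)
X_train = rng.normal(size=(80, 3)) * [1.0, 10.0, 100.0]  # columns on very different scales
X_test = rng.normal(size=(20, 3)) * [1.0, 10.0, 100.0]

X_mean = X_train.mean(axis=0)   # compute scaling parameters on the training data only
X_std = X_train.std(axis=0)
X_train_scaled = (X_train - X_mean) / X_std
X_test_scaled = (X_test - X_mean) / X_std  # reuse the training statistics on the test set
```

`sklearn.preprocessing.StandardScaler` does the same thing through its `fit` and `transform` methods.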
When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb new file mode 100644 index 000000000..e9c229ddc --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. 
We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. 
Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb new file mode 100644 index 000000000..d5378765f --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! 
However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." 
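# One possible completion, as hinted in the exercise text (a sketch on our part,
# not necessarily the only valid choice): the first layer has 4 output nodes, so
# the bias needs one random entry per node, drawn with randn right after W1 so
# that the later test values line up.
b1 = np.random.randn(4)  # shape (4,): one bias per output node of the first layer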
+ ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. 
To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. 
(Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. 
(Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. 
You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. 
Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what's inside. 
What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb new file mode 100644 index 000000000..19e4e09c7 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. 
Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
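# A sketch of how the chain-rule pieces above might be combined, written with
# hypothetical *_sketch names so it stays separate from the exercise variables.
# It assumes the MSE cost averages over the len(predict) components and uses the
# fact that the sigmoid derivative can be written via the activation itself,
# sigmoid'(z) = a * (1 - a).
dC_da2_sketch = 2 * (a2 - target) / len(a2)      # dC/da2 for the MSE cost
dC_dz2_sketch = dC_da2_sketch * a2 * (1 - a2)    # multiply by sigmoid'(z2)
dC_dW2_sketch = np.outer(dC_dz2_sketch, a1)      # since z2 = W2 @ a1 + b2
dC_db2_sketch = dC_dz2_sketch                    # dz2/db2 is the identity
dC_da1_sketch = W2.T @ dC_dz2_sketch             # propagate backwards through W2
dC_dz1_sketch = dC_da1_sketch * a1 * (1 - a1)    # multiply by sigmoid'(z1)
dC_dW1_sketch = np.outer(dC_dz1_sketch, x)       # since z1 = W1 @ x + b1
dC_db1_sketch = dC_dz1_sketch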
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
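As a starting point, here is a minimal sketch of what the batched forward-pass functions might look like. The names `create_layers_batch` and `feed_forward_saver_batch` and the storage convention are assumptions on our part, following last week's batched layout where the weight matrix has shape (input size, output size):

```python
import autograd.numpy as np


def create_layers_batch(network_input_size, layer_output_sizes):
    # Weights with shape (input size, output size), the transpose of the
    # single-input version, so that a whole batch can be multiplied at once.
    layers = []
    i_size = network_input_size
    for layer_output_size in layer_output_sizes:
        W = np.random.randn(i_size, layer_output_size)
        b = np.random.randn(layer_output_size)
        layers.append((W, b))
        i_size = layer_output_size
    return layers


def feed_forward_saver_batch(inputs, layers, activation_funcs):
    # Same idea as feed_forward_saver, but z = a @ W + b now acts on a matrix of
    # inputs with shape (number of inputs, number of features); b is broadcast
    # over the rows, i.e. added to every input in the batch.
    layer_inputs = []
    zs = []
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        layer_inputs.append(a)
        z = a @ W + b
        a = activation_func(z)
        zs.append(z)
    return layer_inputs, zs, a
```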
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb new file mode 100644 index 000000000..e02a479e8 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . 
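As a small side illustration, these four counts can be tallied directly with numpy for a binary problem (the arrays here are made up for the example):

```python
# Hedged illustration: counting TP, FP, TN and FN for a binary problem
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))
TN = np.sum((y_pred == 0) & (y_true == 0))
FN = np.sum((y_pred == 0) & (y_true == 1))
print(TP, FP, TN, FN)   # 3 1 3 1
```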
In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. 
By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. 
Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." + ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. 
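In scikit-learn these three schemes correspond to the `average` argument of `f1_score`; a brief illustration with made-up labels:

```python
# Illustration of micro, macro and weighted averaging with scikit-learn's f1_score
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

print(f1_score(y_true, y_pred, average="micro"))     # from global TP/FP/FN counts
print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
```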
Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." + ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. 
To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." 
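If you prefer to stay within scikit-learn instead of using scikitplot, a minimal hedged sketch of the confusion matrix and one-vs-rest ROC curves for the Iris data could look as follows (the cumulative gain curve can still be produced with `skplt.metrics.plot_cumulative_gain` as in the example above):

```python
# Hedged sketch: multiclass confusion matrix and one-vs-rest ROC curves for Iris
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Confusion matrix
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)
plt.show()

# One-vs-rest ROC curves from the predicted class probabilities
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
y_probas = clf.predict_proba(X_test)
for i in range(3):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_probas[:, i])
    plt.plot(fpr, tpr, label=f"class {i} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--", label="random")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```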
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb new file mode 100644 index 000000000..e218688d9 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n", + "\n", + "2. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. 
Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. 
Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/project1.ipynb b/doc/LectureNotes/_build/jupyter_execute/project1.ipynb index 85723ce73..3cca77865 100644 --- a/doc/LectureNotes/_build/jupyter_execute/project1.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. 
Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. 
Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. 
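For reference, a hedged sketch of the closed-form Ridge estimator on centred data (reusing the design matrix from part a)) is:

```python
# Hedged sketch: closed-form Ridge solution theta = (X^T X + lambda I)^(-1) X^T y.
# Assumes X and y are already centred, so the intercept is not penalized.
import numpy as np

def ridge_theta(X, y, lmbda):
    p = X.shape[1]
    return np.linalg.pinv(X.T @ X + lmbda * np.eye(p)) @ (X.T @ y)
```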
Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." + "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 
2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. \n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. 
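One possible starting point for the derivation (a sketch only, other routes exist) is to add and subtract $\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]$ inside the square,

$$
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]
=\mathbb{E}\left[\left(f(\boldsymbol{x})+\boldsymbol{\epsilon}
-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]
+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}}\right)^2\right],
$$

after which the cross terms vanish in expectation since $\mathbb{E}[\boldsymbol{\epsilon}]=0$ and $\mathbb{E}\left[\boldsymbol{\tilde{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right]=0$, leaving the bias, variance and noise terms given above.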
Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. \n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. 
You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. 
The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. 
\n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/_build/jupyter_execute/project2.ipynb b/doc/LectureNotes/_build/jupyter_execute/project2.ipynb new file mode 100644 index 000000000..f2130ba5a --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. 
You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." 
+ ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. 
But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). 
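As a rough sketch of such a cross-check (a minimal example assuming the one-dimensional Runge function on $[-1,1]$; the layer size, activation, learning rate and seed below are illustrative choices only), one could benchmark scikit-learn's MLPRegressor and compare its test MSE with that of your own network:

```python
# Illustrative benchmark only: fit sklearn's MLPRegressor to the 1D Runge function
# and report the test MSE, to be compared with the MSE of your own FFNN code.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

x = np.linspace(-1, 1, 400).reshape(-1, 1)   # assumed domain and sample size
y = 1.0 / (1.0 + 25.0 * x**2)                # one-dimensional Runge function

X_train, X_test, y_train, y_test = train_test_split(x, y.ravel(), test_size=0.2, random_state=42)

# One hidden layer with 50 sigmoid nodes, trained with ADAM (illustrative settings)
net = MLPRegressor(hidden_layer_sizes=(50,), activation='logistic', solver='adam',
                   learning_rate_init=0.01, max_iter=5000, random_state=42)
net.fit(X_train, y_train)
print("scikit-learn reference test MSE:", mean_squared_error(y_test, net.predict(X_test)))
```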
\n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week37.ipynb b/doc/LectureNotes/_build/jupyter_execute/week37.ipynb new file mode 100644 index 000000000..b072ac35a --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Raschka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "n = 100  # number of data points\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "where $x_i \in [0,2]$ is drawn randomly from a uniform distribution. Additionally we add a stochastic noise term drawn from the normal distribution $\cal {N}(0,1)$. 
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
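The example in the next section omits that stopping test; a minimal sketch of how to include the criterion $||\nabla_\theta C(\theta_k)|| \leq \epsilon$ is shown here (the iteration cap is only a safety guard, not part of the criterion itself):

```python
# Plain gradient descent with the gradient-norm stopping criterion discussed above.
import numpy as np

n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta = 0.001         # constant learning rate, as in the text
eps = 1e-8          # tolerance on the norm of the gradient
max_iter = 1000000  # safety cap to avoid an infinite loop

for k in range(max_iter):
    gradient = (2.0/n) * X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

print("Gradient descent: ", theta.ravel())
print("Analytic solution:", (np.linalg.inv(X.T @ X) @ X.T @ y).ravel())
```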
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
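In the notation of the momentum code further below, where $\eta$ is the step size (step_size) and $\gamma$ the momentum parameter (momentum), the update with memory reads

$$
v_{t+1} = \gamma v_t + \eta \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - v_{t+1},
$$

so that setting $\gamma=0$ recovers plain gradient descent; $v_t$ corresponds to the variable change in the code.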
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "8ddbdbb5", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "35ea8e21", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "77a60fcd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b030b80c", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n", + "using the full gradient $\\nabla C(\\theta)$, while Stochastic Gradient\n", + "Descent (SGD) uses a single sample (or mini-batch) gradient $\\nabla\n", + "l_i(\\theta)$ selected at random. In equation form, one GD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "21997f1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t-\\eta \\nabla C(\\theta_t) =\\theta_t -\\eta \\frac{1}{N}\\sum_{i=1}^N \\nabla l_i(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdefe165", + "metadata": { + "editable": true + }, + "source": [ + "whereas one SGD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "ac200d56", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t -\\eta \\nabla l_{i_t}(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb3edfb3", + "metadata": { + "editable": true + }, + "source": [ + "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n", + "converge to the global minimum, but their rates differ. GD can take\n", + "larger, more stable steps since it uses the exact gradient, achieving\n", + "an error that decreases on the order of $O(1/t)$ per iteration for\n", + "convex objectives (and even exponentially fast for strongly convex\n", + "cases). In contrast, plain SGD has more variance in each step, leading\n", + "to sublinear convergence in expectation – typically $O(1/\\sqrt{t})$\n", + "for general convex objectives (\\thetaith appropriate diminishing step\n", + "sizes) . Intuitively, GD’s trajectory is smoother and more\n", + "predictable, while SGD’s path oscillates due to noise but costs far\n", + "less per iteration, enabling many more updates in the same time." + ] + }, + { + "cell_type": "markdown", + "id": "7fe05c0d", + "metadata": { + "editable": true + }, + "source": [ + "### Strongly Convex Case\n", + "\n", + "If $C(\\theta)$ is strongly convex and $L$-smooth (so GD enjoys linear\n", + "convergence), the gap $C(\\theta_t)-C(\\theta^*)$ for GD shrinks as" + ] + }, + { + "cell_type": "markdown", + "id": "2ae403f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_t) - C(\\theta^* ) \\le \\Big(1 - \\frac{\\mu}{L}\\Big)^t [C(\\theta_0)-C(\\theta^*)],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44272171", + "metadata": { + "editable": true + }, + "source": [ + "a geometric (linear) convergence per iteration . Achieving an\n", + "$\\epsilon$-accurate solution thus takes on the order of\n", + "$\\log(1/\\epsilon)$ iterations for GD. However, each GD iteration costs\n", + "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n", + "obtain a linear rate – instead, with a properly decaying step size\n", + "(e.g. $\\eta_t = \\frac{1}{\\mu t}$) or iterate averaging, SGD attains an\n", + "$O(1/t)$ convergence rate in expectation . For example, one result\n", + "of Moulines and Bach 2011, see shows that with $\\eta_t = \\Theta(1/t)$," + ] + }, + { + "cell_type": "markdown", + "id": "9cde29ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[C(\\theta_t) - C(\\theta^*)] = O(1/t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b77f20e", + "metadata": { + "editable": true + }, + "source": [ + "for strongly convex, smooth $F$ . This $1/t$ rate is slower per\n", + "iteration than GD’s exponential decay, but each SGD iteration is $N$\n", + "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n", + "take slightly fewer iterations in theory, but each iteration is so\n", + "costly that it is “slower… especially for large datasets” . Meanwhile,\n", + "memory scaling strongly favors SGD: GD’s memory cost grows with\n", + "dataset size, making it impractical beyond a point, whereas SGD’s\n", + "memory use is modest and mostly constant w.r.t. $N$ . These\n", + "differences have made SGD (and mini-batch variants) the de facto\n", + "choice for training large machine learning models, from logistic\n", + "regression on millions of examples to deep neural networks with\n", + "billions of parameters. The consensus in both research and practice is\n", + "that for large-scale or high-dimensional tasks, SGD-type methods\n", + "converge quicker per unit of computation and handle memory constraints\n", + "better than standard full-batch gradient descent ." + ] + }, + { + "cell_type": "markdown", + "id": "48c2661e", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "a2106298", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." 
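To see these two failure modes in a concrete (made-up) setting, the sketch below runs plain gradient descent on the one-dimensional quadratic $C(\theta)=\frac{1}{2}a\theta^2$, whose gradient is $a\theta$; for curvature $a=10$, any fixed step size above $2/a=0.2$ diverges, while a very small step size barely makes progress.

```python
import numpy as np

def gd_on_quadratic(eta, a=10.0, theta0=1.0, n_iter=50):
    """Plain gradient descent on C(theta) = 0.5*a*theta**2, with gradient a*theta."""
    theta = theta0
    for _ in range(n_iter):
        theta -= eta * a * theta
    return theta

for eta in (0.25, 0.001, 0.15):
    print(f"eta = {eta:6.3f} -> theta after 50 steps: {gd_on_quadratic(eta):.3e}")

# eta = 0.25 overshoots and diverges (|1 - eta*a| > 1),
# eta = 0.001 converges but very slowly,
# eta = 0.15 is a workable compromise for this curvature.
```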
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the AdaGrad algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7743f26d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "ef4b5d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "927e2738", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "1753de13", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "0db67ba3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7831e978", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "92a7758a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df62a4ff", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "c8a2b948", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f269e80", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "f4ec584c", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the RMSProp algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the ADAM algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "c4aa2b35", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "01de27d3", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "78a1a601", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
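As a quick numerical check of this agreement, one can compare the two parameter vectors directly; this small sketch assumes the arrays `theta` (from the AdaGrad loop) and `theta_linreg` (from the pseudoinverse) produced by the code above are still in memory:

```python
import numpy as np

# Assumes theta and theta_linreg from the AdaGrad example above are still defined
print("max absolute deviation:", np.max(np.abs(theta - theta_linreg)))
print("relative deviation    :",
      np.linalg.norm(theta - theta_linreg) / np.linalg.norm(theta_linreg))
```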
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980\n", + "theta1 = 0.9\n", + "theta2 = 0.999\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-7\n", + "iter = 0\n", + "for epoch in range(n_epochs):\n", + " first_moment = 0.0\n", + " second_moment = 0.0\n", + " iter += 1\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " # Computing moments first\n", + " first_moment = theta1*first_moment + (1-theta1)*gradients\n", + " second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n", + " first_term = first_moment/(1.0-theta1**iter)\n", + " second_term = second_moment/(1.0-theta2**iter)\n", + "\t# Scaling with rho the new and the previous results\n", + " update = eta*first_term/(np.sqrt(second_term)+delta)\n", + " theta -= update\n", + "print(\"theta from own ADAM\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "b8475863", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n", + "\n", + "2. Work on project 1\n", + "\n", + "\n", + "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended." + ] + }, + { + "cell_type": "markdown", + "id": "4d4d0717", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on different scaling methods\n", + "\n", + "Before fitting a regression model, it is good practice to normalize or\n", + "standardize the features. This ensures all features are on a\n", + "comparable scale, which is especially important when using\n", + "regularization. In the exercises this week we will perform standardization, scaling each\n", + "feature to have mean 0 and standard deviation 1.\n", + "\n", + "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n", + "Then we subtract the mean and divide by the standard deviation for each feature.\n", + "\n", + "In the example here we\n", + "we will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n", + "(and each feature) means the model does not require a separate intercept\n", + "term, the data is shifted such that the intercept is effectively 0\n", + ". (In practice, one could include an intercept in the model and not\n", + "penalize it, but here we simplify by centering.)\n", + "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
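The notes ask the reader to choose $n=100$ data points and set up $\boldsymbol{x}$, $\boldsymbol{y}$ and the design matrix $\boldsymbol{X}$, but do not fix a particular data set. The sketch below is therefore only one possible setup: the quadratic functional form and the polynomial degree are assumptions made for illustration, not taken from the text. The last two lines show one way of filling in the centering step that is left open in the next code cell, which reuses the names `X` and `y` defined here.

```python
import numpy as np

np.random.seed(2025)

n = 100
x = np.random.rand(n)                                        # n = 100 data points on [0, 1]
y = 2.0 + 3.0 * x + 4.0 * x**2 + 0.1 * np.random.randn(n)    # assumed quadratic example with noise

# Design matrix with polynomial features x, x^2, x^3 (no intercept column,
# since the intercept is handled by centering instead)
degree = 3
X = np.column_stack([x**p for p in range(1, degree + 1)])

# One possible completion of the centering step used in the next cell
y_mean = np.mean(y)
y_centered = y - y_mean
```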
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "46375144", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Standardize features (zero mean, unit variance for each feature)\n", + "X_mean = X.mean(axis=0)\n", + "X_std = X.std(axis=0)\n", + "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n", + "X_norm = (X - X_mean) / X_std\n", + "\n", + "# Center the target to zero mean (optional, to simplify intercept handling)\n", + "y_mean = ?\n", + "y_centered = ?" + ] + }, + { + "cell_type": "markdown", + "id": "39426ccf", + "metadata": { + "editable": true + }, + "source": [ + "Do we need to center the values of $y$?\n", + "\n", + "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", + "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n", + "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", + "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", + "same scale)." + ] + }, + { + "cell_type": "markdown", + "id": "df7fe27f", + "metadata": { + "editable": true + }, + "source": [ + "## Functionality in Scikit-Learn\n", + "\n", + "**Scikit-Learn** has several functions which allow us to rescale the\n", + "data, normally resulting in much better results in terms of various\n", + "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n", + "ensures that for each feature/predictor we study the mean value is\n", + "zero and the variance is one (every column in the design/feature\n", + "matrix). This scaling has the drawback that it does not ensure that\n", + "we have a particular maximum or minimum in our data set. Another\n", + "function included in **Scikit-Learn** is the **MinMaxScaler** which\n", + "ensures that all features are exactly between $0$ and $1$. The" + ] + }, + { + "cell_type": "markdown", + "id": "8fd48e39", + "metadata": { + "editable": true + }, + "source": [ + "## More preprocessing\n", + "\n", + "The **Normalizer** scales each data\n", + "point such that the feature vector has a euclidean length of one. In other words, it\n", + "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n", + "radius of 1. This means every data point is scaled by a different number (by the\n", + "inverse of it’s length).\n", + "This normalization is often used when only the direction (or angle) of the data matters,\n", + "not the length of the feature vector.\n", + "\n", + "The **RobustScaler** works similarly to the StandardScaler in that it\n", + "ensures statistical properties for each feature that guarantee that\n", + "they are on the same scale. However, the RobustScaler uses the median\n", + "and quartiles, instead of mean and variance. This makes the\n", + "RobustScaler ignore data points that are very different from the rest\n", + "(like measurement errors). These odd data points are also called\n", + "outliers, and might often lead to trouble for other scaling\n", + "techniques." + ] + }, + { + "cell_type": "markdown", + "id": "d6c60a0a", + "metadata": { + "editable": true + }, + "source": [ + "## Frequently used scaling functions\n", + "\n", + "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy however to write your own. 
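As a quick illustration of the Scikit-Learn functionality described above, the sketch below applies **StandardScaler** and **MinMaxScaler** to a small random data set; the data itself is arbitrary and chosen only for this example. Note that both scalers are fitted on the training data only and then applied to the test data, anticipating the rule discussed further down that the same transformation must be reused on new data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler

np.random.seed(2025)
X = np.random.rand(100, 3) * 10.0                 # arbitrary data set with three features
X_train, X_test = train_test_split(X, test_size=0.2)

# StandardScaler: zero mean and unit variance per feature, statistics from the training data
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)             # same transformation, training statistics
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))

# MinMaxScaler: each feature mapped to [0, 1], based on the training min and max
minmax = MinMaxScaler().fit(X_train)
print(minmax.transform(X_train).min(axis=0), minmax.transform(X_train).max(axis=0))
```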
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week38.ipynb b/doc/LectureNotes/_build/jupyter_execute/week38.ipynb new file mode 100644 index 000000000..c9b413443 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. 
In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", + "\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", + "\n", + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" + ] + }, + { + "cell_type": "markdown", + "id": "bfc2983a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b5f5980", + "metadata": { + "editable": true + }, + "source": [ + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", + "\n", + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
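These expectation values are easy to verify numerically. The small sketch below is an illustration added here, with an arbitrary design matrix, "true" parameters and $\sigma$ chosen only for the example: it simulates many realizations of $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$ and compares the sample mean and variance of a single component $y_i$ with $\mathbf{X}_{i,\ast}\boldsymbol{\theta}$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2025)

n, p = 50, 3
sigma = 0.5
X = np.column_stack([np.ones(n), rng.random(n), rng.random(n)**2])  # arbitrary design matrix
theta = np.array([2.0, 3.0, 4.0])                                   # chosen "true" parameters

n_samples = 100_000
# Each row is one realization of y = X theta + eps with iid N(0, sigma^2) noise
Y = X @ theta + sigma * rng.standard_normal((n_samples, n))

i = 10  # look at one arbitrary component y_i
print("Sample mean of y_i    :", Y[:, i].mean(), "  expected:", X[i, :] @ theta)
print("Sample variance of y_i:", Y[:, i].var(),  "  expected:", sigma**2)
```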
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+ ] + }, + { + "cell_type": "markdown", + "id": "96c16676", + "metadata": { + "editable": true + }, + "source": [ + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", + "\n", + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" + ] + }, + { + "cell_type": "markdown", + "id": "a2a1a004", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5aad445b", + "metadata": { + "editable": true + }, + "source": [ + "## Independent and Identically Distributed (iid)\n", + "\n", + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" + ] + }, + { + "cell_type": "markdown", + "id": "d197c8bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2e7462f", + "metadata": { + "editable": true + }, + "source": [ + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", + "\n", + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" + ] + }, + { + "cell_type": "markdown", + "id": "eb635d3d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "445ed13e", + "metadata": { + "editable": true + }, + "source": [ + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. 
That is\n", + "in case we have a simple one-dimensional input and output case" + ] + }, + { + "cell_type": "markdown", + "id": "319bfc6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90abf35a", + "metadata": { + "editable": true + }, + "source": [ + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" + ] + }, + { + "cell_type": "markdown", + "id": "04b66fbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4a27b5a7", + "metadata": { + "editable": true + }, + "source": [ + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "8d12543f", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." 
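The next section derives this result analytically. As a purely numerical illustration of the same point, the sketch below (with an arbitrary data set, $\sigma^2$ fixed to one, and `scipy.optimize.minimize` used only for convenience; none of these choices come from the notes) minimizes the negative logarithm of the Gaussian likelihood over $\boldsymbol{\theta}$ and compares the result with the analytical OLS expression.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2025)

n = 100
x = rng.random(n)
y = 2.0 + 3.0 * x + 4.0 * x**2 + 0.1 * rng.standard_normal(n)   # assumed example data
X = np.column_stack([np.ones(n), x, x**2])

sigma2 = 1.0  # any fixed sigma^2 > 0 gives the same minimizer in theta

def neg_log_likelihood(theta):
    # Negative logarithm of the product of Gaussians p(y_i, X | theta)
    residuals = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + residuals @ residuals / (2 * sigma2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1])).x
theta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

print("theta from minimizing the negative log-likelihood:", theta_mle)
print("theta from the analytical OLS expression:         ", theta_ols)
```

The two sets of parameters agree to numerical precision, which is exactly the statement made analytically below: maximizing the Gaussian likelihood (or minimizing its negative logarithm) reproduces the ordinary least squares solution.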
+ ] + }, + { + "cell_type": "markdown", + "id": "2e5cd118", + "metadata": { + "editable": true + }, + "source": [ + "## A new Cost Function\n", + "\n", + "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF" + ] + }, + { + "cell_type": "markdown", + "id": "c71a5edf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e663bf2e", + "metadata": { + "editable": true + }, + "source": [ + "which becomes" + ] + }, + { + "cell_type": "markdown", + "id": "c4bc4873", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f5bc59b8", + "metadata": { + "editable": true + }, + "source": [ + "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely" + ] + }, + { + "cell_type": "markdown", + "id": "4f6ddf4a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "afda0a6b", + "metadata": { + "editable": true + }, + "source": [ + "which leads to the well-known OLS equation for the optimal paramters $\\theta$" + ] + }, + { + "cell_type": "markdown", + "id": "b5335dc0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f86a52d", + "metadata": { + "editable": true + }, + "source": [ + "Next week we will make a similar analysis for Ridge and Lasso regression" + ] + }, + { + "cell_type": "markdown", + "id": "5cdb1767", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods\n", + "\n", + "Before we proceed, we need to rethink what we have been doing. In our\n", + "eager to fit the data, we have omitted several important elements in\n", + "our regression analysis. In what follows we will\n", + "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n", + "\n", + "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n", + "\n", + "and discuss how to select a given model (one of the difficult parts in machine learning)." + ] + }, + { + "cell_type": "markdown", + "id": "69435d77", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
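+    ,"\n",
+    "\n",
+    "As a brief aside before turning to the bootstrap algorithm itself, the variance expression $\sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj}$ for the $j$-th OLS parameter quoted above can be evaluated directly to form approximate confidence intervals $\hat{\theta}_j \pm z\sigma(\theta_j)$. The sketch below uses synthetic data; the design matrix, the noise level and the $95\%$ factor $z=1.96$ are arbitrary illustrative choices:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(3155)\n",
+    "n = 100\n",
+    "x = rng.uniform(-1, 1, n)\n",
+    "X = np.column_stack([np.ones(n), x, x**2])   # simple quadratic design matrix\n",
+    "y = X @ np.array([1.0, -2.0, 0.5]) + 0.3*rng.standard_normal(n)\n",
+    "\n",
+    "theta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y\n",
+    "residuals = y - X @ theta_hat\n",
+    "sigma2_hat = residuals @ residuals/(n - X.shape[1])   # estimate of the noise variance\n",
+    "var_theta = sigma2_hat*np.diag(np.linalg.inv(X.T @ X))\n",
+    "\n",
+    "z = 1.96   # roughly a 95% confidence level\n",
+    "for j in range(X.shape[1]):\n",
+    "    print(f'theta_{j}: {theta_hat[j]:.3f} +/- {z*np.sqrt(var_theta[j]):.3f}')\n",
+    "```"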
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Illustration of the bias-variance tradeoff.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week39.ipynb b/doc/LectureNotes/_build/jupyter_execute/week39.ipynb new file mode 100644 index 000000000..c1b12b458 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. 
Discussions on how to structure your report for the first project\n", + "\n", + "2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A way to read the bias-variance tradeoff.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
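+ "\n",
+ "As a quick numerical sanity check of this identity, the minimal sketch below evaluates both sides on a grid of $t$-values.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def p(t):\n",
+ "    # Logistic (sigmoid) function\n",
+ "    return 1.0 / (1.0 + np.exp(-t))\n",
+ "\n",
+ "t = np.linspace(-5, 5, 11)\n",
+ "print(np.allclose(1.0 - p(t), p(-t)))  # True: 1 - p(t) = p(-t)\n",
+ "```\n"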
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
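+ "\n",
+ "As a brief numerical aside on the softmax probabilities defined above, the sketch below (with made-up parameter values $\theta_{k0}$ and $\theta_{k1}$ for $K=4$ classes and one predictor) checks that the $K-1$ class probabilities together with the final class sum to one.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Made-up parameters theta_{k0}, theta_{k1} for the first K-1 = 3 classes (one predictor x1)\n",
+ "theta = np.array([[ 0.5, -1.0],\n",
+ "                  [ 0.2,  0.3],\n",
+ "                  [-0.7,  0.8]])\n",
+ "x1 = 1.5\n",
+ "\n",
+ "z = theta[:, 0] + theta[:, 1]*x1      # the K-1 linear functions\n",
+ "denom = 1.0 + np.sum(np.exp(z))\n",
+ "p = np.exp(z)/denom                   # classes k = 1, ..., K-1\n",
+ "p_K = 1.0/denom                       # the final class K\n",
+ "print(p, p_K, p.sum() + p_K)          # the probabilities sum to one\n",
+ "```\n"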
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
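+ "\n",
+ "Below is a minimal from-scratch sketch of this Newton-Raphson scheme for the two-parameter logistic model, using synthetic data generated for illustration; it complements the gradient-descent based class in the next code example.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Synthetic data for p(y=1|x) with t = theta0 + theta1*x\n",
+ "rng = np.random.default_rng(3155)\n",
+ "n = 200\n",
+ "x = rng.standard_normal(n)\n",
+ "theta_true = np.array([0.5, 2.0])\n",
+ "p_true = 1.0 / (1.0 + np.exp(-(theta_true[0] + theta_true[1]*x)))\n",
+ "y = (rng.uniform(size=n) < p_true).astype(float)\n",
+ "\n",
+ "X = np.column_stack((np.ones(n), x))   # design matrix with intercept column\n",
+ "theta = np.zeros(2)\n",
+ "\n",
+ "for iteration in range(10):\n",
+ "    p = 1.0 / (1.0 + np.exp(-X @ theta))\n",
+ "    gradient = -X.T @ (y - p)          # first derivative of the cost function\n",
+ "    W = np.diag(p*(1.0 - p))\n",
+ "    hessian = X.T @ W @ X              # second derivative (Hessian matrix)\n",
+ "    theta = theta - np.linalg.solve(hessian, gradient)\n",
+ "\n",
+ "print('Estimated theta:', theta)\n",
+ "print('True theta     :', theta_true)\n",
+ "```\n",
+ "\n",
+ "Note that the Newton step solves the linear system with the Hessian instead of inverting it explicitly; if the classes are (almost) linearly separable the Hessian becomes ill-conditioned and gradient descent methods are often the safer choice.\n"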
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week40.ipynb b/doc/LectureNotes/_build/jupyter_execute/week40.ipynb new file mode 100644 index 000000000..5475b7668 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
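Before returning to the compact two-class equations, here is a minimal numerical check of the softmax probabilities defined above, using the reference-class parameterization with $K-1$ linear functions (the coefficient values and the input are made up for illustration):

```python
import numpy as np

K = 3                                   # number of classes
x1 = 0.7                                # a single input value, arbitrary
# Coefficients theta_{k0}, theta_{k1} for classes k = 1, ..., K-1 (made up)
theta = np.array([[0.2, 1.0],
                  [-0.5, 2.0]])

scores = theta[:, 0] + theta[:, 1] * x1           # the K-1 linear functions
denom = 1.0 + np.sum(np.exp(scores))
p = np.append(np.exp(scores) / denom,             # classes 1, ..., K-1
              1.0 / denom)                        # reference class K
print(p, p.sum())                                 # probabilities sum to one
```

This parameterization effectively fixes the coefficients of class $K$ to zero, which is what makes the $K-1$ log-odds defined earlier well defined.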
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
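As a small preview of that diagonalization step (a sketch only, which reuses the `cancerpd` data frame constructed above), we can compute the eigenvalues of the correlation matrix and see how much of the total variance the leading directions carry:

```python
import numpy as np

# Eigendecomposition of the (symmetric) correlation matrix
corr = cancerpd.corr().to_numpy()
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# Sort from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]

# Fraction of the total variance carried by each principal direction
explained = eigenvalues / eigenvalues.sum()
print(explained[:5])   # the first few directions dominate
```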
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
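The XOR gate resists both methods because it is not linearly separable in the inputs $x_1$ and $x_2$ alone. As a small sketch (not part of the code above, and with the extra feature chosen by hand), adding the product $x_1x_2$ to the design matrix is enough for logistic regression to reproduce XOR, which hints at why the nonlinear transformations performed by a neural network help.

```python
# Illustrative sketch: the XOR gate becomes linearly separable once we add
# the nonlinear product feature x1*x2 to the design matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
yXOR = np.array([0, 1, 1, 0])

# Augmented design matrix [x1, x2, x1*x2]; the intercept is handled by scikit-learn
X_aug = np.column_stack([x1, x2, x1 * x2])

# Weak regularization (large C) so the four points can be fitted exactly
logreg = LogisticRegression(C=1e5)
logreg.fit(X_aug, yXOR)
print("Predictions for XOR with the extra feature:", logreg.predict(X_aug))
print("Accuracy: {:.2f}".format(logreg.score(X_aug, yXOR)))
```

A hidden layer with a nonlinear activation learns such feature combinations automatically, which is what the next section explores.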
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week41.ipynb b/doc/LectureNotes/_build/jupyter_execute/week41.ipynb new file mode 100644 index 000000000..00bfd22e1 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. 
Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. 
A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Layout of a neural network with three hidden layers.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
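To make this distinction concrete, here is a minimal sketch (assuming the **autograd** package, one of the tools we point to later for the exercises, is installed) which differentiates a simple function in three ways. Automatic differentiation reproduces the analytical derivative to machine precision, while the finite-difference estimate carries a step-size dependent error.

```python
import autograd.numpy as np   # thinly wrapped numpy
from autograd import grad     # reverse-mode automatic differentiation

def f(x):
    return np.sin(x**2)

df_ad = grad(f)                     # derivative via automatic differentiation

def df_exact(x):                    # analytical derivative for comparison
    return 2*x*np.cos(x**2)

def df_fd(x, delta=1e-5):           # central finite difference, truncation/round-off errors
    return (f(x + delta) - f(x - delta))/(2*delta)

x0 = 1.5
print(df_ad(x0), df_exact(x0), df_fd(x0))
```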
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
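In code this bookkeeping is trivial; the following sketch is only meant to make the operation count explicit by storing the intermediate results and reusing them for the derivative.

```python
import numpy as np

def f_and_derivative(x):
    # forward sweep: compute and store the intermediate quantities once
    a = x*x           # one operation
    b = np.exp(a)     # one operation, and b is already f(x)
    # the derivative reuses a and b instead of recomputing exp(x**2)
    df = 2*x*b        # two operations
    return b, df

x0 = 1.5
value, derivative = f_and_derivative(x0)
print(value, derivative)
print(np.exp(x0**2), 2*x0*np.exp(x0**2))   # closed-form check
```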
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
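For the scalar example above the two groupings can be mimicked in a few lines of Python (a sketch only). With a single input and a single output both orderings give the same result at essentially the same cost; the practical difference between the two modes first shows up when the number of inputs and outputs differ.

```python
import numpy as np

x = 1.5
# forward sweep for f(x) = exp(x**2): store the intermediates
a = x*x
b = np.exp(a)

# local derivatives of the elementary operations
da_dx = 2*x         # a = x**2
db_da = b           # b = exp(a)
df_db = 1.0         # f = b

# reverse mode: seed at the output and work backwards
grad_reverse = (df_db*db_da)*da_dx

# forward mode: seed at the input and work forwards
grad_forward = df_db*(db_da*da_dx)

print(grad_reverse, grad_forward, 2*x*np.exp(x**2))   # all three agree
```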
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
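Written out as a forward sweep followed by a reverse sweep, the calculation looks as follows. This is only a sketch meant to mirror the list of partial derivatives above, and it reproduces the analytical derivative.

```python
import numpy as np

def f_and_grad(x):
    # forward sweep: build the intermediate variables
    a = x*x
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)                 # d = f(x)
    # reverse sweep: accumulate the partial derivatives listed above
    df_dd = 1.0
    df_dc = df_dd/(2.0*np.sqrt(c))
    df_db = df_dc                  # dc/db = 1
    df_da = df_dc + df_db*b        # two parents: c (dc/da = 1) and b (db/da = exp(a) = b)
    df_dx = df_da*2*x
    return d, df_dx

x0 = 1.5
value, derivative = f_and_grad(x0)
print(value, derivative)
# compare with the analytical expressions
print(np.sqrt(x0**2 + np.exp(x0**2)),
      x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2)))
```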
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
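Before we turn to the code example below, a small numerical aside on the observation above about vanishing and exploding gradients. The derivative of the sigmoid never exceeds $1/4$, so a product of many factors of the type $w\sigma'(z)$ (which is what appears when the $\delta$'s are propagated through many layers) tends to shrink rapidly when the weights are of order one, while much larger weights can make the same product grow. A rough sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s*(1 - s)          # never larger than 0.25

rng = np.random.default_rng(2025)
n_layers = 30
z = rng.normal(size=n_layers)

w_small = rng.normal(size=n_layers)   # weights of order one
w_large = 100*w_small                 # the same weights scaled up

# product of 30 factors |w*sigmoid'(z)|, one per layer
print(np.prod(np.abs(w_small*sigmoid_derivative(z))))   # shrinks towards zero
print(np.prod(np.abs(w_large*sigmoid_derivative(z))))   # grows instead
```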
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
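To back up the remark about the learning rate, the sketch below repeats the training above (same model, same single data point) for a few values of $\eta$ and prints the final cost. The helper function `train` is introduced here only for illustration; it simply wraps the code above. This also serves as a small warm-up for the exercise below.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def train(eta, n_iter=50, seed=0):
    """Rerun the scalar model above for a given learning rate and return the final cost."""
    rng = np.random.RandomState(seed)
    x = np.array([[4.0]])              # single input
    y = 2*x + 1.0                      # single target
    w_1 = rng.randn(1, 1); b_1 = np.zeros(1) + 0.01
    w_2 = rng.randn(1, 1); b_2 = np.zeros(1) + 0.01
    for _ in range(n_iter):
        # feed forward
        a_1 = sigmoid(x @ w_1 + b_1)
        a_2 = a_1 @ w_2 + b_2          # linear output as in the code above
        # back propagation
        delta_2 = a_2 - y
        delta_1 = (delta_2 @ w_2.T)*a_1*(1 - a_1)
        w_2 -= eta*(a_1.T @ delta_2); b_2 -= eta*delta_2.sum(axis=0)
        w_1 -= eta*(x.T @ delta_1);   b_1 -= eta*delta_1.sum(axis=0)
    a_1 = sigmoid(x @ w_1 + b_1)
    return 0.5*((a_1 @ w_2 + b_2 - y)**2).item()

for eta in [0.001, 0.01, 0.1, 0.5]:
    print(eta, train(eta))
```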
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with two input nodes, one hidden layer and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
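The three steps above translate almost line by line into code. The following is a minimal sketch (not a reference implementation) for a fully connected network with sigmoid activations in all layers and the squared-error cost, acting on a single input vector; extending it to several samples amounts to summing the gradients over a mini-batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def backpropagation(x, y, weights, biases, eta=0.1):
    """One feed-forward plus one back-propagation step for a single input x.

    weights[l] has shape (n_l, n_{l-1}) so that z^l = W^l a^{l-1} + b^l,
    matching w_{jk}^l multiplying a_k^{l-1}. Sigmoid activations and the
    squared-error cost 0.5*(a^L - y)^2 are assumed.
    """
    L = len(weights)
    # feed forward, storing all z^l and a^l
    a = [x]; zs = []
    for W, b in zip(weights, biases):
        z = W @ a[-1] + b
        zs.append(z)
        a.append(sigmoid(z))
    # output error: delta^L = sigma'(z^L) * dC/da^L
    s = sigmoid(zs[-1])
    delta = s*(1 - s)*(a[-1] - y)
    deltas = [delta]
    # back propagate: delta^l = (W^{l+1}.T delta^{l+1}) * sigma'(z^l)
    for l in range(L - 2, -1, -1):
        s = sigmoid(zs[l])
        delta = (weights[l + 1].T @ delta)*s*(1 - s)
        deltas.insert(0, delta)
    # gradient-descent update of all weights and biases
    for l in range(L):
        weights[l] -= eta*np.outer(deltas[l], a[l])   # a[l] is a^{l-1}
        biases[l]  -= eta*deltas[l]
    return 0.5*np.sum((a[-1] - y)**2)

# tiny example: 2 inputs, one hidden layer with 3 nodes, 1 output
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) + 0.01 for m in sizes[1:]]
x = np.array([1.0, 2.0]); y = np.array([0.5])

for i in range(100):
    cost = backpropagation(x, y, weights, biases)
print(cost)   # the cost should have decreased substantially
```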
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week42.ipynb b/doc/LectureNotes/_build/jupyter_execute/week42.ipynb new file mode 100644 index 000000000..c3af30dea --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week42.ipynb @@ -0,0 +1,5952 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d231eeee", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5e782cb1", + "metadata": { + "editable": true + }, + "source": [ + "# Week 42 Constructing a Neural Network code with examples\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 13-17, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "53309290", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture October 13, 2025\n", + "1. Building our own Feed-forward Neural Network and discussion of project 2\n", + "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", + "1. These lecture notes\n", + "\n", + "2. Video of lecture at \n", + "\n", + "3. Whiteboard notes at \n", + "\n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", + "\n", + "5. Neural Networks demystified at \n", + "\n", + "6. Building Neural Networks from scratch at \n", + "\n", + "7. Video on Neural Networks at \n", + "\n", + "8. Video on the back propagation algorithm at \n", + "\n", + "I also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "c7be87be", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", + "\n", + "2. 
Discussion of project 2" + ] + }, + { + "cell_type": "markdown", + "id": "8e0567a2", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material: Writing a code which implements a feed-forward neural network\n", + "\n", + "Last week we discussed the basics of neural networks and deep learning\n", + "and the basics of automatic differentiation. We looked also at\n", + "examples on how compute the parameters of a simple network with scalar\n", + "inputs and ouputs and no or just one hidden layers.\n", + "\n", + "We ended our discussions with the derivation of the equations for a\n", + "neural network with one hidden layers and two input variables and two\n", + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." + ] + }, + { + "cell_type": "markdown", + "id": "549dcc05", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "21203bae", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "1c102a30", + "metadata": { + "editable": true + }, + "source": [ + "## Reading recommendations\n", + "\n", + "1. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "2. Goodfellow et al, chapter 6 and 7 contain most of the neural network background." + ] + }, + { + "cell_type": "markdown", + "id": "53f11afe", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder from last week: First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "afa8c42a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb5c959f", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). 
This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "0083ae15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4931203", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with no hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3a3754d", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "bcd5dbab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2cbc30f1", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "1a1d803d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "776735c7", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1a2e5af", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e603df9", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "533212cd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "09d91067", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "f767afe7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f38ded54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3f03bc3", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "9062730e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75bbc32c", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "fcf02dbf", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with one hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "aa97678f", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "98f68e27", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4528178", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d6304298", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfc47ba6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8834c3dc", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "40956770", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "69e7fdcf", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "726d4c90", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ee83d1c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2746792", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "76e2e41a", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "debaaadc", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7d576f19", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_1$ and $x_2$" + ] + }, + { + "cell_type": "markdown", + "id": "582b3b43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c8eace47", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "81ec9945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c35e1f69", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with two input nodes, one hidden layer with two hidden nodes and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "05b8eea9", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "We have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "7ef9cb55", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "00492358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45cca5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "22cfb40b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45b30d06", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ \\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "659dd686", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "34a1d4ca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34471712", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "0b3a74fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1a5bdab3", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "37f19e78", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5505aab8", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "d55d045c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04f101e7", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "bfab2e91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77f35b7e", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "8cf4a606", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86951351", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "73414e65", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f0aaa15", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "730c5415", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1afcb5a1", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "7f30cb44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "14c045ce", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "0c1a2c68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3385222", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "18ee3804", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ad741d56", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "65870a70", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7807fdc", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9af4a759", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc548cb7", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "83b75e94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c2be559", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "18b85f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63e39eb4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "a55371c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{2}^{(1)}}=\\delta_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa31a9b3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "580df891", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "c10bf2ce", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0bae11f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ed4a8b93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d582987", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5fa760a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc9de8bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f00e3ace", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ac96362", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "9c46f966", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "ea509b11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e08ff771", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "6f476983", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A neural network with three hidden layers (output layer $l=L=4$, input layer $l=0$).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0535d087", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\boldsymbol{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "5e024ec1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "239fb4c6", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c47cc3c6", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = \\sigma(\\boldsymbol{z}^l)$ where $\\sigma$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $\\sigma$ for all layers\n", + "and their nodes. It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "4eb89f11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "92744a90", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Input to the first hidden layer $l=1$ from the input layer $l=0$.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "35424d45", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the input variable to the activation function, that is $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "b8502930", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81ad45a5", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "11bb8afb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b53ec752", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "b7519a84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c57689db", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "a9f83b15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "067c2583", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "43545710", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb33717", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "e09a8734", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, + "source": [ + "## The back propagation equations for a neural network\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "bb58784b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10aea094", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "b7cc2db8", + "metadata": { + "editable": true + 
}, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6cce9a62", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "43e5a84b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5c607a7", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "a51b3b58", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "4cd9d058", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80b630d", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "dc0c1a06", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f2065b7", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "7f89b9d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "49c2cd3f", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "517b1a37", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" 
+ ] + }, + { + "cell_type": "markdown", + "id": "65c8107f", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "2a10f902", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{W^L})}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90336322", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f25ff166", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cf11d5e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2670748d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18c29f71", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "c593470c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28e8caef", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "516de9d7", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "004c0bf4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d62a3b1f", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "e9af770e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eca56f17", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "bb0e4414", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4b190fc", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ec0f87c0", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "2fb45155", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "The four equations provide us with a way of computing the gradients of the cost function. 
Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "9183bbd0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32ece956", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "466d6bda", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f31b228", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "fbeac005", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc6ae984", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65f3133d", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
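Before wiring these steps into a full training loop, it can be useful to verify the back-propagated gradients numerically. The following minimal sketch (an addition to the notes, not part of the lecture code) compares one analytical first-layer gradient with a centered finite difference of the cost, for a small network with a sigmoid hidden layer and a linear output (so that $\sigma'=1$ in the output layer).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, X, Y):
    # Feed forward: sigmoid hidden layer, linear output, C = 0.5*sum((a2 - y)^2)
    a1 = sigmoid(X @ W1 + b1)
    a2 = a1 @ W2 + b2
    return 0.5 * np.sum((a2 - Y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))
Y = rng.normal(size=(4, 1))
W1 = rng.normal(size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

# Analytical gradient of the first-layer weights from the back propagation equations
a1 = sigmoid(X @ W1 + b1)
a2 = a1 @ W2 + b2
delta2 = a2 - Y                            # linear output, so sigma' = 1 here
delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)
grad_W1 = X.T @ delta1

# Centered finite-difference estimate of one of these entries
eps, i, j = 1e-6, 0, 1
Wp, Wm = W1.copy(), W1.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
fd = (cost(Wp, b1, W2, b2, X, Y) - cost(Wm, b1, W2, b2, X, Y)) / (2.0 * eps)
print(grad_W1[i, j], fd)  # the two numbers should agree to several digits
```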
+ ] + }, + { + "cell_type": "markdown", + "id": "5d27bbe1", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea32e5bb", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9008dcf8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89aba7d6", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "ea0cdce2", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "91342c80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd6eb22a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "4e75b2ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1626d9b7", + "metadata": { + "editable": true + }, + "source": [ + "## Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
The rectified linear unit (ReLU), discussed below,
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, + "source": [ + "## Vanishing gradients\n", + "\n", + "The Back propagation algorithm we derived above works by going from\n", + "the output layer to the input layer, propagating the error gradient on\n", + "the way. Once the algorithm has computed the gradient of the cost\n", + "function with regards to each parameter in the network, it uses these\n", + "gradients to update each parameter with a Gradient Descent (GD) step.\n", + "\n", + "Unfortunately for us, the gradients often get smaller and smaller as\n", + "the algorithm progresses down to the first hidden layers. As a result,\n", + "the GD update leaves the lower layer connection weights virtually\n", + "unchanged, and training never converges to a good solution. This is\n", + "known in the literature as **the vanishing gradients problem**." + ] + }, + { + "cell_type": "markdown", + "id": "ea47d1d6", + "metadata": { + "editable": true + }, + "source": [ + "## Exploding gradients\n", + "\n", + "In other cases, the opposite can happen, namely the the gradients can\n", + "grow bigger and bigger. 
The result is that many of the layers get\n", + "large updates of the weights the algorithm diverges. This is the\n", + "**exploding gradients problem**, which is mostly encountered in\n", + "recurrent neural networks. More generally, deep neural networks suffer\n", + "from unstable gradients, different layers may learn at widely\n", + "different speeds" + ] + }, + { + "cell_type": "markdown", + "id": "1947aa95", + "metadata": { + "editable": true + }, + "source": [ + "## Is the Logistic activation function (Sigmoid) our choice?\n", + "\n", + "Although this unfortunate behavior has been empirically observed for\n", + "quite a while (it was one of the reasons why deep neural networks were\n", + "mostly abandoned for a long time), it is only around 2010 that\n", + "significant progress was made in understanding it.\n", + "\n", + "A paper titled [Understanding the Difficulty of Training Deep\n", + "Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio](http://proceedings.mlr.press/v9/glorot10a.html) found that\n", + "the problems with the popular logistic\n", + "sigmoid activation function and the weight initialization technique\n", + "that was most popular at the time, namely random initialization using\n", + "a normal distribution with a mean of 0 and a standard deviation of\n", + "1." + ] + }, + { + "cell_type": "markdown", + "id": "d024119f", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic function as the root of problems\n", + "\n", + "They showed that with this activation function and this\n", + "initialization scheme, the variance of the outputs of each layer is\n", + "much greater than the variance of its inputs. Going forward in the\n", + "network, the variance keeps increasing after each layer until the\n", + "activation function saturates at the top layers. This is actually made\n", + "worse by the fact that the logistic function has a mean of 0.5, not 0\n", + "(the hyperbolic tangent function has a mean of 0 and behaves slightly\n", + "better than the logistic function in deep networks)." + ] + }, + { + "cell_type": "markdown", + "id": "c9178132", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the Logistic funtion\n", + "\n", + "Looking at the logistic activation function, when inputs become large\n", + "(negative or positive), the function saturates at 0 or 1, with a\n", + "derivative extremely close to 0. Thus when backpropagation kicks in,\n", + "it has virtually no gradient to propagate back through the network,\n", + "and what little gradient exists keeps getting diluted as\n", + "backpropagation progresses down through the top layers, so there is\n", + "really nothing left for the lower layers.\n", + "\n", + "In their paper, Glorot and Bengio propose a way to significantly\n", + "alleviate this problem. We need the signal to flow properly in both\n", + "directions: in the forward direction when making predictions, and in\n", + "the reverse direction when backpropagating gradients. We don’t want\n", + "the signal to die out, nor do we want it to explode and saturate. For\n", + "the signal to flow properly, the authors argue that we need the\n", + "variance of the outputs of each layer to be equal to the variance of\n", + "its inputs, and we also need the gradients to have equal variance\n", + "before and after flowing through a layer in the reverse direction." 
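The compromise Glorot and Bengio proposed is the weight initialization now commonly called Xavier (or Glorot) initialization, which scales the random initial weights by the fan-in and fan-out of each layer so that these variances stay balanced. A minimal sketch (the function name `glorot_uniform` is just an illustrative choice):

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng=None):
    """Glorot/Xavier 'normalized' initialization: draw weights uniformly in
    [-limit, limit] with limit = sqrt(6/(n_in + n_out)), which keeps the
    variance of activations and of back-propagated gradients roughly
    constant from layer to layer."""
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Example: weights for a 2 -> 50 -> 1 network
W1 = glorot_uniform(2, 50)
W2 = glorot_uniform(50, 1)
```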
+ ] + }, + { + "cell_type": "markdown", + "id": "756185f5", + "metadata": { + "editable": true + }, + "source": [ + "## Insights from the paper by Glorot and Bengio\n", + "\n", + "One of the insights in the 2010 paper by Glorot and Bengio was that\n", + "the vanishing/exploding gradients problems were in part due to a poor\n", + "choice of activation function. Until then most people had assumed that\n", + "if Nature had chosen to use roughly sigmoid activation functions in\n", + "biological neurons, they must be an excellent choice. But it turns out\n", + "that other activation functions behave much better in deep neural\n", + "networks, in particular the ReLU activation function, mostly because\n", + "it does not saturate for positive values (and also because it is quite\n", + "fast to compute)." + ] + }, + { + "cell_type": "markdown", + "id": "3d92cad4", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." + ] + }, + { + "cell_type": "markdown", + "id": "cbc6f721", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "9249dc7b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e59de3af", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." 
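+ "\n",
+ "A small sketch (added for illustration, not part of the original text) of the leaky ReLU and the ELU with the default $\\alpha$ values mentioned above:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def leaky_relu(z, alpha=0.01):\n",
+ "    return np.where(z < 0, alpha * z, z)\n",
+ "\n",
+ "def elu(z, alpha=1.0):\n",
+ "    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)\n",
+ "\n",
+ "z = np.array([-2.0, -0.5, 0.0, 1.5])\n",
+ "print(leaky_relu(z))  # -0.02, -0.005, 0.0, 1.5\n",
+ "print(elu(z))         # approximately -0.865, -0.393, 0.0, 1.5\n",
+ "```"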
+ ] + }, + { + "cell_type": "markdown", + "id": "e2da998c", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "e1abf01e", + "metadata": { + "editable": true + }, + "source": [ + "## Fine-tuning neural network hyperparameters\n", + "\n", + "The flexibility of neural networks is also one of their main\n", + "drawbacks: there are many hyperparameters to tweak. Not only can you\n", + "use any imaginable network topology (how neurons/nodes are\n", + "interconnected), but even in a simple FFNN you can change the number\n", + "of layers, the number of neurons per layer, the type of activation\n", + "function to use in each layer, the weight initialization logic, the\n", + "stochastic gradient optmized and much more. How do you know what\n", + "combination of hyperparameters is the best for your task?\n", + "\n", + "* You can use grid search with cross-validation to find the right hyperparameters.\n", + "\n", + "However,since there are many hyperparameters to tune, and since\n", + "training a neural network on a large dataset takes a lot of time, you\n", + "will only be able to explore a tiny part of the hyperparameter space.\n", + "\n", + "* You can use randomized search.\n", + "\n", + "* Or use tools like [Oscar](http://oscar.calldesk.ai/), which implements more complex algorithms to help you find a good set of hyperparameters quickly." + ] + }, + { + "cell_type": "markdown", + "id": "a8ded7cd", + "metadata": { + "editable": true + }, + "source": [ + "## Hidden layers\n", + "\n", + "For many problems you can start with just one or two hidden layers and\n", + "it will work just fine. For the MNIST data set discussed below you can easily get a\n", + "high accuracy using just one hidden layer with a few hundred neurons.\n", + "You can reach for this data set above 98% accuracy using two hidden\n", + "layers with the same total amount of neurons, in roughly the same\n", + "amount of training time.\n", + "\n", + "For more complex problems, you can gradually ramp up the number of\n", + "hidden layers, until you start overfitting the training set. Very\n", + "complex tasks, such as large image classification or speech\n", + "recognition, typically require networks with dozens of layers and they\n", + "need a huge amount of training data. However, you will rarely have to\n", + "train such networks from scratch: it is much more common to reuse\n", + "parts of a pretrained state-of-the-art network that performs a similar\n", + "task." 
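+ "\n",
+ "To make the grid/randomized search discussed above concrete, here is a small sketch using scikit-learn's RandomizedSearchCV on the digits data (the parameter ranges below are illustrative choices, not recommendations from the text):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import RandomizedSearchCV\n",
+ "from sklearn.neural_network import MLPClassifier\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "param_distributions = {\n",
+ "    'hidden_layer_sizes': [(30,), (50,), (50, 50)],\n",
+ "    'learning_rate_init': [1e-4, 1e-3, 1e-2, 1e-1],\n",
+ "    'alpha': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],  # L2 regularization strength\n",
+ "}\n",
+ "\n",
+ "search = RandomizedSearchCV(MLPClassifier(max_iter=200), param_distributions,\n",
+ "                            n_iter=10, cv=3, random_state=0)\n",
+ "search.fit(X, y)\n",
+ "print(search.best_params_, search.best_score_)\n",
+ "```"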
+ ] + }, + { + "cell_type": "markdown", + "id": "96da4f48", + "metadata": { + "editable": true + }, + "source": [ + "## Batch Normalization\n", + "\n", + "Batch Normalization aims to address the vanishing/exploding gradients\n", + "problems, and more generally the problem that the distribution of each\n", + "layer’s inputs changes during training, as the parameters of the\n", + "previous layers change.\n", + "\n", + "The technique consists of adding an operation in the model just before\n", + "the activation function of each layer, simply zero-centering and\n", + "normalizing the inputs, then scaling and shifting the result using two\n", + "new parameters per layer (one for scaling, the other for shifting). In\n", + "other words, this operation lets the model learn the optimal scale and\n", + "mean of the inputs for each layer. In order to zero-center and\n", + "normalize the inputs, the algorithm needs to estimate the inputs’ mean\n", + "and standard deviation. It does so by evaluating the mean and standard\n", + "deviation of the inputs over the current mini-batch, from this the\n", + "name batch normalization." + ] + }, + { + "cell_type": "markdown", + "id": "395346a7", + "metadata": { + "editable": true + }, + "source": [ + "## Dropout\n", + "\n", + "It is a fairly simple algorithm: at every training step, every neuron\n", + "(including the input neurons but excluding the output neurons) has a\n", + "probability $p$ of being temporarily dropped out, meaning it will be\n", + "entirely ignored during this training step, but it may be active\n", + "during the next step.\n", + "\n", + "The hyperparameter $p$ is called the dropout rate, and it is typically\n", + "set to 50%. After training, the neurons are not dropped anymore. It\n", + "is viewed as one of the most popular regularization techniques." + ] + }, + { + "cell_type": "markdown", + "id": "9c712bbb", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Clipping\n", + "\n", + "A popular technique to lessen the exploding gradients problem is to\n", + "simply clip the gradients during backpropagation so that they never\n", + "exceed some threshold (this is mostly useful for recurrent neural\n", + "networks).\n", + "\n", + "This technique is called Gradient Clipping.\n", + "\n", + "In general however, Batch\n", + "Normalization is preferred." + ] + }, + { + "cell_type": "markdown", + "id": "2b66ea72", + "metadata": { + "editable": true + }, + "source": [ + "## A top-down perspective on Neural networks\n", + "\n", + "The first thing we would like to do is divide the data into two or\n", + "three parts. A training set, a validation or dev (development) set,\n", + "and a test set. The test set is the data on which we want to make\n", + "predictions. The dev set is a subset of the training data we use to\n", + "check how well we are doing out-of-sample, after training the model on\n", + "the training dataset. We use the validation error as a proxy for the\n", + "test error in order to make tweaks to our model. It is crucial that we\n", + "do not use any of the test data to train the algorithm. This is a\n", + "cardinal sin in ML. Then:\n", + "\n", + "1. Estimate optimal error rate\n", + "\n", + "2. Minimize underfitting (bias) on training data set.\n", + "\n", + "3. Make sure you are not overfitting." 
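+ "\n",
+ "A minimal sketch of such a three-way split (the arrays below are random placeholders used only for illustration):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X = np.random.randn(1000, 64)        # placeholder inputs\n",
+ "y = np.random.randint(0, 10, 1000)   # placeholder labels\n",
+ "\n",
+ "# first split off the test set, then carve a dev/validation set out of the rest\n",
+ "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2)\n",
+ "X_train, X_dev, y_train, y_dev = train_test_split(X_trainval, y_trainval, test_size=0.25)\n",
+ "\n",
+ "print(len(X_train), len(X_dev), len(X_test))  # 600 200 200\n",
+ "```"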
+ ] + }, + { + "cell_type": "markdown", + "id": "5acbc082", + "metadata": { + "editable": true + }, + "source": [ + "## More top-down perspectives\n", + "\n", + "If the validation and test sets are drawn from the same distributions,\n", + "then a good performance on the validation set should lead to similarly\n", + "good performance on the test set. \n", + "\n", + "However, sometimes\n", + "the training data and test data differ in subtle ways because, for\n", + "example, they are collected using slightly different methods, or\n", + "because it is cheaper to collect data in one way versus another. In\n", + "this case, there can be a mismatch between the training and test\n", + "data. This can lead to the neural network overfitting these small\n", + "differences between the test and training sets, and a poor performance\n", + "on the test set despite having a good performance on the validation\n", + "set. To rectify this, Andrew Ng suggests making two validation or dev\n", + "sets, one constructed from the training data and one constructed from\n", + "the test data. The difference between the performance of the algorithm\n", + "on these two validation sets quantifies the train-test mismatch. This\n", + "can serve as another important diagnostic when using DNNs for\n", + "supervised learning." + ] + }, + { + "cell_type": "markdown", + "id": "31825b65", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of supervised learning with deep networks\n", + "\n", + "Like all statistical methods, supervised learning using neural\n", + "networks has important limitations. This is especially important when\n", + "one seeks to apply these methods, especially to physics problems. Like\n", + "all tools, DNNs are not a universal solution. Often, the same or\n", + "better performance on a task can be achieved by using a few\n", + "hand-engineered features (or even a collection of random\n", + "features)." + ] + }, + { + "cell_type": "markdown", + "id": "c76d9af9", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of NNs\n", + "\n", + "Here we list some of the important limitations of supervised neural network based models. \n", + "\n", + "* **Need labeled data**. All supervised learning methods, DNNs for supervised learning require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).\n", + "\n", + "* **Supervised neural networks are extremely data intensive.** DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs." + ] + }, + { + "cell_type": "markdown", + "id": "bdc93363", + "metadata": { + "editable": true + }, + "source": [ + "## Homogeneous data\n", + "\n", + "* **Homogeneous data.** Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types." 
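+ "\n",
+ "As a sketch of the two-dev-set diagnostic described above (the helper below is hypothetical; `model` stands for any fitted classifier):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "def mismatch_gap(model, X_train_dev, y_train_dev, X_test_dev, y_test_dev):\n",
+ "    # one dev set drawn from the training distribution, one from the test distribution;\n",
+ "    # a large gap signals train/test mismatch rather than ordinary overfitting\n",
+ "    acc_train_dev = accuracy_score(y_train_dev, model.predict(X_train_dev))\n",
+ "    acc_test_dev = accuracy_score(y_test_dev, model.predict(X_test_dev))\n",
+ "    return acc_train_dev - acc_test_dev\n",
+ "```"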
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a1d6ff64",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## More limitations\n",
+ "\n",
+ "* **Many problems are not about prediction.** In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a *wrong* model. The model might or might not be useful for understanding the underlying science.\n",
+ "\n",
+ "Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c2e5742",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Setting up a Multi-layer perceptron model for classification\n",
+ "\n",
+ "We are now going to develop an example based on the MNIST\n",
+ "database. This is a classification problem and we need to use the\n",
+ "cross-entropy function we discussed in connection with logistic\n",
+ "regression. The cross-entropy defines our cost function for\n",
+ "classification problems with neural networks.\n",
+ "\n",
+ "In binary classification with two classes $(0, 1)$ we define the\n",
+ "logistic/sigmoid function as the probability that a particular input\n",
+ "is in class $0$ or $1$. This is possible because the logistic\n",
+ "function takes any input from the real numbers and outputs a number\n",
+ "between 0 and 1, and can therefore be interpreted as a probability. It\n",
+ "also has other nice properties, such as a derivative that is simple to\n",
+ "calculate.\n",
+ "\n",
+ "For an input $\\boldsymbol{a}$ from the hidden layer, the probability that the input $\\boldsymbol{x}$\n",
+ "is in class 0 or 1 is given below. We let $\\boldsymbol{\\theta}$ represent the unknown weights and biases to be adjusted by our equations. The variable $x$\n",
+ "represents our activation values $z$. We have"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d4da3f02",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(-\\boldsymbol{x})}} ,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01ea2e0b",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "and"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9c1c7bec",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9238ff2d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n",
+ "of our network.\n",
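+ "\n",
+ "A tiny numerical sketch of these two class probabilities (illustration only):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(x):\n",
+ "    return 1.0 / (1.0 + np.exp(-x))\n",
+ "\n",
+ "x = np.array([-3.0, 0.0, 2.0])   # example activations\n",
+ "p_class0 = sigmoid(x)\n",
+ "p_class1 = 1.0 - p_class0\n",
+ "print(p_class0 + p_class1)       # always 1, as probabilities must sum to one\n",
+ "```"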
+ ] + }, + { + "cell_type": "markdown", + "id": "3be74bd1", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the cost function\n", + "\n", + "Our cost function is given as (see the Logistic regression lectures)" + ] + }, + { + "cell_type": "markdown", + "id": "2e2fd39c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", + "y_i \\ln[P(y_i = 0)] + (1 - y_i) \\ln [1 - P(y_i = 0)] = \\sum_{i=1}^n \\mathcal{L}_i(\\boldsymbol{\\theta}) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42b1d26b", + "metadata": { + "editable": true + }, + "source": [ + "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", + "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. \n", + "The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather\n", + "than maximizing a negative number. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$ and\n", + "\n", + "$y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset (numbers from $0$ to $9$).. \n", + "\n", + "If $\\boldsymbol{x}_i$ is the $i$-th input (image), $y_{ic}$ refers to the $c$-th component of the $i$-th\n", + "output vector $\\boldsymbol{y}_i$. \n", + "The probability of $\\boldsymbol{x}_i$ being in class $c$ will be given by the softmax function:" + ] + }, + { + "cell_type": "markdown", + "id": "f740a484", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", + "{\\sum_{c'=0}^{C-1} \\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_{c'})}} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19189bfc", + "metadata": { + "editable": true + }, + "source": [ + "which reduces to the logistic function in the binary case. \n", + "The likelihood of this $C$-class classifier\n", + "is now given as:" + ] + }, + { + "cell_type": "markdown", + "id": "aeb3ef60", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbf419a1", + "metadata": { + "editable": true + }, + "source": [ + "Again we take the negative log-likelihood to define our cost function:" + ] + }, + { + "cell_type": "markdown", + "id": "9e345753", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3b13095e", + "metadata": { + "editable": true + }, + "source": [ + "See the logistic regression lectures for a full definition of the cost function.\n", + "\n", + "The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!" 
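+ "\n",
+ "A small sketch (added for illustration) of the one-hot encoding and the corresponding negative log-likelihood cost:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def to_one_hot(y, n_classes=10):\n",
+ "    onehot = np.zeros((len(y), n_classes))\n",
+ "    onehot[np.arange(len(y)), y] = 1\n",
+ "    return onehot\n",
+ "\n",
+ "def cross_entropy(probabilities, y_onehot):\n",
+ "    # only the probability assigned to the correct class contributes\n",
+ "    return -np.sum(y_onehot * np.log(probabilities))\n",
+ "\n",
+ "y = np.array([5, 1])\n",
+ "probs = np.full((2, 10), 0.1)   # a classifier that assigns 0.1 to every class\n",
+ "print(cross_entropy(probs, to_one_hot(y)))  # 2*ln(10), approximately 4.61\n",
+ "```"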
+ ] + }, + { + "cell_type": "markdown", + "id": "96501a91", + "metadata": { + "editable": true + }, + "source": [ + "## Example: binary classification problem\n", + "\n", + "As an example of the above, relevant for project 2 as well, let us consider a binary class. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters $\\beta$ as" + ] + }, + { + "cell_type": "markdown", + "id": "48cf79fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3243c0b1", + "metadata": { + "editable": true + }, + "source": [ + "where we had defined the logistic (sigmoid) function" + ] + }, + { + "cell_type": "markdown", + "id": "bb312a09", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "484cf2b4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2b9c5483", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ca21f09", + "metadata": { + "editable": true + }, + "source": [ + "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. \n", + "\n", + "Now we replace $x_i$ with the activation $z_i^l$ for a given layer $l$ and the outputs as $y_i=a_i^l=f(z_i^l)$, with $z_i^l$ now being a function of the weights $w_{ij}^l$ and biases $b_i^l$. \n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "4852e4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e3b7cbef", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "0c1e69a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e71df7f4", + "metadata": { + "editable": true + }, + "source": [ + "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", + "Our cost function at the final layer $l=L$ is now" + ] + }, + { + "cell_type": "markdown", + "id": "50d6fecc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e145e461", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the targets $t_i$. 
The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" + ] + }, + { + "cell_type": "markdown", + "id": "97f13260", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4361ce3b", + "metadata": { + "editable": true + }, + "source": [ + "In case we use another activation function than the logistic one, we need to evaluate other derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "52a16654", + "metadata": { + "editable": true + }, + "source": [ + "## The Softmax function\n", + "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" + ] + }, + { + "cell_type": "markdown", + "id": "3bfb321e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l} \\frac{\\partial z_j^l}{\\partial w_{jk}^l}= \\frac{\\partial f(z_i^l)}{\\partial z_j^l}a_k^{l-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eccac6c9", + "metadata": { + "editable": true + }, + "source": [ + "For the Softmax function we have" + ] + }, + { + "cell_type": "markdown", + "id": "23634198", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7a2e75ba", + "metadata": { + "editable": true + }, + "source": [ + "Its derivative with respect to $z_j^l$ gives" + ] + }, + { + "cell_type": "markdown", + "id": "2dad2d14", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46415917", + "metadata": { + "editable": true + }, + "source": [ + "which in case of the simply binary model reduces to having $i=j$." + ] + }, + { + "cell_type": "markdown", + "id": "6adc7c1e", + "metadata": { + "editable": true + }, + "source": [ + "## Developing a code for doing neural networks with back propagation\n", + "\n", + "One can identify a set of key steps when using neural networks to solve supervised learning problems: \n", + "\n", + "1. Collect and pre-process data \n", + "\n", + "2. Define model and architecture \n", + "\n", + "3. Choose cost function and optimizer \n", + "\n", + "4. Train the model \n", + "\n", + "5. Evaluate model performance on test data \n", + "\n", + "6. Adjust hyperparameters (if necessary, network architecture)" + ] + }, + { + "cell_type": "markdown", + "id": "4110d83e", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Here we will be using the MNIST dataset, which is readily available through the **scikit-learn**\n", + "package. You may also find it for example [here](http://yann.lecun.com/exdb/mnist/). \n", + "The *MNIST* (Modified National Institute of Standards and Technology) database is a large database\n", + "of handwritten digits that is commonly used for training various image processing systems. \n", + "The MNIST dataset consists of 70 000 images of size $28\\times 28$ pixels, each labeled from 0 to 9. 
\n", + "The scikit-learn dataset we will use consists of a selection of 1797 images of size $8\\times 8$ collected and processed from this database. \n", + "\n", + "To feed data into a feed-forward neural network we need to represent\n", + "the inputs as a design/feature matrix $X = (n_{inputs}, n_{features})$. Each\n", + "row represents an *input*, in this case a handwritten digit, and\n", + "each column represents a *feature*, in this case a pixel. The\n", + "correct answers, also known as *labels* or *targets* are\n", + "represented as a 1D array of integers \n", + "$Y = (n_{inputs}) = (5, 3, 1, 8,...)$.\n", + "\n", + "As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from\n", + "measurements of height (in m) \n", + "and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example: \n", + "\n", + "$$ X = \\begin{bmatrix}\n", + "1.85 & 81\\\\\n", + "1.71 & 65\\\\\n", + "1.95 & 103\\\\\n", + "1.55 & 42\\\\\n", + "1.63 & 56\n", + "\\end{bmatrix} ,$$ \n", + "\n", + "and the targets would be: \n", + "\n", + "$$ Y = (23.7, 22.2, 27.1, 17.5, 21.1) $$ \n", + "\n", + "Since each input image is a 2D matrix, we need to flatten the image\n", + "(i.e. \"unravel\" the 2D matrix into a 1D array) to turn the data into a\n", + "design/feature matrix. This means we lose all spatial information in the\n", + "image, such as locality and translational invariance. More complicated\n", + "architectures such as Convolutional Neural Networks can take advantage\n", + "of such information, and are most commonly applied when analyzing\n", + "images." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "28bb6085", + "metadata": { + "editable": true + }, + "source": [ + "## Train and test datasets\n", + "\n", + "Performing analysis before partitioning the dataset is a major error, that can lead to incorrect conclusions. 
\n", + "\n", + "We will reserve $80 \\%$ of our dataset for training and $20 \\%$ for testing. \n", + "\n", + "It is important that the train and test datasets are drawn randomly from our dataset, to ensure\n", + "no bias in the sampling. \n", + "Say you are taking measurements of weather data to predict the weather in the coming 5 days.\n", + "You don't want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data\n", + "collected from 12.00 to 24.00." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)\n", + "\n", + "# equivalently in numpy\n", + "def train_test_split_numpy(inputs, labels, train_size, test_size):\n", + " n_inputs = len(inputs)\n", + " inputs_shuffled = inputs.copy()\n", + " labels_shuffled = labels.copy()\n", + " \n", + " np.random.shuffle(inputs_shuffled)\n", + " np.random.shuffle(labels_shuffled)\n", + " \n", + " train_end = int(n_inputs*train_size)\n", + " X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]\n", + " Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]\n", + " \n", + " return X_train, X_test, Y_train, Y_test\n", + "\n", + "#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)\n", + "\n", + "print(\"Number of training images: \" + str(len(X_train)))\n", + "print(\"Number of test images: \" + str(len(X_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "c26d604d", + "metadata": { + "editable": true + }, + "source": [ + "## Define model and architecture\n", + "\n", + "Our simple feed-forward neural network will consist of an *input* layer, a single *hidden* layer and an *output* layer. The activation $y$ of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have \n", + "\n", + "$$ z = \\sum_{i=1}^n w_i a_i ,$$\n", + "\n", + "$$ y = f(z) ,$$\n", + "\n", + "where $f$ is the activation function, $a_i$ represents input from neuron $i$ in the preceding layer\n", + "and $w_i$ is the weight to input $i$. \n", + "The activation of the neurons in the input layer is just the features (e.g. a pixel value). \n", + "\n", + "The simplest activation function for a neuron is the *Heaviside* function:\n", + "\n", + "$$ f(z) = \n", + "\\begin{cases}\n", + "1, & z > 0\\\\\n", + "0, & \\text{otherwise}\n", + "\\end{cases}\n", + "$$\n", + "\n", + "A feed-forward neural network with this activation is known as a *perceptron*. \n", + "For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer. \n", + "This activation can be generalized to $k$ classes (using e.g. the *one-against-all* strategy), \n", + "and we call these architectures *multiclass perceptrons*. \n", + "\n", + "However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and \n", + "Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function. 
\n", + "\n", + "Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU). \n", + "We will be using the sigmoid function $\\sigma(x)$: \n", + "\n", + "$$ f(x) = \\sigma(x) = \\frac{1}{1 + e^{-x}} ,$$\n", + "\n", + "which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2775283b", + "metadata": { + "editable": true + }, + "source": [ + "## Layers\n", + "\n", + "* Input \n", + "\n", + "Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons. \n", + "\n", + "* Hidden layer\n", + "\n", + "We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer. \n", + "Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer. \n", + "\n", + "* Output\n", + "\n", + "If we were building a binary classifier, it would be sufficient with a single neuron in the output layer,\n", + "which could output 0 or 1 according to the Heaviside function. This would be an example of a *hard* classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a *soft* classifier, which outputs the probability of being in class 0 or 1. \n", + "\n", + "For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class. \n", + "\n", + "Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons $j = 0,1,...,9$. The activation of each output neuron $j$ will be according to the *softmax* function: \n", + "\n", + "$$ P(\\text{class $j$} \\mid \\text{input $\\boldsymbol{a}$}) = \\frac{\\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_j)}}\n", + "{\\sum_{c=0}^{9} \\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_c)}} ,$$ \n", + "\n", + "i.e. each neuron $j$ outputs the probability of being in class $j$ given an input from the hidden layer $\\boldsymbol{a}$, with $\\boldsymbol{w}_j$ the weights of neuron $j$ to the inputs. \n", + "The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1. \n", + "The exponent is just the weighted sum of inputs as before: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i+b_j.$$ \n", + "\n", + "Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500\n", + "weights to the output layer." + ] + }, + { + "cell_type": "markdown", + "id": "f7455c00", + "metadata": { + "editable": true + }, + "source": [ + "## Weights and biases\n", + "\n", + "Typically weights are initialized with small values distributed around zero, drawn from a uniform\n", + "or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless. \n", + "\n", + "Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range\n", + "of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). 
The bias unit has an output of 1, and a weight to each neuron $j$, $b_j$: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i + b_j.$$ \n", + "\n", + "The bias weights $\\boldsymbol{b}$ are often initialized to zero, but a small value like $0.01$ ensures all neurons have some output which can be backpropagated in the first training cycle." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# building our neural network\n", + "\n", + "n_inputs, n_features = X_train.shape\n", + "n_hidden_neurons = 50\n", + "n_categories = 10\n", + "\n", + "# we make the weights normally distributed using numpy.random.randn\n", + "\n", + "# weights and bias in the hidden layer\n", + "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", + "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", + "output_bias = np.zeros(n_categories) + 0.01" + ] + }, + { + "cell_type": "markdown", + "id": "a41d9acd", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward pass\n", + "\n", + "Denote $F$ the number of features, $H$ the number of hidden neurons and $C$ the number of categories. \n", + "For each input image we calculate a weighted sum of input features (pixel values) to each neuron $j$ in the hidden layer $l$: \n", + "\n", + "$$ z_{j}^{l} = \\sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},$$\n", + "\n", + "this is then passed through our activation function \n", + "\n", + "$$ a_{j}^{l} = f(z_{j}^{l}) .$$ \n", + "\n", + "We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron $j$ in the output layer: \n", + "\n", + "$$ z_{j}^{L} = \\sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.$$ \n", + "\n", + "Finally we calculate the output of neuron $j$ in the output layer using the softmax function: \n", + "\n", + "$$ a_{j}^{L} = \\frac{\\exp{(z_j^{L})}}\n", + "{\\sum_{c=0}^{C-1} \\exp{(z_c^{L})}} .$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2f64238", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplications\n", + "\n", + "Since our data has the dimensions $X = (n_{inputs}, n_{features})$ and our weights to the hidden\n", + "layer have the dimensions \n", + "$W_{hidden} = (n_{features}, n_{hidden})$,\n", + "we can easily feed the network all our training data in one go by taking the matrix product \n", + "\n", + "$$ X W^{h} = (n_{inputs}, n_{hidden}),$$ \n", + "\n", + "and obtain a matrix that holds the weighted sum of inputs to the hidden layer\n", + "for each input image and each hidden neuron. \n", + "We also add the bias to obtain a matrix of weighted sums to the hidden layer $Z^{h}$: \n", + "\n", + "$$ \\boldsymbol{z}^{l} = \\boldsymbol{X} \\boldsymbol{W}^{l} + \\boldsymbol{b}^{l} ,$$\n", + "\n", + "meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image. 
\n", + "This is then passed through the activation: \n", + "\n", + "$$ \\boldsymbol{a}^{l} = f(\\boldsymbol{z}^l) .$$ \n", + "\n", + "This is fed to the output layer: \n", + "\n", + "$$ \\boldsymbol{z}^{L} = \\boldsymbol{a}^{L} \\boldsymbol{W}^{L} + \\boldsymbol{b}^{L} .$$\n", + "\n", + "Finally we receive our output values for each image and each category by passing it through the softmax function: \n", + "\n", + "$$ output = softmax (\\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# setup the feed-forward pass, subscript h = hidden layer\n", + "\n", + "def sigmoid(x):\n", + " return 1/(1 + np.exp(-x))\n", + "\n", + "def feed_forward(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " return probabilities\n", + "\n", + "probabilities = feed_forward(X_train)\n", + "print(\"probabilities = (n_inputs, n_categories) = \" + str(probabilities.shape))\n", + "print(\"probability that image 0 is in category 0,1,2,...,9 = \\n\" + str(probabilities[0]))\n", + "print(\"probabilities sum up to: \" + str(probabilities[0].sum()))\n", + "print()\n", + "\n", + "# we obtain a prediction by taking the class with the highest likelihood\n", + "def predict(X):\n", + " probabilities = feed_forward(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + "predictions = predict(X_train)\n", + "print(\"predictions = (n_inputs) = \" + str(predictions.shape))\n", + "print(\"prediction for image 0: \" + str(predictions[0]))\n", + "print(\"correct label for image 0: \" + str(Y_train[0]))" + ] + }, + { + "cell_type": "markdown", + "id": "4518e911", + "metadata": { + "editable": true + }, + "source": [ + "## Choose cost function and optimizer\n", + "\n", + "To measure how well our neural network is doing we need to introduce a cost function. \n", + "We will call the function that gives the error of a single sample output the *loss* function, and the function\n", + "that gives the total error of our network across all samples the *cost* function.\n", + "A typical choice for multiclass classification is the *cross-entropy* loss, also known as the negative log likelihood. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$$ y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$$ \n", + "\n", + "$$ y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset. \n", + "\n", + "Let $y_{ic}$ denote the $c$-th component of the $i$-th one-hot vector. 
\n", + "We define the cost function $\\mathcal{C}$ as a sum over the cross-entropy loss for each point $\\boldsymbol{x}_i$ in the dataset.\n", + "\n", + "In the one-hot representation only one of the terms in the loss function is non-zero, namely the\n", + "probability of the correct category $c'$ \n", + "(i.e. the category $c'$ such that $y_{ic'} = 1$). This means that the cross entropy loss only punishes you for how wrong\n", + "you got the correct label. The probability of category $c$ is given by the softmax function. The vector $\\boldsymbol{\\theta}$ represents the parameters of our network, i.e. all the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "d519516b", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the cost function\n", + "\n", + "The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is *gradient descent* and its generalizations. The idea behind gradient descent\n", + "is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a *local* minimum of the cost function. \n", + "Each parameter $\\theta$ is iteratively adjusted according to the rule \n", + "\n", + "$$ \\theta_{i+1} = \\theta_i - \\eta \\nabla \\mathcal{C}(\\theta_i) ,$$\n", + "\n", + "where $\\eta$ is known as the *learning rate*, which controls how big a step we take towards the minimum. \n", + "This update can be repeated for any number of iterations, or until we are satisfied with the result. \n", + "\n", + "A simple and effective improvement is a variant called *Batch Gradient Descent*. \n", + "Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient\n", + "on a subset of the data called a *minibatch*. \n", + "If there are $N$ data points and we have a minibatch size of $M$, the total number of batches\n", + "is $N/M$. \n", + "We denote each minibatch $B_k$, with $k = 1, 2,...,N/M$. The gradient then becomes: \n", + "\n", + "$$ \\nabla \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\nabla \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{M} \\sum_{i \\in B_k} \\nabla \\mathcal{L}_i(\\theta) ,$$\n", + "\n", + "i.e. instead of averaging the loss over the entire dataset, we average over a minibatch. \n", + "\n", + "This has two important benefits: \n", + "1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minima. \n", + "\n", + "2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient. \n", + "\n", + "The various optmization methods, with codes and algorithms, are discussed in our lectures on [Gradient descent approaches](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "46b71202", + "metadata": { + "editable": true + }, + "source": [ + "## Regularization\n", + "\n", + "It is common to add an extra term to the cost function, proportional\n", + "to the size of the weights. 
This is equivalent to constraining the\n", + "size of the weights, so that they do not grow out of control.\n", + "Constraining the size of the weights means that the weights cannot\n", + "grow arbitrarily large to fit the training data, and in this way\n", + "reduces *overfitting*.\n", + "\n", + "We will measure the size of the weights using the so called *L2-norm*, meaning our cost function becomes: \n", + "\n", + "$$ \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) + \\lambda \\lvert \\lvert \\boldsymbol{w} \\rvert \\rvert_2^2 \n", + "= \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}(\\theta) + \\lambda \\sum_{ij} w_{ij}^2,$$ \n", + "\n", + "i.e. we sum up all the weights squared. The factor $\\lambda$ is known as a regularization parameter.\n", + "\n", + "In order to train the model, we need to calculate the derivative of\n", + "the cost function with respect to every bias and weight in the\n", + "network. In total our network has $(64 + 1)\\times 50=3250$ weights in\n", + "the hidden layer and $(50 + 1)\\times 10=510$ weights to the output\n", + "layer ($+1$ for the bias), and the gradient must be calculated for\n", + "every parameter. We use the *backpropagation* algorithm discussed\n", + "above. This is a clever use of the chain rule that allows us to\n", + "calculate the gradient efficently." + ] + }, + { + "cell_type": "markdown", + "id": "129c39d3", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplication\n", + "\n", + "To more efficently train our network these equations are implemented using matrix operations. \n", + "The error in the output layer is calculated simply as, with $\\boldsymbol{t}$ being our targets, \n", + "\n", + "$$ \\delta_L = \\boldsymbol{t} - \\boldsymbol{y} = (n_{inputs}, n_{categories}) .$$ \n", + "\n", + "The gradient for the output weights is calculated as \n", + "\n", + "$$ \\nabla W_{L} = \\boldsymbol{a}^T \\delta_L = (n_{hidden}, n_{categories}) ,$$\n", + "\n", + "where $\\boldsymbol{a} = (n_{inputs}, n_{hidden})$. This simply means that we are summing up the gradients for each input. \n", + "Since we are going backwards we have to transpose the activation matrix. \n", + "\n", + "The gradient with respect to the output bias is then \n", + "\n", + "$$ \\nabla \\boldsymbol{b}_{L} = \\sum_{i=1}^{n_{inputs}} \\delta_L = (n_{categories}) .$$ \n", + "\n", + "The error in the hidden layer is \n", + "\n", + "$$ \\Delta_h = \\delta_L W_{L}^T \\circ f'(z_{h}) = \\delta_L W_{L}^T \\circ a_{h} \\circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) ,$$ \n", + "\n", + "where $f'(a_{h})$ is the derivative of the activation in the hidden layer. The matrix products mean\n", + "that we are summing up the products for each neuron in the output layer. The symbol $\\circ$ denotes\n", + "the *Hadamard product*, meaning element-wise multiplication. 
\n", + "\n", + "This again gives us the gradients in the hidden layer: \n", + "\n", + "$$ \\nabla W_{h} = X^T \\delta_h = (n_{features}, n_{hidden}) ,$$ \n", + "\n", + "$$ \\nabla b_{h} = \\sum_{i=1}^{n_{inputs}} \\delta_h = (n_{hidden}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# to categorical turns our integer vector into a onehot representation\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "# one-hot in numpy\n", + "def to_categorical_numpy(integer_vector):\n", + " n_inputs = len(integer_vector)\n", + " n_categories = np.max(integer_vector) + 1\n", + " onehot_vector = np.zeros((n_inputs, n_categories))\n", + " onehot_vector[range(n_inputs), integer_vector] = 1\n", + " \n", + " return onehot_vector\n", + "\n", + "#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)\n", + "Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)\n", + "\n", + "def feed_forward_train(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " # for backpropagation need activations in hidden and output layers\n", + " return a_h, probabilities\n", + "\n", + "def backpropagation(X, Y):\n", + " a_h, probabilities = feed_forward_train(X)\n", + " \n", + " # error in the output layer\n", + " error_output = probabilities - Y\n", + " # error in the hidden layer\n", + " error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)\n", + " \n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_h.T, error_output)\n", + " output_bias_gradient = np.sum(error_output, axis=0)\n", + " \n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(X.T, error_hidden)\n", + " hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "print(\"Old accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))\n", + "\n", + "eta = 0.01\n", + "lmbd = 0.01\n", + "for i in range(1000):\n", + " # calculate gradients\n", + " dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)\n", + " \n", + " # regularization term gradients\n", + " dWo += lmbd * output_weights\n", + " dWh += lmbd * hidden_weights\n", + " \n", + " # update weights and biases\n", + " output_weights -= eta * dWo\n", + " output_bias -= eta * dBo\n", + " hidden_weights -= eta * dWh\n", + " hidden_bias -= eta * dBh\n", + "\n", + "print(\"New accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))" + ] + }, + { + "cell_type": "markdown", + "id": "e95c7166", + "metadata": { + "editable": true + }, + "source": [ + "## Improving performance\n", + "\n", + "As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image. 
\n", + "In order to obtain a network that does something useful, we will have to do a bit more work. \n", + "\n", + "The choice of *hyperparameters* such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a *grid-search* is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates $\\eta = 10^{-6}, 10^{-5},...,10^{-1}$ with different regularization parameters $\\lambda = 10^{-6},...,10^{-0}$. \n", + "\n", + "Next, we haven't implemented minibatching yet, which introduces stochasticity and is though to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an *iteration*, and a full training period\n", + "going through the entire dataset ($n/M$ batches) an *epoch*.\n", + "\n", + "If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers. \n", + "Andrew Ng goes through some of these considerations in this [video](https://youtu.be/F1ka6a13S9I). You can find a summary of the video [here](https://kevinzakka.github.io/2016/09/26/applying-deep-learning/)." + ] + }, + { + "cell_type": "markdown", + "id": "b4365471", + "metadata": { + "editable": true + }, + "source": [ + "## Full object-oriented implementation\n", + "\n", + "It is very natural to think of the network as an object, with specific instances of the network\n", + "being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " X_data,\n", + " Y_data,\n", + " n_hidden_neurons=50,\n", + " n_categories=10,\n", + " epochs=10,\n", + " batch_size=100,\n", + " eta=0.1,\n", + " lmbd=0.0):\n", + "\n", + " self.X_data_full = X_data\n", + " self.Y_data_full = Y_data\n", + "\n", + " self.n_inputs = X_data.shape[0]\n", + " self.n_features = X_data.shape[1]\n", + " self.n_hidden_neurons = n_hidden_neurons\n", + " self.n_categories = n_categories\n", + "\n", + " self.epochs = epochs\n", + " self.batch_size = batch_size\n", + " self.iterations = self.n_inputs // self.batch_size\n", + " self.eta = eta\n", + " self.lmbd = lmbd\n", + "\n", + " self.create_biases_and_weights()\n", + "\n", + " def create_biases_and_weights(self):\n", + " self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)\n", + " self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01\n", + "\n", + " self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)\n", + " self.output_bias = np.zeros(self.n_categories) + 0.01\n", + "\n", + " def feed_forward(self):\n", + " # feed-forward for training\n", + " self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias\n", + " self.a_h = sigmoid(self.z_h)\n", + "\n", + " self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias\n", + "\n", + " exp_term = np.exp(self.z_o)\n", + " self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + "\n", + " def feed_forward_out(self, X):\n", + " # feed-forward for output\n", + " z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias\n", + " a_h = sigmoid(z_h)\n", 
+ "\n", + " z_o = np.matmul(a_h, self.output_weights) + self.output_bias\n", + " \n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " return probabilities\n", + "\n", + " def backpropagation(self):\n", + " error_output = self.probabilities - self.Y_data\n", + " error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)\n", + "\n", + " self.output_weights_gradient = np.matmul(self.a_h.T, error_output)\n", + " self.output_bias_gradient = np.sum(error_output, axis=0)\n", + "\n", + " self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)\n", + " self.hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " if self.lmbd > 0.0:\n", + " self.output_weights_gradient += self.lmbd * self.output_weights\n", + " self.hidden_weights_gradient += self.lmbd * self.hidden_weights\n", + "\n", + " self.output_weights -= self.eta * self.output_weights_gradient\n", + " self.output_bias -= self.eta * self.output_bias_gradient\n", + " self.hidden_weights -= self.eta * self.hidden_weights_gradient\n", + " self.hidden_bias -= self.eta * self.hidden_bias_gradient\n", + "\n", + " def predict(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + " def predict_probabilities(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return probabilities\n", + "\n", + " def train(self):\n", + " data_indices = np.arange(self.n_inputs)\n", + "\n", + " for i in range(self.epochs):\n", + " for j in range(self.iterations):\n", + " # pick datapoints with replacement\n", + " chosen_datapoints = np.random.choice(\n", + " data_indices, size=self.batch_size, replace=False\n", + " )\n", + "\n", + " # minibatch training data\n", + " self.X_data = self.X_data_full[chosen_datapoints]\n", + " self.Y_data = self.Y_data_full[chosen_datapoints]\n", + "\n", + " self.feed_forward()\n", + " self.backpropagation()" + ] + }, + { + "cell_type": "markdown", + "id": "a417307d", + "metadata": { + "editable": true + }, + "source": [ + "## Evaluate model performance on test data\n", + "\n", + "To measure the performance of our network we evaluate how well it does it data it has never seen before, i.e. the test data. \n", + "We measure the performance of the network using the *accuracy* score. \n", + "The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of $1$. \n", + "\n", + "$$ \\text{Accuracy} = \\frac{\\sum_{i=1}^n I(\\tilde{y}_i = y_i)}{n} ,$$ \n", + "\n", + "where $I$ is the indicator function, $1$ if $\\tilde{y}_i = y_i$ and $0$ otherwise." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "epochs = 100\n", + "batch_size = 100\n", + "\n", + "dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + "dnn.train()\n", + "test_predict = dnn.predict(X_test)\n", + "\n", + "# accuracy score from scikit library\n", + "print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + "\n", + "# equivalent in numpy\n", + "def accuracy_score_numpy(Y_test, Y_pred):\n", + " return np.sum(Y_test == Y_pred) / len(Y_test)\n", + "\n", + "#print(\"Accuracy score on test set: \", accuracy_score_numpy(Y_test, test_predict))" + ] + }, + { + "cell_type": "markdown", + "id": "efcbd954", + "metadata": { + "editable": true + }, + "source": [ + "## Adjust hyperparameters\n", + "\n", + "We now perform a grid search to find the optimal hyperparameters for the network. \n", + "Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around $98\\%$ ($2\\%$ error rate)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "# store the models for later use\n", + "DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "# grid search\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + " dnn.train()\n", + " \n", + " DNN_numpy[i][j] = dnn\n", + " \n", + " test_predict = dnn.predict(X_test)\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "d282951d", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, you can also do this with matplotlib imshow\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_numpy[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + 
"ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "99f5058c", + "metadata": { + "editable": true + }, + "source": [ + "## scikit-learn implementation\n", + "\n", + "**scikit-learn** focuses more\n", + "on traditional machine learning methods, such as regression,\n", + "clustering, decision trees, etc. As such, it has only two types of\n", + "neural networks: Multi Layer Perceptron outputting continuous values,\n", + "*MPLRegressor*, and Multi Layer Perceptron outputting labels,\n", + "*MLPClassifier*. We will see how simple it is to use these classes.\n", + "\n", + "**scikit-learn** implements a few improvements from our neural network,\n", + "such as early stopping, a varying learning rate, different\n", + "optimization methods, etc. We would therefore expect a better\n", + "performance overall." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.neural_network import MLPClassifier\n", + "# store models for later use\n", + "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", + " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", + " dnn.fit(X_train, Y_train)\n", + " \n", + " DNN_scikit[i][j] = dnn\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", dnn.score(X_test, Y_test))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "7ceec918", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_scikit[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba07c374", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and 
scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "1cf09819", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. 
We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "39d013b1", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "94e66380", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "40470dbd", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "636940c6", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "d9f47b57", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
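+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a2b3c4d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before loading the data, it can be useful to verify that the installation above actually works. The cell below is a minimal sanity check assuming a TensorFlow 2.x installation; the version string and the list of GPU devices it prints will of course depend on your own setup."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e6f7a8b",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# quick sanity check of the TensorFlow installation\n",
+    "import tensorflow as tf\n",
+    "\n",
+    "print(\"TensorFlow version:\", tf.__version__)\n",
+    "print(\"GPU devices:\", tf.config.list_physical_devices('GPU'))"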
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " 
model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53f9be79", + "metadata": { + "editable": true + }, + "source": [ + "## Building a neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "39bd1718", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. 
Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " 
self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "532aecc2", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "32a25c0b", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d34cd45c", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "baaaff79", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "05285af5", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "873e7caa", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3dc2175e", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : 
validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", + " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " 
train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < 
len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real 
numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "9596ae53", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "0fc39e40", + "metadata": { + "editable": true + }, + "source": [ 
+ "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3add8665", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "4dff1871", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "43cd1e22", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." 
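+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9d8c7b6a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before loading the classification data, it can be instructive to also evaluate the regressor trained above on the held-out test split. The short sketch below simply reuses the X_test and t_test arrays and the CostOLS cost function defined earlier; the exact MSE value will depend on the scheduler and the number of epochs used in the last call to fit()."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f1e2d3c",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# evaluate the trained regressor on the held-out test data\n",
+    "pred_test = linear_regression.predict(X_test)\n",
+    "test_mse = CostOLS(t_test)(pred_test)\n",
+    "print(f\"Test MSE: {test_mse:.4f}\")"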
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "e3e6fa31", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "622015f0", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "317e6e5c", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "82d61377", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "2d892009", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. Try different learning rates." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week43.ipynb b/doc/LectureNotes/_build/jupyter_execute/week43.ipynb new file mode 100644 index 000000000..f51a39c64 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. 
Coding examples using Tensorflow/Keras and Pytorch examples. The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. \n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
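To make the three steps of the algorithm above concrete, here is a minimal NumPy sketch of a single backpropagation pass for a network with one hidden layer, assuming sigmoid activations in both layers and a squared-error cost. The array names and sizes are illustrative only and are not part of the FFNN class developed later in these notes.

```python
# Minimal sketch of one backpropagation pass, assuming sigmoid activations
# and the squared-error cost C = (a^L - t)^2. Names and sizes are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(2023)
x = rng.normal(size=(1, 3))          # one sample with three features
t = np.array([[1.0]])                # target
W1, b1 = rng.normal(size=(3, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
eta = 0.1                            # learning rate

# feed forward
z1 = x @ W1 + b1; a1 = sigmoid(z1)
z2 = a1 @ W2 + b2; a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L
delta2 = sigmoid_prime(z2) * 2.0 * (a2 - t)
# back propagated error: delta^l = (delta^{l+1} (W^{l+1})^T) * sigma'(z^l)
delta1 = (delta2 @ W2.T) * sigmoid_prime(z1)

# gradient descent updates: W <- W - eta * a_prev^T delta, b <- b - eta * sum(delta)
W2 -= eta * a1.T @ delta2; b2 -= eta * delta2.sum(axis=0)
W1 -= eta * x.T @ delta1;  b1 -= eta * delta1.sum(axis=0)
```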
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
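As a small illustration of the dying-ReLU problem, the sketch below compares plain ReLU with the leaky ReLU referred to in the next section; the slope alpha = 0.01 is just an assumed default for the example, not a prescribed value.

```python
# Compare ReLU and leaky ReLU on negative inputs: ReLU clamps them to zero
# (and has zero gradient there), while leaky ReLU keeps a small slope alpha.
import numpy as np

def relu(z):
    return np.where(z > 0, z, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("ReLU:            ", relu(z))
print("ReLU gradient:   ", relu_grad(z))        # zero for negative z, so a dead unit cannot recover
print("Leaky ReLU:      ", leaky_relu(z))
print("Leaky ReLU grad: ", leaky_relu_grad(z))  # small but nonzero for negative z
```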
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "40ffa1fb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Solving differential equations with Deep Learning\n",
+    "\n",
+    "The Universal Approximation Theorem states that a neural network with a\n",
+    "single hidden layer, together with an input and an output layer, can\n",
+    "approximate any continuous function to any given precision.\n",
+    "\n",
+    "**Book on solving differential equations with ML methods.**\n",
+    "\n",
+    "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n",
+    "\n",
+    "**Physics informed neural networks.**\n",
+    "\n",
+    "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al.\n",
+    "\n",
+    "**Thanks to Kristine Baluka Hein.**\n",
+    "\n",
+    "The lectures on differential equations were developed by Kristine Baluka Hein, now a PhD student at IFI.\n",
+    "A great thanks to Kristine."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "191ba3eb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Ordinary Differential Equations first\n",
+    "\n",
+    "An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.\n",
+    "\n",
+    "In general, an ordinary differential equation looks like"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a0be312a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "$$\n",
+    "\begin{equation} \label{ode} \tag{1}\n",
+    "f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0\n",
+    "\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "000663cf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n",
+    "\n",
+    "The expression $f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n",
+    "The highest order of derivative, that is the value of $n$, determines the order of the equation.\n",
+    "The equation is referred to as an $n$-th order ODE.\n",
+    "Along with ([1](#ode)), some additional conditions on the function $g(x)$ are typically given\n",
+    "for the solution to be unique."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5b87995",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## The trial solution\n",
+    "\n",
+    "Let the trial solution $g_t(x)$ be"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a166c0b6",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
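+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hidden-layer-sketch-note",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a concrete illustration of the hidden-layer step just described, the sketch below builds the matrix $X$ with a row of ones, applies $P_{\text{hidden}}$ and then the sigmoid. This is a minimal NumPy sketch with made-up sizes ($N = 5$ inputs, $N_{\text{hidden}} = 3$ neurons), not the solver developed later."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "hidden-layer-sketch-code",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "x = np.linspace(0, 1, 5)                     # N = 5 input points\n",
+    "X = np.vstack((np.ones_like(x), x))          # shape (2, N); the row of ones multiplies the biases\n",
+    "P_hidden = np.random.randn(3, 2)             # row i holds (b_i^hidden, w_i^hidden)\n",
+    "\n",
+    "z_hidden = P_hidden @ X                      # z_i^hidden for every neuron i and input x_j, shape (3, N)\n",
+    "x_hidden = 1.0 / (1.0 + np.exp(-z_hidden))   # sigmoid activation of each hidden neuron"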
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
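+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "autograd-tools-note",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before spelling out the update rule, note the two autograd tools the program below relies on: `grad`, which returns the gradient of a scalar cost with respect to a chosen argument (here the list of parameter arrays $P$), and `elementwise_grad`, which differentiates element by element with respect to an input array (used for $g_t'(x, P)$). The cell below is only a tiny, self-contained sketch with a toy cost and toy data, not the ODE cost itself."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "autograd-tools-code",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad, elementwise_grad\n",
+    "\n",
+    "def toy_cost(P, x):\n",
+    "    # A toy least-squares cost with two scalar parameters collected in a list\n",
+    "    w, b = P\n",
+    "    return np.mean((w * x + b - np.sin(x)) ** 2)\n",
+    "\n",
+    "x = np.linspace(0, 1, 10)\n",
+    "P = [np.array(1.0), np.array(0.0)]\n",
+    "\n",
+    "dC_dP = grad(toy_cost, 0)(P, x)          # list with dC/dw and dC/db\n",
+    "dsin_dx = elementwise_grad(np.sin)(x)    # elementwise derivative w.r.t. the input array"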
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output 
= z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
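+    ,
+    "\n",
+    "\n",
+    "As a quick sanity check (a small sketch added here, not part of the original program; plain NumPy and central finite differences are assumed), one can verify that the analytical solution quoted later for the choice $u(x) = \\sin(\\pi x)$, namely $g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)$, indeed satisfies the diffusion equation and the boundary conditions:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def g(x, t):\n",
+    "    # Analytical solution for the choice u(x) = sin(pi x), quoted later in these notes\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "x0, t0, h = 0.3, 0.2, 1e-4\n",
+    "dg_dt = (g(x0, t0 + h) - g(x0, t0 - h))/(2*h)\n",
+    "d2g_dx2 = (g(x0 + h, t0) - 2*g(x0, t0) + g(x0 - h, t0))/h**2\n",
+    "\n",
+    "print(dg_dt, d2g_dx2)          # the two numbers should agree to several digits\n",
+    "print(g(0.0, t0), g(1.0, t0))  # boundary values: both (numerically) zero\n",
+    "```"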
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define a point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be chosen as\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\\Big(u(x) - \\big((1-x)u(0) + x u(1)\\big)\\Big) = (1-t)u(x) = (1-t)\\sin(\\pi x)\n",
+    "$$\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \\sin(\\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "533f4e84",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivative of\n",
+    "the trial solution with respect to $x$ and $t$.\n",
+    "\n",
+    "This makes it necessary to compute the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivatives with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is a Hessian\n",
+    "matrix, which contains all the possible second order\n",
+    "mixed derivatives of $g(x,t)$."
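+    ,
+    "\n",
+    "\n",
+    "As a tiny illustration (a sketch using a made-up test function, not part of the original program), Autograd's `jacobian` and `hessian` applied to a scalar function of a point array $(x,t)$ return its gradient and its matrix of second derivatives, which is exactly what the cost function below relies on:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "def g_simple(point):\n",
+    "    # A made-up scalar function of the point (x, t), used only for illustration\n",
+    "    x, t = point\n",
+    "    return x**2*t + np.sin(np.pi*x)\n",
+    "\n",
+    "point = np.array([0.5, 1.0])\n",
+    "grad_g = jacobian(g_simple)(point)   # [dg/dx, dg/dt] = [2xt + pi*cos(pi*x), x**2]\n",
+    "hess_g = hessian(g_simple)(point)    # 2 x 2 matrix of second derivatives\n",
+    "print(grad_g)\n",
+    "print(hess_g[0][0])                  # d2g/dx2 = 2t - pi**2*sin(pi*x)\n",
+    "```"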
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
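+    ,
+    "\n",
+    "\n",
+    "For the curious, one possible TensorFlow 2 variant of the same idea is sketched below. This is an assumption of how one might do it, not the course's reference implementation; the layer sizes, optimizer and number of iterations are arbitrary choices, and nested `GradientTape` contexts supply the first and second derivatives of the trial solution:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "\n",
+    "tf.random.set_seed(15)\n",
+    "\n",
+    "# Collocation points (x, t) on a 10 x 10 grid covering [0,1] x [0,1]\n",
+    "x_np, t_np = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))\n",
+    "x = tf.constant(x_np.reshape(-1, 1), dtype=tf.float32)\n",
+    "t = tf.constant(t_np.reshape(-1, 1), dtype=tf.float32)\n",
+    "\n",
+    "# A small dense network playing the role of N(x,t,P)\n",
+    "model = tf.keras.Sequential([\n",
+    "    tf.keras.layers.Dense(100, activation='sigmoid'),\n",
+    "    tf.keras.layers.Dense(25, activation='sigmoid'),\n",
+    "    tf.keras.layers.Dense(1)\n",
+    "])\n",
+    "\n",
+    "def g_trial_tf(x, t):\n",
+    "    # Same trial solution as in the Autograd program\n",
+    "    return (1 - t)*tf.sin(np.pi*x) + x*(1 - x)*t*model(tf.concat([x, t], axis=1))\n",
+    "\n",
+    "def cost():\n",
+    "    # Nested tapes: the inner one gives g_t and g_x, the outer one g_xx\n",
+    "    with tf.GradientTape() as outer:\n",
+    "        outer.watch(x)\n",
+    "        with tf.GradientTape(persistent=True) as inner:\n",
+    "            inner.watch(x)\n",
+    "            inner.watch(t)\n",
+    "            g = g_trial_tf(x, t)\n",
+    "        g_x = inner.gradient(g, x)\n",
+    "        g_t = inner.gradient(g, t)\n",
+    "    g_xx = outer.gradient(g_x, x)\n",
+    "    return tf.reduce_mean((g_t - g_xx)**2)\n",
+    "\n",
+    "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n",
+    "for i in range(250):\n",
+    "    with tf.GradientTape() as tape:\n",
+    "        loss = cost()\n",
+    "    grads = tape.gradient(loss, model.trainable_variables)\n",
+    "    optimizer.apply_gradients(zip(grads, model.trainable_variables))\n",
+    "\n",
+    "print('Final cost:', float(loss))\n",
+    "```"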
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + 
"\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = res[:,indx1]\n", + " res2 = res[:,indx2]\n", + " res3 = res[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = res_analytical[:,indx1]\n", + " res_analytical2 = res_analytical[:,indx2]\n", + " res_analytical3 = res_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "95dea76f", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week44.ipynb b/doc/LectureNotes/_build/jupyter_execute/week44.ipynb new file mode 100644 index 000000000..9e0c9b8bd --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
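The matrix expressions above are easy to get wrong in code, so here is a minimal shape check (an illustration added to these notes, with arbitrary sizes) of the hidden-layer computation $\boldsymbol{z}_{i}^{\text{hidden}} = \boldsymbol{p}_{i, \text{hidden}}^T X$ using the bias-row trick.

```python
# Shape check (illustration only) of the hidden-layer computation described above.
# One input feature per sample; the sizes N and N_hidden are arbitrary.
import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

N = 10          # number of input values x_j
N_hidden = 5    # number of neurons in the hidden layer

x = np.linspace(0, 1, N)

# Stack a row of ones on top of x so that each column is (1, x_j)^T
X = np.concatenate((np.ones((1, N)), x.reshape(1, N)), axis=0)   # shape (2, N)

# Row i of P_hidden holds (b_i^hidden, w_i^hidden)
P_hidden = np.random.randn(N_hidden, 2)                          # shape (N_hidden, 2)

Z_hidden = P_hidden @ X          # shape (N_hidden, N); row i is z_i^hidden
X_hidden = sigmoid(Z_hidden)     # hidden-layer output, same shape

print(X.shape, P_hidden.shape, Z_hidden.shape, X_hidden.shape)
```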
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
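The update rule for gradient descent with a constant step size is written out in the next section. As a warm-up, the sketch below (an added illustration, not from the original notes) applies exactly that rule to a toy one-parameter cost using autograd; the cost function, step size and number of iterations are arbitrary choices.

```python
# Illustration only: gradient descent with a constant step size on a toy cost,
# isolating the update rule that the ODE solver below applies to the full parameter set P.
from autograd import grad

def cost(w):
    return (w - 3.0)**2          # a simple cost with its minimum at w = 3

cost_grad = grad(cost)           # function returning dC/dw

w = 0.0                          # initial guess
lmb = 0.1                        # constant step size (learning rate)

for _ in range(100):
    w = w - lmb*cost_grad(w)     # the gradient descent update

print('w after gradient descent: %g' % w)   # prints a value close to 3
```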
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "64a97256", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "94bb8aaa", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "29ead54b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n",
+    "$$\n",
+    "\n",
+    "where $\Delta x$ is a small step size and $E_{\Delta x}(x)$ is the error term.\n",
+    "\n",
+    "Neglecting the error term gives an approximation to the second derivative:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cea098e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n",
+    "$$\n",
+    "\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50cfea81",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivatives of\n",
+    "the trial solution with respect to $x$ and $t$.\n",
+    "\n",
+    "This makes it necessary to compute the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivatives with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is the Hessian\n",
+    "matrix, which contains all the possible second order\n",
+    "mixed derivatives of $g(x,t)$.\n",
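+    "\n",
+    "As a small self-contained illustration (an addition to these notes, not part of the original program), we can check how `jacobian` and `hessian` from Autograd behave on a simple, hypothetical bivariate test function $q(x,t) = x^2\sin(\pi t)$:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "def q(point):\n",
+    "    # point is an array [x, t], mirroring how g_trial is called below\n",
+    "    x, t = point\n",
+    "    return x**2*np.sin(np.pi*t)\n",
+    "\n",
+    "point = np.array([0.5, 0.25])\n",
+    "print(jacobian(q)(point))  # the gradient [dq/dx, dq/dt] at (0.5, 0.25)\n",
+    "print(hessian(q)(point))   # the 2x2 matrix of second derivatives\n",
+    "```\n",
+    "\n",
+    "The element `[0][0]` of the Hessian is the second derivative with respect to $x$, which is precisely the quantity needed in the cost function below."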
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
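+    "\n",
+    "As a quick sanity check (an addition to these notes), Autograd can also be used to verify numerically that the stated analytical solution satisfies the PDE, by evaluating the residual $\partial g/\partial t - \partial^2 g/\partial x^2$ at a few points:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import elementwise_grad\n",
+    "\n",
+    "def g(x, t):\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "dg_dt = elementwise_grad(g, 1)\n",
+    "d2g_dx2 = elementwise_grad(elementwise_grad(g, 0), 0)\n",
+    "\n",
+    "x = np.linspace(0.1, 0.9, 5)\n",
+    "t = np.linspace(0.1, 0.9, 5)\n",
+    "residual = dg_dt(x, t) - d2g_dx2(x, t)\n",
+    "print(np.max(np.abs(residual)))  # should be close to machine precision\n",
+    "```"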
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n",
+    "    for l in range(1,N_hidden):\n",
+    "        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n",
+    "\n",
+    "    # For the output layer\n",
+    "    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n",
+    "\n",
+    "    print('Initial cost: ',cost_function(P, x, t))\n",
+    "\n",
+    "    cost_function_grad = grad(cost_function,0)\n",
+    "\n",
+    "    # Let the update be done num_iter times\n",
+    "    for i in range(num_iter):\n",
+    "        cost_grad = cost_function_grad(P, x , t)\n",
+    "\n",
+    "        for l in range(N_hidden+1):\n",
+    "            P[l] = P[l] - lmb * cost_grad[l]\n",
+    "\n",
+    "    print('Final cost: ',cost_function(P, x, t))\n",
+    "\n",
+    "    return P\n",
+    "\n",
+    "if __name__ == '__main__':\n",
+    "    ### Use the neural network:\n",
+    "    npr.seed(15)\n",
+    "\n",
+    "    ## Decide the values of arguments to the function to solve\n",
+    "    Nx = 10; Nt = 10\n",
+    "    x = np.linspace(0, 1, Nx)\n",
+    "    t = np.linspace(0,1,Nt)\n",
+    "\n",
+    "    ## Set up the parameters for the network\n",
+    "    num_hidden_neurons = [100, 25]\n",
+    "    num_iter = 250\n",
+    "    lmb = 0.01\n",
+    "\n",
+    "    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n",
+    "\n",
+    "    ## Store the results\n",
+    "    g_dnn_ag = np.zeros((Nx, Nt))\n",
+    "    G_analytical = np.zeros((Nx, Nt))\n",
+    "    for i,x_ in enumerate(x):\n",
+    "        for j, t_ in enumerate(t):\n",
+    "            point = np.array([x_, t_])\n",
+    "            g_dnn_ag[i,j] = g_trial(point,P)\n",
+    "\n",
+    "            G_analytical[i,j] = g_analytic(point)\n",
+    "\n",
+    "    # Find the max absolute difference between the analytical and the computed solution\n",
+    "    diff_ag = np.abs(g_dnn_ag - G_analytical)\n",
+    "    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n",
+    "\n",
+    "    ## Plot the solutions in two dimensions, that being in position and time\n",
+    "\n",
+    "    T,X = np.meshgrid(t,x)\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))\n",
+    "    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Analytical solution')\n",
+    "    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Difference')\n",
+    "    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "    ## Take some slices of the 3D plots just to see the solutions at particular times\n",
+    "    indx1 = 0\n",
+    "    indx2 = int(Nt/2)\n",
+    "    indx3 = Nt-1\n",
+    "\n",
+    "    t1 = t[indx1]\n",
+    "    t2 = t[indx2]\n",
+    "    t3 = t[indx3]\n",
+    "\n",
+    "    # Slice the results from the DNN\n",
+    "    res1 = g_dnn_ag[:,indx1]\n",
+    "    res2 = g_dnn_ag[:,indx2]\n",
+    "    res3 = g_dnn_ag[:,indx3]\n",
+    "\n",
+    "    # Slice the analytical results\n",
+    "    res_analytical1 = G_analytical[:,indx1]\n",
+    "    res_analytical2 = G_analytical[:,indx2]\n",
+    "    res_analytical3 = G_analytical[:,indx3]\n",
+    "\n",
+    "    # Plot the slices\n",
+    "    plt.figure(figsize=(10,10))\n",
+
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.\n",
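+    "\n",
+    "The counting above is easy to reproduce. The following snippet (an illustrative addition, not from the original notes) contrasts the number of weights of a single fully-connected neuron with the handful of weights a small convolutional filter reuses across the whole image:\n",
+    "\n",
+    "```python\n",
+    "# Weights needed by ONE fully-connected neuron reading the whole image\n",
+    "for shape in [(32, 32, 3), (200, 200, 3)]:\n",
+    "    h, w, c = shape\n",
+    "    print(f\"{shape}: {h*w*c} weights per fully-connected neuron\")\n",
+    "\n",
+    "# A 3x3 filter over 3 input channels reuses the same weights at every location,\n",
+    "# independently of the image size\n",
+    "print(\"3x3x3 convolutional filter:\", 3*3*3, \"weights\")\n",
+    "```\n",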
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).\n",
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN\n",
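+    "\n",
+    "As an illustrative sketch (not part of the original notes), the simple INPUT-CONV-RELU-POOL-FC stack listed earlier could be written, for instance, with the Keras API; the sizes below follow the $32\times 32\times 3$ example with 12 filters:\n",
+    "\n",
+    "```python\n",
+    "import tensorflow as tf\n",
+    "from tensorflow.keras import layers\n",
+    "\n",
+    "model = tf.keras.Sequential([\n",
+    "    # CONV + RELU: 12 filters of size 3x3 give a 32x32x12 volume\n",
+    "    layers.Conv2D(12, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),\n",
+    "    # POOL: downsample the spatial dimensions to 16x16x12\n",
+    "    layers.MaxPooling2D(pool_size=2),\n",
+    "    layers.Flatten(),\n",
+    "    # FC: one score per class\n",
+    "    layers.Dense(10)\n",
+    "])\n",
+    "model.summary()\n",
+    "```\n",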
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
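Before handling these boundary terms with padding, it is worth checking numerically that the polynomial product above really is a discrete convolution, and that it agrees with the Toeplitz matrix-vector product. A minimal NumPy sketch, with invented coefficient values:

```python
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])       # coefficients of p(t), invented values
beta = np.array([4.0, 5.0, 6.0, 7.0])   # coefficients of s(t), invented values

# The six coefficients delta_0, ..., delta_5 of the product polynomial
delta = np.convolve(alpha, beta)

# The same numbers from the Toeplitz matrix (shifted copies of alpha) times beta
m, n = alpha.size, beta.size
A = np.zeros((m + n - 1, n))
for j in range(n):
    A[j:j + m, j] = alpha
print(np.allclose(delta, A @ beta))                                       # True
print(np.allclose(delta, np.polynomial.polynomial.polymul(alpha, beta)))  # True
```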
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
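Before counting parameters, the $2\times 2$ output derived above is easy to reproduce numerically. The explicit matrix corresponds to the cross-correlation form $Y(i,j)=\sum_{m,n}X(i+m,j+n)W(m,n)$ discussed earlier, which is what the small loop below computes; the input and filter values are invented, and `scipy.signal.correlate2d` with `mode="valid"` should give the same result for stride $S=1$.

```python
import numpy as np

def cross_correlate2d(X, W, stride=1):
    """Valid cross-correlation: Y[i, j] = sum_{m, n} X[i*S + m, j*S + n] * W[m, n]."""
    H1, W1 = X.shape
    F1, F2 = W.shape
    H2 = (H1 - F1) // stride + 1
    W2 = (W1 - F2) // stride + 1
    Y = np.zeros((H2, W2))
    for i in range(H2):
        for j in range(W2):
            patch = X[i * stride:i * stride + F1, j * stride:j * stride + F2]
            Y[i, j] = np.sum(patch * W)
    return Y

# Invented numbers for the 3x3 input and the 2x2 filter of the example
X = np.arange(1.0, 10.0).reshape(3, 3)
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(cross_correlate2d(X, W, stride=1))   # 2x2 output, matching the expression above
```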
In general we call the input an input volume. It is defined\n", + "by its width $W_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels, $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters:\n", + "1. $K$, the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well.\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution thus involves, for each filter, $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1\\right) \\times K+K,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation, where the last $K$ parameters are the biases.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $F=3$, $S=1$, $P=1$\n", + "\n", + "2. $F=5$, $S=1$, $P=2$\n", + "\n", + "3. $F=5$, $S=2$, $P$ open\n", + "\n", + "4. $F=1$, $S=1$, $P=0$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output width and height are given by $(32-5)/1+1=28$, resulting in ten feature maps\n", + "of dimensionality $28\\times 28$ each, that is an output volume of dimensionality $28\\times 28\\times 10$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
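The output-size relations $W_2=(W_1-F+2P)/S+1$, $H_2=(H_1-F+2P)/S+1$, $D_2=K$ and the parameter count $(F\times F\times D_1)\times K + K$ are convenient to wrap in a small helper. A sketch (the function name is made up; the example numbers are the $32\times 32\times 3$ input with ten $5\times 5$ filters discussed above):

```python
def conv_layer_shape_and_params(W1, H1, D1, K, F, S, P):
    """Output volume (W2, H2, D2) and number of trainable parameters of one conv layer."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1) * K + K        # weights of the K filters + K biases
    return (W2, H2, K), params

# The example above: 32x32x3 input, ten 5x5 filters, stride 1, no padding
print(conv_layer_shape_and_params(32, 32, 3, K=10, F=5, S=1, P=0))
# ((28, 28, 10), 760)
```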
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
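Max pooling itself takes only a few lines of NumPy. The sketch below splits the input into non-overlapping $2\times 2$ patches and keeps the maximum of each; the input values are invented, and the input dimensions are assumed divisible by the window size.

```python
import numpy as np

def max_pool2d(X, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = X.shape                       # assumes H and W are divisible by size
    blocks = X.reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

X = np.array([[ 1.0,  2.0,  5.0,  6.0],
              [ 3.0,  4.0,  7.0,  8.0],
              [ 9.0, 10.0, 13.0, 14.0],
              [11.0, 12.0, 15.0, 16.0]])
print(max_pool2d(X))
# [[ 4.  8.]
#  [12. 16.]]
```

Average pooling would simply replace the maximum by the mean over the same patches.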
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week45.ipynb b/doc/LectureNotes/_build/jupyter_execute/week45.ipynb new file mode 100644 index 000000000..54d61a576 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
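To make "sparse" and "reuses parameters" concrete, compare the number of trainable parameters in a fully connected layer with that of a single one-dimensional convolution filter. The sizes are the ones used in the sound-clip example further down; the filter length of three is chosen only for illustration.

```python
# Fully connected layer versus a single 1D convolution filter on a long input
n = 10**6        # input length (the sound-clip example below)
hidden = 10**4   # nodes in a dense hidden layer
kernel = 3       # length of one convolution filter (illustrative choice)

dense_params = n * hidden + hidden   # one weight per (input, node) pair + biases
conv_params = kernel + 1             # the same 3 weights reused everywhere + 1 bias
print(f"{dense_params:.1e} versus {conv_params}")   # 1.0e+10 versus 4
```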
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However, with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example, consider a neural network operating on sound sequence\n", + "data. Assume that we have an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "then construct a neural network with one hidden layer only, with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false).\n", + "This means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions are locality of\n", + "information and repetition of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
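The INPUT, CONV, RELU, POOL and FC stages listed above can be written down in a few lines of Keras. This is only a sketch of one possible architecture, with the layer sizes chosen to mirror the numbers in the list (12 filters, a $2\times 2$ pooling window, ten output classes; the $3\times 3$ kernel is an arbitrary choice); next week's material discusses such implementations, in both TensorFlow/Keras and PyTorch, in more detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A minimal CONV -> RELU -> POOL -> FC stack for 32x32x3 images and 10 classes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(12, kernel_size=3, padding="same", activation="relu"),  # CONV + RELU
    layers.MaxPooling2D(pool_size=2),                                     # POOL
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                               # FC class scores
])
model.summary()   # prints the output shapes and the number of trainable parameters
```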
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
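+    "\n",
+    "These relations are easy to verify with a small helper (a sketch; the function names are our own).\n",
+    "\n",
+    "```python\n",
+    "def output_width(W1, F, P, S):\n",
+    "    # W2 = (W1 - F + 2P)/S + 1, assuming the division is exact\n",
+    "    return (W1 - F + 2 * P) // S + 1\n",
+    "\n",
+    "def conv_parameters(F, D1, K):\n",
+    "    # F*F*D1 weights plus one bias per filter, K filters in total\n",
+    "    return (F * F * D1 + 1) * K\n",
+    "\n",
+    "print(output_width(32, 5, 0, 1))    # 28\n",
+    "print(conv_parameters(5, 3, 10))    # 760\n",
+    "```\n",
+    "\n",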
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 2: Pooling types

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_toc.yml b/doc/LectureNotes/_toc.yml index d07149d0e..39ed2f64f 100644 --- a/doc/LectureNotes/_toc.yml +++ b/doc/LectureNotes/_toc.yml @@ -48,7 +48,23 @@ parts: - file: exercisesweek36.ipynb - file: week36.ipynb - file: exercisesweek37.ipynb + - file: week37.ipynb + - file: exercisesweek38.ipynb + - file: week38.ipynb + - file: exercisesweek39.ipynb + - file: week39.ipynb + - file: week40.ipynb + - file: week41.ipynb + - file: exercisesweek41.ipynb + - file: week42.ipynb + - file: 
exercisesweek42.ipynb + - file: week43.ipynb + - file: exercisesweek43.ipynb + - file: week44.ipynb + - file: exercisesweek44.ipynb + - file: week45.ipynb - caption: Projects numbered: false chapters: - file: project1.ipynb + - file: project2.ipynb diff --git a/doc/LectureNotes/data/FYS_STK_Template.zip b/doc/LectureNotes/data/FYS_STK_Template.zip new file mode 100644 index 000000000..9d2eea71b Binary files /dev/null and b/doc/LectureNotes/data/FYS_STK_Template.zip differ diff --git a/doc/LectureNotes/exercisesweek35.ipynb b/doc/LectureNotes/exercisesweek35.ipynb index 886db99ef..403eab1f3 100644 --- a/doc/LectureNotes/exercisesweek35.ipynb +++ b/doc/LectureNotes/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/exercisesweek36.ipynb b/doc/LectureNotes/exercisesweek36.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/exercisesweek36.ipynb +++ b/doc/LectureNotes/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/exercisesweek37.ipynb b/doc/LectureNotes/exercisesweek37.ipynb index 25296c4e0..bb6ba7a35 100644 --- a/doc/LectureNotes/exercisesweek37.ipynb +++ b/doc/LectureNotes/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. 
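+    "\n",
+    "A minimal data setup could look like the following sketch (the noise level, if you add any, is an arbitrary choice).\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "n = 100\n",
+    "x = np.linspace(-2, 2, n)\n",
+    "y = 2 - x + 5 * x**2\n",
+    "# optional noise, for example:\n",
+    "# y = y + np.random.normal(0, 0.5, n)\n",
+    "```\n",
+    "\n",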
The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." + "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." + "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. 
These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." + "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. 
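+    "\n",
+    "As a reminder, with the mean-squared error as cost function (and the convention used here that the penalty is $\lambda\sum_j\theta_j^2$, not divided by $n$), the gradients take the standard form\n",
+    "\n",
+    "$$\n",
+    "\nabla_{\boldsymbol{\theta}} C_{\mathrm{OLS}} = \frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}\right),\qquad\n",
+    "\nabla_{\boldsymbol{\theta}} C_{\mathrm{Ridge}} = \frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}\right)+2\lambda\boldsymbol{\theta}.\n",
+    "$$\n",
+    "\n",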
Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." + "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). 
We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/exercisesweek38.ipynb b/doc/LectureNotes/exercisesweek38.ipynb new file mode 100644 index 000000000..c100028a5 --- /dev/null +++ b/doc/LectureNotes/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. 
Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter 
${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + 
"\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. 
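For concreteness, a minimal sketch of this column-wise standardization could look as follows (the matrix `X` here is just placeholder random data standing in for a real design matrix):

```python
import numpy as np

X = np.random.rand(100, 3)  # placeholder feature matrix with 3 columns

X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_scaled = (X - X_mean) / X_std  # every column now has mean ~0 and std ~1

# Equivalently, with scikit-learn:
# from sklearn.preprocessing import StandardScaler
# X_scaled = StandardScaler().fit_transform(X)
```

In practice you would compute the mean and standard deviation on the training split only and reuse those values when scaling the test split.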
When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek39.ipynb b/doc/LectureNotes/exercisesweek39.ipynb new file mode 100644 index 000000000..22a86cb56 --- /dev/null +++ b/doc/LectureNotes/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. 
We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. 
Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek41.ipynb b/doc/LectureNotes/exercisesweek41.ipynb new file mode 100644 index 000000000..190c0b96a --- /dev/null +++ b/doc/LectureNotes/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! 
However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." 
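# (Illustrative hint, not part of the original skeleton: W1 has shape (4, 2), so the
#  layer maps the 2 input features to 4 outputs, and b1 must therefore be a vector
#  of length 4 -- e.g. b1 = np.random.randn(4), drawn right after W1 so the random
#  state lines up with the check further down.)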
+ ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. 
To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. 
(Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. 
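To make the shapes concrete before you fill in the function, here is a small sketch of the convention (the array names `X_batch`, `W`, `b` are only illustrative and not part of the exercise skeleton):

```python
import numpy as np

X_batch = np.random.rand(5, 4)  # (n_inputs, n_features): one input per row
W = np.random.randn(4, 12)      # (n_features, n_outputs): transposed vs. the single-input case
b = np.random.randn(12)         # (n_outputs,)

Z = X_batch @ W + b             # one matrix multiplication plus one broadcasted addition
print(Z.shape)                  # -> (5, 12)
```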
(Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. 
You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. 
Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what's inside. 
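A minimal way to inspect the result, once `layers_grad` has been computed in the cell below (autograd returns gradients with the same nested structure as the `layers` argument):

```python
# Each entry mirrors one layer: a (weight gradient, bias gradient) pair
for W_g, b_g in layers_grad:
    print(W_g.shape, b_g.shape)
```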
What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek42.ipynb b/doc/LectureNotes/exercisesweek42.ipynb new file mode 100644 index 000000000..9925836a4 --- /dev/null +++ b/doc/LectureNotes/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. 
If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
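For reference, one way the combined computation can look for this single layer, assuming you have filled in `mse_der` and `sigmoid_der` above (a sketch, not the only valid form; the outer product is what gives the weight gradient the same shape as `W`):

```python
dC_da = mse_der(a, target)       # dC/da
dC_dz = dC_da * sigmoid_der(z)   # dC/dz = dC/da * da/dz (elementwise)
dC_dW = np.outer(dC_dz, x)       # dC/dW, same shape as W, since dz_i/dW_ij = x_j
dC_db = dC_dz                    # dC/db, since dz/db is the identity
```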
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
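# One possible set of expressions (an illustrative sketch; it assumes dC_dz2 was
# computed in a) above and that sigmoid_der from exercise 3 is available):
#
#   dC_da1 = W2.T @ dC_dz2            # propagate through z2 = W2 @ a1 + b2
#   dC_dz1 = dC_da1 * sigmoid_der(z1)
#   dC_dW1 = np.outer(dC_dz1, x)
#   dC_db1 = dC_dz1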
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
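As a reminder of the batched layout from last week, one possible version of the forward-pass functions could look like this (a sketch, assuming inputs are stacked as rows of shape `(n_inputs, n_features)`):

```python
def create_layers_batch(network_input_size, layer_output_sizes):
    # Same as create_layers, but with W transposed so that a batch of row vectors
    # can be right-multiplied: (n_inputs, n_in) @ (n_in, n_out) -> (n_inputs, n_out)
    layers = []
    i_size = network_input_size
    for layer_output_size in layer_output_sizes:
        W = np.random.randn(i_size, layer_output_size)
        b = np.random.randn(layer_output_size)
        layers.append((W, b))
        i_size = layer_output_size
    return layers


def feed_forward_batch(inputs, layers, activation_funcs):
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = a @ W + b  # one matrix multiplication plus a broadcasted bias per layer
        a = activation_func(z)
    return a
```

With this layout, the per-layer weight gradient ends up involving `layer_input.T @ dC_dz` (which sums over the batch) rather than an outer product.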
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/exercisesweek43.ipynb b/doc/LectureNotes/exercisesweek43.ipynb new file mode 100644 index 000000000..f80e8787a --- /dev/null +++ b/doc/LectureNotes/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. 
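+    "\n",
+    "As a purely illustrative example (the small arrays below are toy placeholders), scikit-learn can tabulate these counts for you:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import confusion_matrix\n",
+    "\n",
+    "# toy binary example: 1 = positive, 0 = negative\n",
+    "y_true = np.array([1, 1, 1, 0, 0, 0])\n",
+    "y_pred = np.array([1, 1, 0, 0, 0, 1])\n",
+    "\n",
+    "# rows are actual classes, columns are predicted classes;\n",
+    "# scikit-learn sorts the labels, so with labels [0, 1] the top-left entry is TN\n",
+    "print(confusion_matrix(y_true, y_pred))\n",
+    "```\n",
+    "\n",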
Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. 
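+    "\n",
+    "To make this concrete, a small NumPy sketch (illustrative only, with placeholder array names) that computes the two rates at a single threshold could look like:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def tpr_fpr(y_true, y_prob, threshold):\n",
+    "    # predict positive when the estimated probability exceeds the threshold\n",
+    "    y_pred = (y_prob >= threshold).astype(int)\n",
+    "    tp = np.sum((y_pred == 1) & (y_true == 1))\n",
+    "    fp = np.sum((y_pred == 1) & (y_true == 0))\n",
+    "    fn = np.sum((y_pred == 0) & (y_true == 1))\n",
+    "    tn = np.sum((y_pred == 0) & (y_true == 0))\n",
+    "    return tp / (tp + fn), fp / (fp + tn)\n",
+    "```\n",
+    "\n",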
A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." 
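+    "\n",
+    "The scikit-plot functions used in the example code below compute this for you, but as a purely illustrative sketch (with placeholder array names) the gain curve can also be obtained directly:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def cumulative_gain(y_true, y_prob):\n",
+    "    # sort the true labels by decreasing predicted probability of the positive class\n",
+    "    order = np.argsort(y_prob)[::-1]\n",
+    "    y_sorted = y_true[order]\n",
+    "    # fraction of all positives captured among the top k predictions, k = 1, ..., n\n",
+    "    gain = np.cumsum(y_sorted) / np.sum(y_true)\n",
+    "    alpha = np.arange(1, len(y_true) + 1) / len(y_true)\n",
+    "    return alpha, gain\n",
+    "```\n"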
+ ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." 
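+    "\n",
+    "If you want to check these averaging conventions numerically, scikit-learn implements them directly; the small arrays below are only a toy illustration:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import f1_score\n",
+    "\n",
+    "# tiny synthetic three-class example\n",
+    "y_true = np.array([0, 0, 1, 1, 2, 2])\n",
+    "y_pred = np.array([0, 1, 1, 1, 2, 0])\n",
+    "\n",
+    "print(f1_score(y_true, y_pred, average=None))        # one F1 score per class\n",
+    "print(f1_score(y_true, y_pred, average='micro'))     # from global TP, FP, FN counts\n",
+    "print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean over classes\n",
+    "print(f1_score(y_true, y_pred, average='weighted'))  # weighted by class support\n",
+    "```\n"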
+ ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. 
You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek44.ipynb b/doc/LectureNotes/exercisesweek44.ipynb new file mode 100644 index 000000000..32aa0e723 --- /dev/null +++ b/doc/LectureNotes/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n", + "\n", + "2. 
The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. 
You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/project1.ipynb b/doc/LectureNotes/project1.ipynb index aba42cd41..5170af951 100644 --- a/doc/LectureNotes/project1.ipynb +++ b/doc/LectureNotes/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. 
This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. 
After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). 
There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." 
+ "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. 
\n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. 
\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. 
If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. 
Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. \n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. 
[Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/project2.ipynb b/doc/LectureNotes/project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/LectureNotes/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. 
Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. 
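Since the project reuses the gradient machinery from project 1 with RMSprop or ADAM for the learning-rate updates, a bare-bones sketch of the two update rules is included here as a reminder. The function names and the default values of `rho`, `beta1`, `beta2` and `eps` are common choices on our part, not values prescribed by the project text.

```python
import numpy as np

def rmsprop_update(theta, grad, s, eta=0.001, rho=0.9, eps=1e-8):
    # s accumulates a running average of the squared gradient
    s = rho * s + (1 - rho) * grad**2
    theta = theta - eta * grad / (np.sqrt(s) + eps)
    return theta, s

def adam_update(theta, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v are running averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)      # bias correction, t = 1, 2, ...
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In practice these updates would be called once per minibatch inside the stochastic gradient descent loop, with `s`, `m` and `v` initialized to zero arrays of the same shape as `theta`.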
A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." 
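To make the task in part b) concrete, the following is a minimal sketch of a feed-forward network with a single hidden layer, sigmoid activation in the hidden layer, a linear output and the plain mean-squared-error cost, trained with full-batch gradient descent on the one-dimensional Runge function. The number of hidden nodes, the learning rate, the number of epochs and the scaled normal initialization of the output weights are our own illustrative choices; a full solution should of course support several layers and the other activation functions.

```python
import numpy as np

rng = np.random.default_rng(2025)

# Data: the one-dimensional Runge function on [-1, 1]
n = 200
x = np.linspace(-1, 1, n).reshape(-1, 1)
y = 1.0 / (1.0 + 25.0 * x**2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with sigmoid activation, linear output layer
n_hidden = 50
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
b2 = np.zeros(1)

eta = 0.01
for epoch in range(10000):
    # forward pass
    a1 = sigmoid(x @ W1 + b1)
    y_pred = a1 @ W2 + b2                       # linear output for regression

    # backward pass for the cost C = mean((y_pred - y)**2)
    delta2 = 2.0 * (y_pred - y) / n             # dC/dy_pred
    grad_W2 = a1.T @ delta2
    grad_b2 = delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)  # chain rule through the sigmoid
    grad_W1 = x.T @ delta1
    grad_b1 = delta1.sum(axis=0)

    # plain gradient descent update
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1
    W2 -= eta * grad_W2
    b2 -= eta * grad_b2

print("final training MSE:", np.mean((y_pred - y) ** 2))
```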
+ ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
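For the multiclass case it may also be convenient to have small helper functions for one-hot encoding the labels, for the softmax output and for the accuracy score defined above. The sketch below assumes class labels as returned by fetch_openml (strings '0' to '9') and network outputs given as a matrix of class probabilities; the helper names are our own.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    # labels come as strings from fetch_openml, so cast to integers first
    labels = np.asarray(labels, dtype=int)
    onehot = np.zeros((labels.size, n_classes))
    onehot[np.arange(labels.size), labels] = 1.0
    return onehot

def softmax(z):
    z = z - np.max(z, axis=1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probabilities, onehot, eps=1e-12):
    # multiclass (Softmax) cross entropy averaged over the samples
    return -np.mean(np.sum(onehot * np.log(probabilities + eps), axis=1))

def accuracy(targets, outputs):
    # targets: class labels, outputs: network probabilities of shape (n, n_classes)
    predictions = np.argmax(outputs, axis=1)
    return np.mean(predictions == np.asarray(targets, dtype=int))
```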
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/requirements.txt b/doc/LectureNotes/requirements.txt new file mode 100644 index 000000000..54b882503 --- /dev/null +++ b/doc/LectureNotes/requirements.txt @@ -0,0 +1,93 @@ +accessible-pygments==0.0.5 +alabaster==0.7.16 +appnope==0.1.4 +asttokens==3.0.0 +attrs==25.3.0 +babel==2.17.0 +beautifulsoup4==4.13.5 +certifi==2025.8.3 +charset-normalizer==3.4.3 +click==8.2.1 +comm==0.2.3 +debugpy==1.8.16 +decorator==5.2.1 +docutils==0.21.2 +executing==2.2.1 +fastjsonschema==2.21.2 +idna==3.10 +imagesize==1.4.1 +importlib_metadata==8.7.0 +ipykernel==6.30.1 +ipython==9.5.0 +ipython_pygments_lexers==1.1.1 +jedi==0.19.2 +Jinja2==3.1.6 +jsonschema==4.25.1 +jsonschema-specifications==2025.9.1 +jupyter-book==1.0.4.post1 +jupyter-cache==1.0.1 +jupyter_client==8.6.3 +jupyter_core==5.8.1 +latexcodec==3.0.1 +linkify-it-py==2.0.3 +markdown-it-py==3.0.0 +MarkupSafe==3.0.2 +matplotlib-inline==0.1.7 +mdit-py-plugins==0.5.0 +mdurl==0.1.2 +myst-nb==1.3.0 +myst-parser==3.0.1 +nbclient==0.10.2 +nbformat==5.10.4 +nest-asyncio==1.6.0 +numpy==2.3.3 +packaging==25.0 +parso==0.8.5 +pexpect==4.9.0 +platformdirs==4.4.0 +prompt_toolkit==3.0.52 +psutil==7.0.0 +ptyprocess==0.7.0 +pure_eval==0.2.3 +pybtex==0.25.1 +pybtex-docutils==1.0.3 +pydata-sphinx-theme==0.15.4 +Pygments==2.19.2 +python-dateutil==2.9.0.post0 +PyYAML==6.0.2 +pyzmq==27.0.2 +referencing==0.36.2 +requests==2.32.5 +rpds-py==0.27.1 +setuptools==80.9.0 +six==1.17.0 +snowballstemmer==3.0.1 +soupsieve==2.8 +Sphinx==7.4.7 +sphinx-book-theme==1.1.4 +sphinx-comments==0.0.3 +sphinx-copybutton==0.5.2 +sphinx-jupyterbook-latex==1.0.0 +sphinx-multitoc-numbering==0.1.3 +sphinx-thebe==0.3.1 +sphinx-togglebutton==0.3.2 +sphinx_design==0.6.1 +sphinx_external_toc==1.0.1 +sphinxcontrib-applehelp==2.0.0 +sphinxcontrib-bibtex==2.6.5 +sphinxcontrib-devhelp==2.0.0 +sphinxcontrib-htmlhelp==2.1.0 +sphinxcontrib-jsmath==1.0.1 +sphinxcontrib-qthelp==2.0.0 +sphinxcontrib-serializinghtml==2.0.0 +SQLAlchemy==2.0.43 +stack-data==0.6.3 +tabulate==0.9.0 +tornado==6.5.2 +traitlets==5.14.3 +typing_extensions==4.15.0 +uc-micro-py==1.0.3 +urllib3==2.5.0 +wcwidth==0.2.13 +wheel==0.45.1 +zipp==3.23.0 diff --git a/doc/LectureNotes/week37.ipynb b/doc/LectureNotes/week37.ipynb new file mode 100644 index 000000000..fe89adb05 --- /dev/null +++ b/doc/LectureNotes/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
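The snippet above is lifted from last week's discussion and uses a sample-size variable `m` that is not defined at this point; note also that with the factor of 2 the inputs are actually drawn uniformly from $[0,2]$. A self-contained version, with the number of data points fixed to the 100 used below, could read:

```python
import numpy as np

n = 100                                    # number of data points
x = 2 * np.random.rand(n, 1)               # uniform inputs on [0, 2]
y = 4 + 3 * x + np.random.randn(n, 1)      # linear signal plus N(0,1) noise
```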
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
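The full example follows below; since it leaves out the stopping criterion, here is a small sketch of how the iteration could be terminated once $||\nabla_\theta C(\theta_k)|| \leq \epsilon$. The cap on the number of iterations is our own safeguard.

```python
import numpy as np

n = 100
x = 2 * np.random.rand(n, 1)
y = 4 + 3 * x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta = 0.001
eps = 1e-8
max_iter = 1_000_000                       # safeguard against an endless loop

for k in range(max_iter):
    gradient = (2.0 / n) * X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:    # stop when the gradient is (almost) zero
        print(f"Converged after {k} iterations")
        break
    theta -= eta * gradient

print("gradient descent:", theta.ravel())
print("analytic solution:", (np.linalg.inv(X.T @ X) @ X.T @ y).ravel())
```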
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
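For completeness, one common way of writing the momentum update, in our notation with a momentum parameter $\gamma \in [0,1)$ and learning rate $\eta$, is

$$
\mathbf{v}_{t+1} = \gamma \mathbf{v}_t + \eta \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - \mathbf{v}_{t+1}.
$$

Setting $\gamma=0$ recovers plain gradient descent. The code example in the next section implements exactly this rule, with the variable `change` playing the role of $\mathbf{v}_t$ and `momentum` that of $\gamma$.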
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "8ddbdbb5", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "35ea8e21", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "77a60fcd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b030b80c", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n", + "using the full gradient $\\nabla C(\\theta)$, while Stochastic Gradient\n", + "Descent (SGD) uses a single sample (or mini-batch) gradient $\\nabla\n", + "l_i(\\theta)$ selected at random. In equation form, one GD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "21997f1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t-\\eta \\nabla C(\\theta_t) =\\theta_t -\\eta \\frac{1}{N}\\sum_{i=1}^N \\nabla l_i(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdefe165", + "metadata": { + "editable": true + }, + "source": [ + "whereas one SGD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "ac200d56", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t -\\eta \\nabla l_{i_t}(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb3edfb3", + "metadata": { + "editable": true + }, + "source": [ + "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n", + "converge to the global minimum, but their rates differ. GD can take\n", + "larger, more stable steps since it uses the exact gradient, achieving\n", + "an error that decreases on the order of $O(1/t)$ per iteration for\n", + "convex objectives (and even exponentially fast for strongly convex\n", + "cases). In contrast, plain SGD has more variance in each step, leading\n", + "to sublinear convergence in expectation – typically $O(1/\\sqrt{t})$\n", + "for general convex objectives (\\thetaith appropriate diminishing step\n", + "sizes) . Intuitively, GD’s trajectory is smoother and more\n", + "predictable, while SGD’s path oscillates due to noise but costs far\n", + "less per iteration, enabling many more updates in the same time." + ] + }, + { + "cell_type": "markdown", + "id": "7fe05c0d", + "metadata": { + "editable": true + }, + "source": [ + "### Strongly Convex Case\n", + "\n", + "If $C(\\theta)$ is strongly convex and $L$-smooth (so GD enjoys linear\n", + "convergence), the gap $C(\\theta_t)-C(\\theta^*)$ for GD shrinks as" + ] + }, + { + "cell_type": "markdown", + "id": "2ae403f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_t) - C(\\theta^* ) \\le \\Big(1 - \\frac{\\mu}{L}\\Big)^t [C(\\theta_0)-C(\\theta^*)],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44272171", + "metadata": { + "editable": true + }, + "source": [ + "a geometric (linear) convergence per iteration . Achieving an\n", + "$\\epsilon$-accurate solution thus takes on the order of\n", + "$\\log(1/\\epsilon)$ iterations for GD. However, each GD iteration costs\n", + "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n", + "obtain a linear rate – instead, with a properly decaying step size\n", + "(e.g. $\\eta_t = \\frac{1}{\\mu t}$) or iterate averaging, SGD attains an\n", + "$O(1/t)$ convergence rate in expectation . For example, one result\n", + "of Moulines and Bach 2011, see shows that with $\\eta_t = \\Theta(1/t)$," + ] + }, + { + "cell_type": "markdown", + "id": "9cde29ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[C(\\theta_t) - C(\\theta^*)] = O(1/t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b77f20e", + "metadata": { + "editable": true + }, + "source": [ + "for strongly convex, smooth $F$ . This $1/t$ rate is slower per\n", + "iteration than GD’s exponential decay, but each SGD iteration is $N$\n", + "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n", + "take slightly fewer iterations in theory, but each iteration is so\n", + "costly that it is “slower… especially for large datasets” . Meanwhile,\n", + "memory scaling strongly favors SGD: GD’s memory cost grows with\n", + "dataset size, making it impractical beyond a point, whereas SGD’s\n", + "memory use is modest and mostly constant w.r.t. $N$ . These\n", + "differences have made SGD (and mini-batch variants) the de facto\n", + "choice for training large machine learning models, from logistic\n", + "regression on millions of examples to deep neural networks with\n", + "billions of parameters. The consensus in both research and practice is\n", + "that for large-scale or high-dimensional tasks, SGD-type methods\n", + "converge quicker per unit of computation and handle memory constraints\n", + "better than standard full-batch gradient descent ." + ] + }, + { + "cell_type": "markdown", + "id": "48c2661e", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "a2106298", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." 
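+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad0c1e2f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "To make the trade-off concrete, the small sketch below runs plain gradient\n",
+    "descent with a single fixed $\eta$ on a hypothetical, deliberately\n",
+    "ill-conditioned quadratic cost $C(\theta)=\frac{1}{2}\left(100\theta_1^2+\theta_2^2\right)$\n",
+    "(a toy example chosen only for illustration). The steep $\theta_1$-direction\n",
+    "forces $\eta$ below $2/100$, and with such a small $\eta$ the flat\n",
+    "$\theta_2$-direction barely moves."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad0c1e30",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Hypothetical toy cost C(theta) = 0.5*(100*theta_1^2 + theta_2^2)\n",
+    "# Its gradient is (100*theta_1, theta_2)\n",
+    "def gradient(theta):\n",
+    "    return np.array([100.0*theta[0], theta[1]])\n",
+    "\n",
+    "def fixed_eta_gd(eta, n_iter=100):\n",
+    "    theta = np.array([1.0, 1.0])\n",
+    "    for _ in range(n_iter):\n",
+    "        theta = theta - eta*gradient(theta)\n",
+    "    return theta\n",
+    "\n",
+    "# eta above 2/100: the steep theta_1-direction oscillates and diverges\n",
+    "print(\"eta = 0.021:\", fixed_eta_gd(0.021))\n",
+    "# eta safely below 2/100: stable, but theta_2 has only moved from 1.0 to about 0.6\n",
+    "print(\"eta = 0.005:\", fixed_eta_gd(0.005))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad0c1e31",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "No single fixed $\eta$ works well for both directions at once, which is\n",
+    "precisely the motivation for the per-parameter, history-dependent step\n",
+    "sizes derived next."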
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The AdaGrad algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7743f26d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "ef4b5d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "927e2738", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "1753de13", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "0db67ba3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7831e978", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "92a7758a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df62a4ff", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "c8a2b948", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f269e80", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "f4ec584c", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The RMSProp algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The ADAM algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "c4aa2b35", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "01de27d3", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "78a1a601", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
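+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be10aa01",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a minimal sketch of what automatic differentiation gives us (assuming\n",
+    "the **autograd** package is installed; with **JAX** the corresponding call\n",
+    "is `jax.grad`), `grad` takes a Python function that returns a scalar and\n",
+    "returns a new function that evaluates its derivative:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "be10aa02",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as anp\n",
+    "from autograd import grad\n",
+    "\n",
+    "# A simple scalar function; autograd builds its derivative for us\n",
+    "def f(x):\n",
+    "    return anp.sin(x)**2 + x**3\n",
+    "\n",
+    "df = grad(f)\n",
+    "\n",
+    "x0 = 1.5\n",
+    "print(\"autograd :\", df(x0))\n",
+    "print(\"analytic :\", 2*anp.sin(x0)*anp.cos(x0) + 3*x0**2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be10aa03",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The same mechanism is used in the cell below, where `grad(CostOLS)` returns\n",
+    "the gradient of the OLS cost function with respect to $\theta$."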
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
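+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cd20bb01",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Since the accumulated sum `Giter` can only grow, AdaGrad's effective step\n",
+    "size $\eta/(\delta+\sqrt{r_t})$ can only shrink. The short sketch below tracks\n",
+    "this decay for a single parameter under the artificial assumption of a\n",
+    "constant unit gradient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cd20bb02",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "eta = 0.01    # same base learning rate as in the AdaGrad example above\n",
+    "delta = 1e-8  # small constant guarding against division by zero\n",
+    "r = 0.0       # accumulated squared gradient for one parameter\n",
+    "\n",
+    "for t in range(1, 1001):\n",
+    "    g = 1.0   # artificial assumption: constant unit gradient\n",
+    "    r += g*g\n",
+    "    if t in (1, 10, 100, 1000):\n",
+    "        print(f\"step {t:4d}: effective learning rate {eta/(delta + np.sqrt(r)):.2e}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cd20bb03",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "With a constant gradient the effective learning rate falls off like\n",
+    "$1/\sqrt{t}$; RMSprop counteracts exactly this behaviour by replacing the\n",
+    "cumulative sum with an exponentially decaying average, as shown next."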
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n",
+    "m = int(n/M) #number of minibatches\n",
+    "# Guess for unknown parameters theta\n",
+    "theta = np.random.randn(3,1)\n",
+    "\n",
+    "# Value for learning rate\n",
+    "eta = 0.01\n",
+    "# Values for the moment-decay parameters theta1 and theta2 (called beta1 and beta2 in https://arxiv.org/abs/1412.6980)\n",
+    "theta1 = 0.9\n",
+    "theta2 = 0.999\n",
+    "# Small ADAM parameter to avoid possible division by zero\n",
+    "delta = 1e-7\n",
+    "iter = 0\n",
+    "for epoch in range(n_epochs):\n",
+    "    first_moment = 0.0\n",
+    "    second_moment = 0.0\n",
+    "    iter += 1\n",
+    "    for i in range(m):\n",
+    "        random_index = M*np.random.randint(m)\n",
+    "        xi = X[random_index:random_index+M]\n",
+    "        yi = y[random_index:random_index+M]\n",
+    "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+    "        # Computing the first and second moments of the gradient\n",
+    "        first_moment = theta1*first_moment + (1-theta1)*gradients\n",
+    "        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n",
+    "        # Bias corrections of the two moments\n",
+    "        first_term = first_moment/(1.0-theta1**iter)\n",
+    "        second_term = second_moment/(1.0-theta2**iter)\n",
+    "        # Parameter update, element-wise division by the RMS of the gradient\n",
+    "        update = eta*first_term/(np.sqrt(second_term)+delta)\n",
+    "        theta -= update\n",
+    "print(\"theta from own ADAM\")\n",
+    "print(theta)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8475863",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Material for the lab sessions\n",
+    "\n",
+    "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n",
+    "\n",
+    "2. Work on project 1\n",
+    "\n",
+    "\n",
+    "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d4d0717",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Reminder on different scaling methods\n",
+    "\n",
+    "Before fitting a regression model, it is good practice to normalize or\n",
+    "standardize the features. This ensures all features are on a\n",
+    "comparable scale, which is especially important when using\n",
+    "regularization. In the exercises this week we will perform standardization, scaling each\n",
+    "feature to have mean 0 and standard deviation 1.\n",
+    "\n",
+    "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n",
+    "Then we subtract the mean and divide by the standard deviation for each feature.\n",
+    "\n",
+    "In the example here we\n",
+    "will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n",
+    "(and each feature) means the model does not require a separate intercept\n",
+    "term; the data is shifted such that the intercept is effectively 0.\n",
+    "(In practice, one could include an intercept in the model and not\n",
+    "penalize it, but here we simplify by centering.)\n",
+    "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "46375144", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Standardize features (zero mean, unit variance for each feature)\n", + "X_mean = X.mean(axis=0)\n", + "X_std = X.std(axis=0)\n", + "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n", + "X_norm = (X - X_mean) / X_std\n", + "\n", + "# Center the target to zero mean (optional, to simplify intercept handling)\n", + "y_mean = ?\n", + "y_centered = ?" + ] + }, + { + "cell_type": "markdown", + "id": "39426ccf", + "metadata": { + "editable": true + }, + "source": [ + "Do we need to center the values of $y$?\n", + "\n", + "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", + "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n", + "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", + "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", + "same scale)." + ] + }, + { + "cell_type": "markdown", + "id": "df7fe27f", + "metadata": { + "editable": true + }, + "source": [ + "## Functionality in Scikit-Learn\n", + "\n", + "**Scikit-Learn** has several functions which allow us to rescale the\n", + "data, normally resulting in much better results in terms of various\n", + "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n", + "ensures that for each feature/predictor we study the mean value is\n", + "zero and the variance is one (every column in the design/feature\n", + "matrix). This scaling has the drawback that it does not ensure that\n", + "we have a particular maximum or minimum in our data set. Another\n", + "function included in **Scikit-Learn** is the **MinMaxScaler** which\n", + "ensures that all features are exactly between $0$ and $1$. The" + ] + }, + { + "cell_type": "markdown", + "id": "8fd48e39", + "metadata": { + "editable": true + }, + "source": [ + "## More preprocessing\n", + "\n", + "The **Normalizer** scales each data\n", + "point such that the feature vector has a euclidean length of one. In other words, it\n", + "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n", + "radius of 1. This means every data point is scaled by a different number (by the\n", + "inverse of it’s length).\n", + "This normalization is often used when only the direction (or angle) of the data matters,\n", + "not the length of the feature vector.\n", + "\n", + "The **RobustScaler** works similarly to the StandardScaler in that it\n", + "ensures statistical properties for each feature that guarantee that\n", + "they are on the same scale. However, the RobustScaler uses the median\n", + "and quartiles, instead of mean and variance. This makes the\n", + "RobustScaler ignore data points that are very different from the rest\n", + "(like measurement errors). These odd data points are also called\n", + "outliers, and might often lead to trouble for other scaling\n", + "techniques." + ] + }, + { + "cell_type": "markdown", + "id": "d6c60a0a", + "metadata": { + "editable": true + }, + "source": [ + "## Frequently used scaling functions\n", + "\n", + "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy however to write your own. 
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
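+    "# Note (added comment): Scikit-Learn's Ridge is used here with its default\n",
+    "# fit_intercept=True, so it also leaves the intercept out of the penalty;\n",
+    "# the two MSE curves should therefore lie close to each other\n",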
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week38.ipynb b/doc/LectureNotes/week38.ipynb new file mode 100644 index 000000000..1d25f9941 --- /dev/null +++ b/doc/LectureNotes/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. 
We will amongst other things show that the regularization\n",
+    "parameter can reduce considerably the variance of the parameters\n",
+    "$\\theta$.\n",
+    "\n",
+    "One of the advantages of doing linear regression is that we actually end up with\n",
+    "analytical expressions for several statistical quantities. \n",
+    "Standard least squares and Ridge regression allow us to\n",
+    "derive quantities like the variance and other expectation values in a\n",
+    "rather straightforward way.\n",
+    "\n",
+    "It is assumed that $\\varepsilon_i\n",
+    "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n",
+    "independent, i.e.:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bfc2983a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\begin{align*} \n",
+    "\\mbox{Cov}(\\varepsilon_{i_1},\n",
+    "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n",
+    "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n",
+    "\\end{align*}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b5f5980",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The randomness of $\\varepsilon_i$ implies that\n",
+    "$\\mathbf{y}_i$ is also a random variable. In particular,\n",
+    "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n",
+    "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n",
+    "non-random scalar. To specify the parameters of the distribution of\n",
+    "$\\mathbf{y}_i$ we need to calculate its first two moments. \n",
+    "\n",
+    "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n",
+    "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n",
+    "row number $i$ and sum over all the $p$ columns."
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "96c16676",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Deriving OLS from a probability distribution\n",
+    "\n",
+    "Our basic assumption when we derived the OLS equations was that our\n",
+    "output is determined by a given continuous function\n",
+    "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n",
+    "distribution with zero mean value and an undetermined variance\n",
+    "$\\sigma^2$.\n",
+    "\n",
+    "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n",
+    "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n",
+    "the design matrix are not stochastic variables, we can assume that the\n",
+    "probability distribution of our targets is also a normal distribution\n",
+    "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n",
+    "single output $y_i$ is given by the Gaussian distribution"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2a1a004",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5aad445b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Independent and Identically Distributed (iid)\n",
+    "\n",
+    "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n",
+    "We define this distribution as"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d197c8bb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2e7462f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n",
+    "\n",
+    "Since these events are assumed to be independent and identically distributed we can build the probability distribution function (PDF) for all possible events $\\boldsymbol{y}$ as the product of the single events, that is we have"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eb635d3d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "445ed13e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the outputs (targets) and the inputs. 
That is\n", + "in case we have a simple one-dimensional input and output case" + ] + }, + { + "cell_type": "markdown", + "id": "319bfc6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90abf35a", + "metadata": { + "editable": true + }, + "source": [ + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" + ] + }, + { + "cell_type": "markdown", + "id": "04b66fbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4a27b5a7", + "metadata": { + "editable": true + }, + "source": [ + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "8d12543f", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." 
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e5cd118",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## A new Cost Function\n",
+    "\n",
+    "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c71a5edf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e663bf2e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which becomes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4bc4873",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5bc59b8",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f6ddf4a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afda0a6b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which leads to the well-known OLS equation for the optimal parameters $\\theta$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5335dc0",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f86a52d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Next week we will make a similar analysis for Ridge and Lasso regression."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cdb1767",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why resampling methods\n",
+    "\n",
+    "Before we proceed, we need to rethink what we have been doing. In our\n",
+    "eagerness to fit the data, we have omitted several important elements in\n",
+    "our regression analysis. In what follows we will\n",
+    "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n",
+    "\n",
+    "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n",
+    "\n",
+    "and discuss how to select a given model (one of the difficult parts in machine learning)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69435d77",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Resampling methods\n",
+    "Resampling methods are an indispensable tool in modern\n",
+    "statistics. They involve repeatedly drawing samples from a training\n",
+    "set and refitting a model of interest on each sample in order to\n",
+    "obtain additional information about the fitted model. For example, in\n",
+    "order to estimate the variability of a linear regression fit, we can\n",
+    "repeatedly draw different samples from the training data, fit a linear\n",
+    "regression to each new sample, and then examine the extent to which\n",
+    "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
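Returning briefly to the confidence-interval expressions above, the following sketch (added for illustration, with synthetic data and the noise level $\sigma$ taken as known) turns the variance $\sigma^2(\theta_j)=\sigma^2[(\boldsymbol{X}^T\boldsymbol{X})^{-1}]_{jj}$ into approximate $95\%$ intervals for the OLS parameters.

```python
import numpy as np

rng = np.random.default_rng(3155)

# Synthetic data y = X theta + eps, with a known noise level sigma
n, sigma = 200, 0.5
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x, x**2])   # design matrix with intercept
theta_true = np.array([1.0, -2.0, 3.0])
y = X @ theta_true + sigma * rng.standard_normal(n)

# OLS estimate and its covariance sigma^2 (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y
std_err = sigma * np.sqrt(np.diag(XtX_inv))

# Approximate 95% confidence intervals (z = 1.96)
for j, (t, s) in enumerate(zip(theta_hat, std_err)):
    print(f"theta_{j}: {t:6.3f} +- {1.96*s:5.3f}   (true value {theta_true[j]})")
```

In practice $\sigma$ is not known and is replaced by an estimate from the residuals; the sketch keeps it fixed only to stay close to the expressions quoted above.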
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
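For completeness, here is the intermediate step that is implicit in the rewriting carried out in the next few cells. Splitting

$$
\boldsymbol{y}-\boldsymbol{\tilde{y}}=\left(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)+\left(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}}\right)+\boldsymbol{\epsilon},
$$

and expanding the square, all three cross terms have expectation value zero: $\boldsymbol{f}-\mathbb{E}[\boldsymbol{\tilde{y}}]$ is not stochastic, $\mathbb{E}[\mathbb{E}[\boldsymbol{\tilde{y}}]-\boldsymbol{\tilde{y}}]=0$, $\mathbb{E}[\boldsymbol{\epsilon}]=0$, and $\boldsymbol{\epsilon}$ is assumed independent of $\boldsymbol{\tilde{y}}$. What remains is

$$
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2,
$$

which is the decomposition quoted above.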
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Illustration of the bias-variance tradeoff.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
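# (Concretely: y_test has shape (n_test, 1), while np.mean(y_pred, axis=1) without keepdims has
# shape (n_test,), so the subtraction would broadcast to an (n_test, n_test) matrix.)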
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week39.ipynb b/doc/LectureNotes/week39.ipynb new file mode 100644 index 000000000..1f411fe62 --- /dev/null +++ b/doc/LectureNotes/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Discussions on how to structure your report for the first project\n", + "\n", + "2. 
Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A way to read the bias-variance tradeoff.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
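# The next plot shows the cross_val_score estimate of the MSE as a function of lambda.
# To compare with the manual KFold loop above, uncomment the line plotting
# estimated_mse_KFold just below; both use the same folds and the same grid of
# lambda values, so the two curves should be very similar.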
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
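As a quick numerical check (a small illustration added here; the function name and the grid of points are arbitrary), we can verify the identity $1-p(t)=p(-t)$ directly with NumPy:

import numpy as np

def logistic(t):
    # the Sigmoid/logistic function p(t) = 1/(1+exp(-t))
    return 1.0 / (1.0 + np.exp(-t))

# arbitrary grid of points for the check
t = np.linspace(-5.0, 5.0, 11)
print(np.allclose(1.0 - logistic(t), logistic(-t)))  # prints True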
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
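As a small sanity check (a minimal sketch with made-up numbers, not part of the original derivation), we can evaluate the log-likelihood in both forms above for a two-parameter model and confirm that they agree:

import numpy as np

# toy data and arbitrary parameter values, used only to check the algebra
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta0, theta1 = -0.5, 1.2

t = theta0 + theta1 * x
p = np.exp(t) / (1.0 + np.exp(t))            # p(y_i = 1 | x_i, theta)

# log-likelihood written with the probabilities directly
loglik_direct = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
# the reordered form given above
loglik_reordered = np.sum(y * t - np.log(1.0 + np.exp(t)))

print(np.allclose(loglik_direct, loglik_reordered))   # prints True
print("Cross entropy (cost):", -loglik_direct)        # the cost is the negative log-likelihood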
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
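To make the iterative scheme concrete, here is a minimal sketch of the Newton-Raphson update for the two-parameter case. The synthetic data and variable names below are our own illustration, not a reference implementation; the class in the next cell instead uses plain gradient descent.

import numpy as np

np.random.seed(3155)

# synthetic binary data with one predictor plus an intercept column (illustration only)
n = 200
x = np.random.randn(n)
theta_true = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-(theta_true[0] + theta_true[1] * x)))
y = (np.random.rand(n) < p_true).astype(float)

X = np.column_stack((np.ones(n), x))   # n x 2 design matrix
theta = np.zeros(2)                    # starting guess

for iteration in range(10):
    p = 1.0 / (1.0 + np.exp(-X @ theta))   # fitted probabilities
    gradient = -X.T @ (y - p)              # first derivative of the cost function
    W = np.diag(p * (1.0 - p))             # diagonal weight matrix
    hessian = X.T @ W @ X                  # second derivative (the Hessian matrix)
    theta = theta - np.linalg.solve(hessian, gradient)

print("Estimated theta:", theta)
print("True theta     :", theta_true)

In practice one would stop the iterations when the norm of the gradient falls below a chosen tolerance rather than after a fixed number of steps.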
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week40.ipynb b/doc/LectureNotes/week40.ipynb new file mode 100644 index 000000000..aa3733b88 --- /dev/null +++ b/doc/LectureNotes/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
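+    ,
+    "\n",
+    "Before turning to Newton's method below, a minimal sketch of plain gradient\n",
+    "descent on this cross entropy for the two-parameter case may be useful. The\n",
+    "synthetic data, the learning rate and the number of epochs are arbitrary,\n",
+    "illustrative choices of ours:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Synthetic labels from a two-parameter logistic model (made-up parameters)\n",
+    "rng = np.random.default_rng(42)\n",
+    "n = 200\n",
+    "x = rng.normal(size=n)\n",
+    "y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(1.0 - 1.5 * x))))\n",
+    "\n",
+    "X = np.column_stack((np.ones(n), x))      # columns: intercept, x\n",
+    "theta = np.zeros(2)\n",
+    "eta = 0.1                                 # learning rate, chosen by hand\n",
+    "for epoch in range(2000):\n",
+    "    p = 1.0 / (1.0 + np.exp(-X @ theta))  # p(y=1|x, theta)\n",
+    "    grad = -X.T @ (y - p) / n             # gradient of the averaged cross entropy\n",
+    "    theta -= eta * grad\n",
+    "print(theta)\n",
+    "```"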
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
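+    ,
+    "\n",
+    "As a hedged sketch (our own illustration, not taken from the lecture code),\n",
+    "the same Newton-Raphson update can be written as an iteratively reweighted\n",
+    "least-squares step, replacing the explicit matrix inverse by a linear solve.\n",
+    "The synthetic data and the number of iterations below are arbitrary:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Synthetic two-parameter logistic data (made-up parameters)\n",
+    "rng = np.random.default_rng(0)\n",
+    "n = 150\n",
+    "x = rng.normal(size=n)\n",
+    "y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * x))))\n",
+    "\n",
+    "X = np.column_stack((np.ones(n), x))\n",
+    "theta = np.zeros(2)\n",
+    "for _ in range(8):\n",
+    "    p = 1.0 / (1.0 + np.exp(-X @ theta))\n",
+    "    w = p * (1.0 - p)                    # diagonal of W\n",
+    "    z = X @ theta + (y - p) / w          # working response\n",
+    "    # Solve (X^T W X) theta = X^T W z instead of forming the inverse explicitly\n",
+    "    theta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))\n",
+    "print(theta)\n",
+    "```"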
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
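+    ,
+    "\n",
+    "A hedged sketch of this diagonalization step (anticipating the PCA discussion)\n",
+    "could look as follows; printing the five leading eigenvalues is an arbitrary\n",
+    "choice of ours:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.datasets import load_breast_cancer\n",
+    "\n",
+    "cancer = load_breast_cancer()\n",
+    "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n",
+    "corr = cancerpd.corr().to_numpy()\n",
+    "\n",
+    "eigvals, eigvecs = np.linalg.eigh(corr)   # symmetric matrix, use eigh\n",
+    "eigvals = eigvals[::-1]                   # sort from largest to smallest\n",
+    "explained = eigvals / eigvals.sum()       # fraction of total (standardized) variance\n",
+    "print(np.round(explained[:5], 3))\n",
+    "```"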
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\begin{equation}\n", + " y = f\left(\sum_{i=1}^n w_ix_i\right) = f(u)\n", + "\label{artificialNeuron} \tag{1}\n", + "\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which has as input\n", + "a weighted sum of the signals $x_1, \dots ,x_n$ received from $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states with DNNs for quantum state\n", + "tomography is among the impressive achievements that reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural networks.\n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN) is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and possibly layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier."
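+    ,
+    "\n",
+    "As a minimal sketch of the single artificial neuron in Eq. (1) above, on which\n",
+    "these networks are built (the sigmoid activation and the numbers below are our\n",
+    "own arbitrary choices):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def neuron(x, w, f=lambda u: 1.0 / (1.0 + np.exp(-u))):\n",
+    "    u = np.dot(w, x)    # weighted sum of the incoming signals\n",
+    "    return f(u)         # output y = f(u)\n",
+    "\n",
+    "x = np.array([0.5, -1.0, 2.0])    # example input signals\n",
+    "w = np.array([0.2, 0.4, -0.1])    # example weights\n",
+    "print(neuron(x, w))\n",
+    "```"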
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.
    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
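Before turning to the scikit-learn neural-network example in the next section, the following sketch shows what we are after: a small multilayer perceptron fitted directly to the four rows of the XOR truth table. The hidden-layer size, activation, solver and random seed below are illustrative choices, not prescribed by the notes, and the outcome may depend on the random initialization.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The four XOR input patterns; MLPClassifier handles the intercept internally,
# so no bias column is needed in the design matrix
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
yXOR = np.array([0, 1, 1, 0])

# One small hidden layer with a non-linear activation (illustrative hyperparameters)
ffnn = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                     solver='lbfgs', max_iter=10000, random_state=0)
ffnn.fit(X, yXOR)
print(f"Predictions on the XOR truth table: {ffnn.predict(X)}")
print(f"Accuracy on the XOR truth table: {ffnn.score(X, yXOR)}")
```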
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week41.ipynb b/doc/LectureNotes/week41.ipynb new file mode 100644 index 000000000..c9b1adcdd --- /dev/null +++ b/doc/LectureNotes/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. 
Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. 
The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Layout of a neural network with three hidden layers.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
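As a first simple check of this kind, we can compare the analytical expression for $\partial {\cal C}/\partial w_{jk}^L$ derived above with a finite-difference approximation. The sketch below does this for a single output node fed by three nodes in layer $L-1$; the numerical values of the activations, weights, bias and target are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# One output node j fed by three nodes in layer L-1 (illustrative numbers)
a_prev = rng.uniform(size=3)   # a_k^{L-1}
w = rng.normal(size=3)         # w_{jk}^L
b = 0.1                        # b_j^L
y = 0.7                        # target value for this output node

def cost(w):
    """Contribution of this single output node to the cost, C = (a^L - y)^2 / 2."""
    a_L = sigmoid(a_prev @ w + b)
    return 0.5 * (a_L - y)**2

# Analytical gradient: (a^L - y) * a^L (1 - a^L) * a_k^{L-1}
a_L = sigmoid(a_prev @ w + b)
grad_analytic = (a_L - y) * a_L * (1.0 - a_L) * a_prev

# Central finite-difference check
eps = 1.0e-6
grad_numeric = np.zeros_like(w)
for k in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[k] += eps
    w_minus[k] -= eps
    grad_numeric[k] = (cost(w_plus) - cost(w_minus)) / (2.0 * eps)

print(grad_analytic)
print(grad_numeric)   # should agree with the analytical gradient to high accuracy
```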
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
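+ {
+ "cell_type": "markdown",
+ "id": "added-ad-vs-fd",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "To make this difference concrete, here is a minimal sketch (assuming the **autograd** package is installed)\n",
+ "which compares the derivative of $f(x)=x^3$ computed by automatic differentiation with a forward-difference\n",
+ "approximation. The former reproduces the analytical result $3x^2$ to machine precision, while the error of\n",
+ "the latter depends on the step size $\\Delta$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-ad-vs-fd-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "from autograd import grad\n",
+ "\n",
+ "def f(x):\n",
+ "    return x**3\n",
+ "\n",
+ "def df_fd(x, delta):\n",
+ "    # forward-difference approximation, error depends on delta\n",
+ "    return (f(x + delta) - f(x))/delta\n",
+ "\n",
+ "df_ad = grad(f)   # automatic differentiation (reverse mode)\n",
+ "\n",
+ "x0 = 1.0\n",
+ "print(df_ad(x0) - 3*x0**2)   # essentially machine precision\n",
+ "for delta in [1e-2, 1e-5, 1e-8, 1e-12]:\n",
+ "    print(delta, df_fd(x0, delta) - 3*x0**2)"
+ ]
+ },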
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
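+ {
+ "cell_type": "markdown",
+ "id": "added-reuse-exp",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The reduction in the number of operations is easy to see in code. The small sketch below (with an arbitrary\n",
+ "input value) evaluates $f(x)=\\exp{x^2}$ and its derivative together, reusing the intermediate quantities\n",
+ "$a=x^2$ and $b=\\exp{a}$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-reuse-exp-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def f_and_df(x):\n",
+ "    a = x*x          # one multiplication\n",
+ "    b = np.exp(a)    # one exponential, b equals f(x)\n",
+ "    df = 2*x*b       # two multiplications, reusing b\n",
+ "    return b, df\n",
+ "\n",
+ "x0 = 0.5\n",
+ "value, derivative = f_and_df(x0)\n",
+ "print(value, derivative)\n",
+ "print(np.exp(x0**2), 2*x0*np.exp(x0**2))   # brute-force check"
+ ]
+ },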
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
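+ {
+ "cell_type": "markdown",
+ "id": "added-fwd-rev",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Libraries such as JAX implement both modes. As a small sketch (assuming JAX is installed), we can\n",
+ "differentiate the function above with **jacrev** (reverse mode, the ordering used in back propagation) and\n",
+ "**jacfwd** (forward mode) and check that both reproduce the analytical derivative."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-fwd-rev-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import jax.numpy as jnp\n",
+ "from jax import jacfwd, jacrev\n",
+ "\n",
+ "def f(x):\n",
+ "    return jnp.exp(x**2)\n",
+ "\n",
+ "x0 = 1.5\n",
+ "print(jacrev(f)(x0))          # reverse mode\n",
+ "print(jacfwd(f)(x0))          # forward mode\n",
+ "print(2*x0*jnp.exp(x0**2))    # analytical derivative"
+ ]
+ },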
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
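+ {
+ "cell_type": "markdown",
+ "id": "added-intermediates",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In code, the forward sweep stores the intermediate variables $a$, $b$, $c$ and $d$, and the derivative then\n",
+ "reuses them, as in this small sketch (the input value is arbitrary)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-intermediates-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def f_and_df(x):\n",
+ "    # forward sweep\n",
+ "    a = x*x\n",
+ "    b = np.exp(a)\n",
+ "    c = a + b\n",
+ "    d = np.sqrt(c)    # d = f(x)\n",
+ "    # derivative, reusing the stored intermediates\n",
+ "    df = x*(1.0 + b)/d\n",
+ "    return d, df\n",
+ "\n",
+ "x0 = 1.0\n",
+ "value, derivative = f_and_df(x0)\n",
+ "print(value, derivative)\n",
+ "print(np.sqrt(x0**2 + np.exp(x0**2)))                          # f(x0)\n",
+ "print(x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2)))   # analytical f'(x0)"
+ ]
+ },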
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer: one input node and one output node.
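+ {
+ "cell_type": "markdown",
+ "id": "added-perceptron-eval",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A few lines of Python are enough to evaluate this model and its cost for a given set of parameters\n",
+ "(the numbers below are arbitrary and only meant as an illustration)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-perceptron-eval-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "# arbitrary input, target and parameters\n",
+ "x, y = 1.0, 0.5\n",
+ "w_1, b_1 = 0.2, 0.1\n",
+ "\n",
+ "z_1 = w_1*x + b_1\n",
+ "a_1 = sigmoid(z_1)\n",
+ "cost = 0.5*(a_1 - y)**2\n",
+ "print(a_1, cost)"
+ ]
+ },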

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer: scalar input, one hidden node and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
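+ {
+ "cell_type": "markdown",
+ "id": "added-gradcheck-scalar",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a quick sanity check of the hand-derived gradients, and as a first taste of the **autograd** package\n",
+ "mentioned in the exercise below, the following sketch compares the chain-rule expressions with automatic\n",
+ "differentiation for one arbitrary set of parameters. As in the code above, the output activation is linear,\n",
+ "$a_2=z_2$, so that $\\sigma_2'=1$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-gradcheck-scalar-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as anp\n",
+ "from autograd import grad\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + anp.exp(-z))\n",
+ "\n",
+ "x, y = 4.0, 9.0   # the single data point used above, y = 2x+1\n",
+ "\n",
+ "def cost(params):\n",
+ "    w1, b1, w2, b2 = params[0], params[1], params[2], params[3]\n",
+ "    a1 = sigmoid(w1*x + b1)\n",
+ "    a2 = w2*a1 + b2   # linear output activation\n",
+ "    return 0.5*(a2 - y)**2\n",
+ "\n",
+ "params = anp.array([0.5, 0.01, -0.3, 0.01])\n",
+ "w1, b1, w2, b2 = params\n",
+ "a1 = sigmoid(w1*x + b1)\n",
+ "a2 = w2*a1 + b2\n",
+ "delta2 = a2 - y\n",
+ "delta1 = delta2*w2*a1*(1 - a1)\n",
+ "# chain-rule gradients: dC/dw1, dC/db1, dC/dw2, dC/db2\n",
+ "print(anp.array([delta1*x, delta1, delta2*a1, delta2]))\n",
+ "print(grad(cost)(params))   # should agree to machine precision"
+ ]
+ },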
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with two input nodes, one hidden layer with two nodes and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
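+ {
+ "cell_type": "markdown",
+ "id": "added-backprop-sketch",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The three steps above translate almost line by line into code. The sketch below is one possible bare-bones\n",
+ "implementation, assuming sigmoid activations in every layer, the squared-error cost, a small randomly\n",
+ "initialized network and a single arbitrary training point; a full feed-forward neural network code is the\n",
+ "topic of the week 42 material."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-backprop-sketch-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "def feed_forward(x, weights, biases):\n",
+ "    a = x\n",
+ "    activations = [a]   # a^0, a^1, ..., a^L\n",
+ "    for W, b in zip(weights, biases):\n",
+ "        a = sigmoid(W @ a + b)\n",
+ "        activations.append(a)\n",
+ "    return activations\n",
+ "\n",
+ "def back_propagation(x, y, weights, biases, eta=0.1):\n",
+ "    activations = feed_forward(x, weights, biases)\n",
+ "    # output error: delta^L = sigma'(z^L)*(a^L - y), with sigma'(z) = a(1-a)\n",
+ "    aL = activations[-1]\n",
+ "    delta = aL*(1 - aL)*(aL - y)\n",
+ "    for l in range(len(weights) - 1, -1, -1):\n",
+ "        grad_W = np.outer(delta, activations[l])   # dC/dw_jk = delta_j a_k\n",
+ "        grad_b = delta                             # dC/db_j  = delta_j\n",
+ "        if l > 0:\n",
+ "            # back propagate: delta^l = (W^(l+1))^T delta^(l+1) * sigma'(z^l)\n",
+ "            al = activations[l]\n",
+ "            delta = (weights[l].T @ delta)*al*(1 - al)\n",
+ "        weights[l] -= eta*grad_W\n",
+ "        biases[l] -= eta*grad_b\n",
+ "\n",
+ "# two inputs, one hidden layer with three nodes, one output\n",
+ "rng = np.random.default_rng(2025)\n",
+ "weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]\n",
+ "biases = [np.zeros(3), np.zeros(1)]\n",
+ "x = np.array([0.5, -0.2])\n",
+ "y = np.array([0.7])\n",
+ "for _ in range(1000):\n",
+ "    back_propagation(x, y, weights, biases)\n",
+ "print(feed_forward(x, weights, biases)[-1])   # should be close to y"
+ ]
+ },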
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week42.ipynb b/doc/LectureNotes/week42.ipynb new file mode 100644 index 000000000..45a126e79 --- /dev/null +++ b/doc/LectureNotes/week42.ipynb @@ -0,0 +1,5952 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d231eeee", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5e782cb1", + "metadata": { + "editable": true + }, + "source": [ + "# Week 42 Constructing a Neural Network code with examples\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 13-17, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "53309290", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture October 13, 2025\n", + "1. Building our own Feed-forward Neural Network and discussion of project 2\n", + "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", + "1. These lecture notes\n", + "\n", + "2. Video of lecture at \n", + "\n", + "3. Whiteboard notes at \n", + "\n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", + "\n", + "5. Neural Networks demystified at \n", + "\n", + "6. Building Neural Networks from scratch at \n", + "\n", + "7. Video on Neural Networks at \n", + "\n", + "8. Video on the back propagation algorithm at \n", + "\n", + "I also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "c7be87be", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", + "\n", + "2. Discussion of project 2" + ] + }, + { + "cell_type": "markdown", + "id": "8e0567a2", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material: Writing a code which implements a feed-forward neural network\n", + "\n", + "Last week we discussed the basics of neural networks and deep learning\n", + "and the basics of automatic differentiation. 
We looked also at\n", + "examples on how compute the parameters of a simple network with scalar\n", + "inputs and ouputs and no or just one hidden layers.\n", + "\n", + "We ended our discussions with the derivation of the equations for a\n", + "neural network with one hidden layers and two input variables and two\n", + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." + ] + }, + { + "cell_type": "markdown", + "id": "549dcc05", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "21203bae", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "1c102a30", + "metadata": { + "editable": true + }, + "source": [ + "## Reading recommendations\n", + "\n", + "1. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "2. Goodfellow et al, chapter 6 and 7 contain most of the neural network background." + ] + }, + { + "cell_type": "markdown", + "id": "53f11afe", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder from last week: First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "afa8c42a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb5c959f", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "0083ae15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4931203", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3a3754d", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "bcd5dbab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2cbc30f1", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "1a1d803d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "776735c7", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1a2e5af", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e603df9", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "533212cd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "09d91067", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "f767afe7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f38ded54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3f03bc3", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "9062730e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75bbc32c", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "fcf02dbf", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "aa97678f", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "98f68e27", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4528178", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d6304298", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfc47ba6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8834c3dc", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "40956770", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "69e7fdcf", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "726d4c90", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ee83d1c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2746792", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "76e2e41a", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "debaaadc", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
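To see this explicitly, you can for instance print the final prediction after the training loop has finished, using the function and variables defined in the code above:

```python
# after the training loop above has run
_, prediction = forwardpropagation(x)
print(f"prediction = {prediction}, target = {y}, squared error = {0.5*(prediction - y)**2}")
```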
+ ] + }, + { + "cell_type": "markdown", + "id": "7d576f19", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_1$ and $x_2$" + ] + }, + { + "cell_type": "markdown", + "id": "582b3b43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c8eace47", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "81ec9945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c35e1f69", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with two input nodes, one hidden layer with two hidden nodes and one output node.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "05b8eea9", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "We have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "7ef9cb55", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "00492358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45cca5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "22cfb40b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45b30d06", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ \\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "659dd686", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "34a1d4ca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34471712", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "0b3a74fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1a5bdab3", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "37f19e78", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5505aab8", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "d55d045c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04f101e7", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "bfab2e91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77f35b7e", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "8cf4a606", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86951351", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "73414e65", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f0aaa15", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "730c5415", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1afcb5a1", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "7f30cb44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "14c045ce", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "0c1a2c68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3385222", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "18ee3804", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ad741d56", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "65870a70", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7807fdc", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9af4a759", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc548cb7", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "83b75e94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c2be559", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "18b85f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63e39eb4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "a55371c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{2}^{(1)}}=\\delta_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa31a9b3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "580df891", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "c10bf2ce", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0bae11f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ed4a8b93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d582987", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5fa760a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc9de8bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f00e3ace", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ac96362", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "9c46f966", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "ea509b11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e08ff771", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "6f476983", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with three hidden layers; the first layer is $l=0$ and the last layer is $l=L=4$.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0535d087", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\boldsymbol{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "5e024ec1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "239fb4c6", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c47cc3c6", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = \\sigma(\\boldsymbol{z}^l)$ where $\\sigma$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $\\sigma$ for all layers\n", + "and their nodes. It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "4eb89f11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "92744a90", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Input to the first hidden layer $l=1$ from the input layer $l=0$.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "35424d45", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the input variable to the activation function, that is $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "b8502930", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81ad45a5", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "11bb8afb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b53ec752", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "b7519a84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c57689db", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "a9f83b15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "067c2583", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "43545710", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb33717", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "e09a8734", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, + "source": [ + "## The back propagation equations for a neural network\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "bb58784b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10aea094", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "b7cc2db8", + "metadata": { + "editable": true + 
}, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6cce9a62", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "43e5a84b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5c607a7", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "a51b3b58", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "4cd9d058", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80b630d", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "dc0c1a06", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f2065b7", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "7f89b9d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "49c2cd3f", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "517b1a37", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" 
+ ] + }, + { + "cell_type": "markdown", + "id": "65c8107f", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "2a10f902", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{W^L})}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90336322", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f25ff166", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cf11d5e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2670748d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18c29f71", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "c593470c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28e8caef", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "516de9d7", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "004c0bf4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d62a3b1f", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "e9af770e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eca56f17", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "bb0e4414", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4b190fc", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ec0f87c0", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "2fb45155", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "The four equations provide us with a way of computing the gradients of the cost function. 
Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$, compute the activations\n", + "$\\boldsymbol{z}^1$ of the first hidden layer and apply the activation function to obtain\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform the feed forward until we reach the output\n", + "layer, computing all $\\boldsymbol{z}^l$ together with the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the output error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "9183bbd0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32ece956", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back-propagated error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "466d6bda", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f31b228", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 3\n", + "\n", + "Finally, for each $l=L-1,L-2,\\dots,1$ (down to the first hidden layer) we update\n", + "the weights and the biases using gradient descent according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "fbeac005", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l \\leftarrow w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc6ae984", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65f3133d", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate."
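Collecting parts 1-3, one full-batch training pass over a design matrix $\boldsymbol{X}$ (one row per sample) could be sketched as follows. This is only a compact illustration of the equations above, assuming sigmoid activations in every layer and the squared-error cost; it is not the complete course code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_epoch(X, Y, Ws, bs, eta=0.1):
    """One feed-forward plus back-propagation pass over all samples (full-batch gradient descent)."""
    # feed forward: store z^l and a^l for every layer, rows = samples
    a, activations, zs = X, [X], []
    for W, b in zip(Ws, bs):
        z = a @ W + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # output error delta^L = sigma'(z^L) * dC/da^L
    delta = sigmoid(zs[-1]) * (1 - sigmoid(zs[-1])) * (activations[-1] - Y)
    # back propagate and update for l = L-1, ..., 1 (Python indices L-1, ..., 0)
    for l in range(len(Ws) - 1, -1, -1):
        grad_W = activations[l].T @ delta        # sums the per-sample gradients
        grad_b = delta.sum(axis=0)
        if l > 0:
            delta = (delta @ Ws[l].T) * sigmoid(zs[l - 1]) * (1 - sigmoid(zs[l - 1]))
        Ws[l] -= eta * grad_W
        bs[l] -= eta * grad_b
    return Ws, bs
```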
+ ] + }, + { + "cell_type": "markdown", + "id": "5d27bbe1", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea32e5bb", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9008dcf8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89aba7d6", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "ea0cdce2", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "91342c80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd6eb22a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "4e75b2ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1626d9b7", + "metadata": { + "editable": true + }, + "source": [ + "## Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, + "source": [ + "## Vanishing gradients\n", + "\n", + "The Back propagation algorithm we derived above works by going from\n", + "the output layer to the input layer, propagating the error gradient on\n", + "the way. Once the algorithm has computed the gradient of the cost\n", + "function with regards to each parameter in the network, it uses these\n", + "gradients to update each parameter with a Gradient Descent (GD) step.\n", + "\n", + "Unfortunately for us, the gradients often get smaller and smaller as\n", + "the algorithm progresses down to the first hidden layers. As a result,\n", + "the GD update leaves the lower layer connection weights virtually\n", + "unchanged, and training never converges to a good solution. This is\n", + "known in the literature as **the vanishing gradients problem**." + ] + }, + { + "cell_type": "markdown", + "id": "ea47d1d6", + "metadata": { + "editable": true + }, + "source": [ + "## Exploding gradients\n", + "\n", + "In other cases, the opposite can happen, namely the the gradients can\n", + "grow bigger and bigger. 
The result is that many of the layers get\n", + "large updates of the weights the algorithm diverges. This is the\n", + "**exploding gradients problem**, which is mostly encountered in\n", + "recurrent neural networks. More generally, deep neural networks suffer\n", + "from unstable gradients, different layers may learn at widely\n", + "different speeds" + ] + }, + { + "cell_type": "markdown", + "id": "1947aa95", + "metadata": { + "editable": true + }, + "source": [ + "## Is the Logistic activation function (Sigmoid) our choice?\n", + "\n", + "Although this unfortunate behavior has been empirically observed for\n", + "quite a while (it was one of the reasons why deep neural networks were\n", + "mostly abandoned for a long time), it is only around 2010 that\n", + "significant progress was made in understanding it.\n", + "\n", + "A paper titled [Understanding the Difficulty of Training Deep\n", + "Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio](http://proceedings.mlr.press/v9/glorot10a.html) found that\n", + "the problems with the popular logistic\n", + "sigmoid activation function and the weight initialization technique\n", + "that was most popular at the time, namely random initialization using\n", + "a normal distribution with a mean of 0 and a standard deviation of\n", + "1." + ] + }, + { + "cell_type": "markdown", + "id": "d024119f", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic function as the root of problems\n", + "\n", + "They showed that with this activation function and this\n", + "initialization scheme, the variance of the outputs of each layer is\n", + "much greater than the variance of its inputs. Going forward in the\n", + "network, the variance keeps increasing after each layer until the\n", + "activation function saturates at the top layers. This is actually made\n", + "worse by the fact that the logistic function has a mean of 0.5, not 0\n", + "(the hyperbolic tangent function has a mean of 0 and behaves slightly\n", + "better than the logistic function in deep networks)." + ] + }, + { + "cell_type": "markdown", + "id": "c9178132", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the Logistic funtion\n", + "\n", + "Looking at the logistic activation function, when inputs become large\n", + "(negative or positive), the function saturates at 0 or 1, with a\n", + "derivative extremely close to 0. Thus when backpropagation kicks in,\n", + "it has virtually no gradient to propagate back through the network,\n", + "and what little gradient exists keeps getting diluted as\n", + "backpropagation progresses down through the top layers, so there is\n", + "really nothing left for the lower layers.\n", + "\n", + "In their paper, Glorot and Bengio propose a way to significantly\n", + "alleviate this problem. We need the signal to flow properly in both\n", + "directions: in the forward direction when making predictions, and in\n", + "the reverse direction when backpropagating gradients. We don’t want\n", + "the signal to die out, nor do we want it to explode and saturate. For\n", + "the signal to flow properly, the authors argue that we need the\n", + "variance of the outputs of each layer to be equal to the variance of\n", + "its inputs, and we also need the gradients to have equal variance\n", + "before and after flowing through a layer in the reverse direction." 
+ ] + }, + { + "cell_type": "markdown", + "id": "756185f5", + "metadata": { + "editable": true + }, + "source": [ + "## Insights from the paper by Glorot and Bengio\n", + "\n", + "One of the insights in the 2010 paper by Glorot and Bengio was that\n", + "the vanishing/exploding gradients problems were in part due to a poor\n", + "choice of activation function. Until then most people had assumed that\n", + "if Nature had chosen to use roughly sigmoid activation functions in\n", + "biological neurons, they must be an excellent choice. But it turns out\n", + "that other activation functions behave much better in deep neural\n", + "networks, in particular the ReLU activation function, mostly because\n", + "it does not saturate for positive values (and also because it is quite\n", + "fast to compute)." + ] + }, + { + "cell_type": "markdown", + "id": "3d92cad4", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." + ] + }, + { + "cell_type": "markdown", + "id": "cbc6f721", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "9249dc7b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e59de3af", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." 
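+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4e1f7d2",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a concrete reference, here is a minimal NumPy sketch of the two variants mentioned above,\n",
+ "using the default values $\\alpha=0.01$ for the leaky ReLU and $\\alpha=1$ for the ELU."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d8b3a6c5",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def leaky_relu(z, alpha=0.01):\n",
+ "    # z for z >= 0, alpha*z for z < 0\n",
+ "    return np.where(z >= 0, z, alpha*z)\n",
+ "\n",
+ "def elu(z, alpha=1.0):\n",
+ "    # z for z >= 0, alpha*(exp(z) - 1) for z < 0, as defined above\n",
+ "    return np.where(z >= 0, z, alpha*(np.exp(z) - 1))\n",
+ "\n",
+ "z = np.linspace(-3, 3, 7)\n",
+ "print(leaky_relu(z))\n",
+ "print(elu(z))"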
+ ] + }, + { + "cell_type": "markdown", + "id": "e2da998c", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "e1abf01e", + "metadata": { + "editable": true + }, + "source": [ + "## Fine-tuning neural network hyperparameters\n", + "\n", + "The flexibility of neural networks is also one of their main\n", + "drawbacks: there are many hyperparameters to tweak. Not only can you\n", + "use any imaginable network topology (how neurons/nodes are\n", + "interconnected), but even in a simple FFNN you can change the number\n", + "of layers, the number of neurons per layer, the type of activation\n", + "function to use in each layer, the weight initialization logic, the\n", + "stochastic gradient optmized and much more. How do you know what\n", + "combination of hyperparameters is the best for your task?\n", + "\n", + "* You can use grid search with cross-validation to find the right hyperparameters.\n", + "\n", + "However,since there are many hyperparameters to tune, and since\n", + "training a neural network on a large dataset takes a lot of time, you\n", + "will only be able to explore a tiny part of the hyperparameter space.\n", + "\n", + "* You can use randomized search.\n", + "\n", + "* Or use tools like [Oscar](http://oscar.calldesk.ai/), which implements more complex algorithms to help you find a good set of hyperparameters quickly." + ] + }, + { + "cell_type": "markdown", + "id": "a8ded7cd", + "metadata": { + "editable": true + }, + "source": [ + "## Hidden layers\n", + "\n", + "For many problems you can start with just one or two hidden layers and\n", + "it will work just fine. For the MNIST data set discussed below you can easily get a\n", + "high accuracy using just one hidden layer with a few hundred neurons.\n", + "You can reach for this data set above 98% accuracy using two hidden\n", + "layers with the same total amount of neurons, in roughly the same\n", + "amount of training time.\n", + "\n", + "For more complex problems, you can gradually ramp up the number of\n", + "hidden layers, until you start overfitting the training set. Very\n", + "complex tasks, such as large image classification or speech\n", + "recognition, typically require networks with dozens of layers and they\n", + "need a huge amount of training data. However, you will rarely have to\n", + "train such networks from scratch: it is much more common to reuse\n", + "parts of a pretrained state-of-the-art network that performs a similar\n", + "task." 
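+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5f0a2b9",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The grid and randomized searches mentioned above are readily available in **scikit-learn**. Below is a minimal sketch of a\n",
+ "randomized search over the learning rate, the regularization parameter and the number of hidden neurons/layers;\n",
+ "the parameter ranges and the number of sampled configurations are illustrative choices only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a9c7d3e8",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch of a randomized hyperparameter search; the ranges and n_iter are illustrative choices\n",
+ "from scipy.stats import loguniform\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import RandomizedSearchCV\n",
+ "from sklearn.neural_network import MLPClassifier\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "param_distributions = {\n",
+ "    \"hidden_layer_sizes\": [(50,), (100,), (50, 50)],\n",
+ "    \"alpha\": loguniform(1e-6, 1e-1),              # L2 regularization strength\n",
+ "    \"learning_rate_init\": loguniform(1e-4, 1e-1),\n",
+ "}\n",
+ "\n",
+ "search = RandomizedSearchCV(MLPClassifier(max_iter=200), param_distributions,\n",
+ "                            n_iter=10, cv=3, random_state=0)\n",
+ "search.fit(X, y)\n",
+ "print(search.best_params_)\n",
+ "print(search.best_score_)"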
+ ] + }, + { + "cell_type": "markdown", + "id": "96da4f48", + "metadata": { + "editable": true + }, + "source": [ + "## Batch Normalization\n", + "\n", + "Batch Normalization aims to address the vanishing/exploding gradients\n", + "problems, and more generally the problem that the distribution of each\n", + "layer’s inputs changes during training, as the parameters of the\n", + "previous layers change.\n", + "\n", + "The technique consists of adding an operation in the model just before\n", + "the activation function of each layer, simply zero-centering and\n", + "normalizing the inputs, then scaling and shifting the result using two\n", + "new parameters per layer (one for scaling, the other for shifting). In\n", + "other words, this operation lets the model learn the optimal scale and\n", + "mean of the inputs for each layer. In order to zero-center and\n", + "normalize the inputs, the algorithm needs to estimate the inputs’ mean\n", + "and standard deviation. It does so by evaluating the mean and standard\n", + "deviation of the inputs over the current mini-batch, from this the\n", + "name batch normalization." + ] + }, + { + "cell_type": "markdown", + "id": "395346a7", + "metadata": { + "editable": true + }, + "source": [ + "## Dropout\n", + "\n", + "It is a fairly simple algorithm: at every training step, every neuron\n", + "(including the input neurons but excluding the output neurons) has a\n", + "probability $p$ of being temporarily dropped out, meaning it will be\n", + "entirely ignored during this training step, but it may be active\n", + "during the next step.\n", + "\n", + "The hyperparameter $p$ is called the dropout rate, and it is typically\n", + "set to 50%. After training, the neurons are not dropped anymore. It\n", + "is viewed as one of the most popular regularization techniques." + ] + }, + { + "cell_type": "markdown", + "id": "9c712bbb", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Clipping\n", + "\n", + "A popular technique to lessen the exploding gradients problem is to\n", + "simply clip the gradients during backpropagation so that they never\n", + "exceed some threshold (this is mostly useful for recurrent neural\n", + "networks).\n", + "\n", + "This technique is called Gradient Clipping.\n", + "\n", + "In general however, Batch\n", + "Normalization is preferred." + ] + }, + { + "cell_type": "markdown", + "id": "2b66ea72", + "metadata": { + "editable": true + }, + "source": [ + "## A top-down perspective on Neural networks\n", + "\n", + "The first thing we would like to do is divide the data into two or\n", + "three parts. A training set, a validation or dev (development) set,\n", + "and a test set. The test set is the data on which we want to make\n", + "predictions. The dev set is a subset of the training data we use to\n", + "check how well we are doing out-of-sample, after training the model on\n", + "the training dataset. We use the validation error as a proxy for the\n", + "test error in order to make tweaks to our model. It is crucial that we\n", + "do not use any of the test data to train the algorithm. This is a\n", + "cardinal sin in ML. Then:\n", + "\n", + "1. Estimate optimal error rate\n", + "\n", + "2. Minimize underfitting (bias) on training data set.\n", + "\n", + "3. Make sure you are not overfitting." 
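+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b1e4c8f6",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The three-way split described above can be obtained by applying scikit-learn's train_test_split twice;\n",
+ "a minimal sketch (the 60/20/20 proportions are an arbitrary illustrative choice):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f7a5d9c3",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: split the data into train / validation (dev) / test sets\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "# hold out the test set first, then carve a validation set out of the remainder\n",
+ "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
+ "X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)\n",
+ "\n",
+ "print(len(X_train), len(X_val), len(X_test))   # roughly 60%, 20% and 20% of the data"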
+ ] + }, + { + "cell_type": "markdown", + "id": "5acbc082", + "metadata": { + "editable": true + }, + "source": [ + "## More top-down perspectives\n", + "\n", + "If the validation and test sets are drawn from the same distributions,\n", + "then a good performance on the validation set should lead to similarly\n", + "good performance on the test set. \n", + "\n", + "However, sometimes\n", + "the training data and test data differ in subtle ways because, for\n", + "example, they are collected using slightly different methods, or\n", + "because it is cheaper to collect data in one way versus another. In\n", + "this case, there can be a mismatch between the training and test\n", + "data. This can lead to the neural network overfitting these small\n", + "differences between the test and training sets, and a poor performance\n", + "on the test set despite having a good performance on the validation\n", + "set. To rectify this, Andrew Ng suggests making two validation or dev\n", + "sets, one constructed from the training data and one constructed from\n", + "the test data. The difference between the performance of the algorithm\n", + "on these two validation sets quantifies the train-test mismatch. This\n", + "can serve as another important diagnostic when using DNNs for\n", + "supervised learning." + ] + }, + { + "cell_type": "markdown", + "id": "31825b65", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of supervised learning with deep networks\n", + "\n", + "Like all statistical methods, supervised learning using neural\n", + "networks has important limitations. This is especially important when\n", + "one seeks to apply these methods, especially to physics problems. Like\n", + "all tools, DNNs are not a universal solution. Often, the same or\n", + "better performance on a task can be achieved by using a few\n", + "hand-engineered features (or even a collection of random\n", + "features)." + ] + }, + { + "cell_type": "markdown", + "id": "c76d9af9", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of NNs\n", + "\n", + "Here we list some of the important limitations of supervised neural network based models. \n", + "\n", + "* **Need labeled data**. All supervised learning methods, DNNs for supervised learning require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).\n", + "\n", + "* **Supervised neural networks are extremely data intensive.** DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs." + ] + }, + { + "cell_type": "markdown", + "id": "bdc93363", + "metadata": { + "editable": true + }, + "source": [ + "## Homogeneous data\n", + "\n", + "* **Homogeneous data.** Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a1d6ff64", + "metadata": { + "editable": true + }, + "source": [ + "## More limitations\n", + "\n", + "* **Many problems are not about prediction.** In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a *wrong* model. The model might or might not be useful for understanding the underlying science.\n", + "\n", + "Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems." + ] + }, + { + "cell_type": "markdown", + "id": "0c2e5742", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up a Multi-layer perceptron model for classification\n", + "\n", + "We are now gong to develop an example based on the MNIST data\n", + "base. This is a classification problem and we need to use our\n", + "cross-entropy function we discussed in connection with logistic\n", + "regression. The cross-entropy defines our cost function for the\n", + "classificaton problems with neural networks.\n", + "\n", + "In binary classification with two classes $(0, 1)$ we define the\n", + "logistic/sigmoid function as the probability that a particular input\n", + "is in class $0$ or $1$. This is possible because the logistic\n", + "function takes any input from the real numbers and inputs a number\n", + "between 0 and 1, and can therefore be interpreted as a probability. It\n", + "also has other nice properties, such as a derivative that is simple to\n", + "calculate.\n", + "\n", + "For an input $\\boldsymbol{a}$ from the hidden layer, the probability that the input $\\boldsymbol{x}$\n", + "is in class 0 or 1 is just. We let $\\theta$ represent the unknown weights and biases to be adjusted by our equations). The variable $x$\n", + "represents our activation values $z$. We have" + ] + }, + { + "cell_type": "markdown", + "id": "d4da3f02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(- \\boldsymbol{x}})} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01ea2e0b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c1c7bec", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9238ff2d", + "metadata": { + "editable": true + }, + "source": [ + "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n", + "of our network." 
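+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c2d6e0a4",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A short numerical check of the two expressions above, with made-up activation values standing in for the input:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d3f8b2e7",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "z = np.array([-2.0, 0.0, 2.0])   # made-up activation values\n",
+ "p0 = sigmoid(z)                  # P(y = 0 | x, theta) in the notation above\n",
+ "p1 = 1.0 - p0                    # P(y = 1 | x, theta)\n",
+ "print(p0, p1, p0 + p1)           # the two probabilities sum to one"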
+ ] + }, + { + "cell_type": "markdown", + "id": "3be74bd1", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the cost function\n", + "\n", + "Our cost function is given as (see the Logistic regression lectures)" + ] + }, + { + "cell_type": "markdown", + "id": "2e2fd39c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", + "y_i \\ln[P(y_i = 0)] + (1 - y_i) \\ln [1 - P(y_i = 0)] = \\sum_{i=1}^n \\mathcal{L}_i(\\boldsymbol{\\theta}) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42b1d26b", + "metadata": { + "editable": true + }, + "source": [ + "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", + "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. \n", + "The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather\n", + "than maximizing a negative number. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$ and\n", + "\n", + "$y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset (numbers from $0$ to $9$).. \n", + "\n", + "If $\\boldsymbol{x}_i$ is the $i$-th input (image), $y_{ic}$ refers to the $c$-th component of the $i$-th\n", + "output vector $\\boldsymbol{y}_i$. \n", + "The probability of $\\boldsymbol{x}_i$ being in class $c$ will be given by the softmax function:" + ] + }, + { + "cell_type": "markdown", + "id": "f740a484", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", + "{\\sum_{c'=0}^{C-1} \\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_{c'})}} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19189bfc", + "metadata": { + "editable": true + }, + "source": [ + "which reduces to the logistic function in the binary case. \n", + "The likelihood of this $C$-class classifier\n", + "is now given as:" + ] + }, + { + "cell_type": "markdown", + "id": "aeb3ef60", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbf419a1", + "metadata": { + "editable": true + }, + "source": [ + "Again we take the negative log-likelihood to define our cost function:" + ] + }, + { + "cell_type": "markdown", + "id": "9e345753", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3b13095e", + "metadata": { + "editable": true + }, + "source": [ + "See the logistic regression lectures for a full definition of the cost function.\n", + "\n", + "The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!" 
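+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e9a1c5f0",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "To make the multiclass cost function concrete, the sketch below one-hot encodes a few integer labels,\n",
+ "forms softmax probabilities from made-up network outputs, and evaluates the negative log-likelihood\n",
+ "$\\mathcal{C}(\\boldsymbol{\\theta}) = -\\sum_i\\sum_c y_{ic}\\ln P(y_{ic}=1)$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0b4d8a2",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "\n",
+ "y = np.array([5, 1, 3])                      # integer labels\n",
+ "C = 10                                       # number of classes\n",
+ "Y = np.zeros((len(y), C))\n",
+ "Y[np.arange(len(y)), y] = 1                  # one-hot encoding\n",
+ "\n",
+ "scores = rng.standard_normal((len(y), C))    # made-up network outputs\n",
+ "exp_scores = np.exp(scores)\n",
+ "P = exp_scores/exp_scores.sum(axis=1, keepdims=True)   # softmax probabilities\n",
+ "\n",
+ "cost = -np.sum(Y*np.log(P))                  # cross-entropy / negative log-likelihood\n",
+ "print(cost)"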
+ ] + }, + { + "cell_type": "markdown", + "id": "96501a91", + "metadata": { + "editable": true + }, + "source": [ + "## Example: binary classification problem\n", + "\n", + "As an example of the above, relevant for project 2 as well, let us consider a binary class. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters $\\beta$ as" + ] + }, + { + "cell_type": "markdown", + "id": "48cf79fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3243c0b1", + "metadata": { + "editable": true + }, + "source": [ + "where we had defined the logistic (sigmoid) function" + ] + }, + { + "cell_type": "markdown", + "id": "bb312a09", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "484cf2b4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2b9c5483", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ca21f09", + "metadata": { + "editable": true + }, + "source": [ + "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. \n", + "\n", + "Now we replace $x_i$ with the activation $z_i^l$ for a given layer $l$ and the outputs as $y_i=a_i^l=f(z_i^l)$, with $z_i^l$ now being a function of the weights $w_{ij}^l$ and biases $b_i^l$. \n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "4852e4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e3b7cbef", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "0c1e69a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e71df7f4", + "metadata": { + "editable": true + }, + "source": [ + "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", + "Our cost function at the final layer $l=L$ is now" + ] + }, + { + "cell_type": "markdown", + "id": "50d6fecc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e145e461", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the targets $t_i$. 
The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" + ] + }, + { + "cell_type": "markdown", + "id": "97f13260", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4361ce3b", + "metadata": { + "editable": true + }, + "source": [ + "In case we use another activation function than the logistic one, we need to evaluate other derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "52a16654", + "metadata": { + "editable": true + }, + "source": [ + "## The Softmax function\n", + "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" + ] + }, + { + "cell_type": "markdown", + "id": "3bfb321e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l} \\frac{\\partial z_j^l}{\\partial w_{jk}^l}= \\frac{\\partial f(z_i^l)}{\\partial z_j^l}a_k^{l-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eccac6c9", + "metadata": { + "editable": true + }, + "source": [ + "For the Softmax function we have" + ] + }, + { + "cell_type": "markdown", + "id": "23634198", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7a2e75ba", + "metadata": { + "editable": true + }, + "source": [ + "Its derivative with respect to $z_j^l$ gives" + ] + }, + { + "cell_type": "markdown", + "id": "2dad2d14", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46415917", + "metadata": { + "editable": true + }, + "source": [ + "which in case of the simply binary model reduces to having $i=j$." + ] + }, + { + "cell_type": "markdown", + "id": "6adc7c1e", + "metadata": { + "editable": true + }, + "source": [ + "## Developing a code for doing neural networks with back propagation\n", + "\n", + "One can identify a set of key steps when using neural networks to solve supervised learning problems: \n", + "\n", + "1. Collect and pre-process data \n", + "\n", + "2. Define model and architecture \n", + "\n", + "3. Choose cost function and optimizer \n", + "\n", + "4. Train the model \n", + "\n", + "5. Evaluate model performance on test data \n", + "\n", + "6. Adjust hyperparameters (if necessary, network architecture)" + ] + }, + { + "cell_type": "markdown", + "id": "4110d83e", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Here we will be using the MNIST dataset, which is readily available through the **scikit-learn**\n", + "package. You may also find it for example [here](http://yann.lecun.com/exdb/mnist/). \n", + "The *MNIST* (Modified National Institute of Standards and Technology) database is a large database\n", + "of handwritten digits that is commonly used for training various image processing systems. \n", + "The MNIST dataset consists of 70 000 images of size $28\\times 28$ pixels, each labeled from 0 to 9. 
\n", + "The scikit-learn dataset we will use consists of a selection of 1797 images of size $8\\times 8$ collected and processed from this database. \n", + "\n", + "To feed data into a feed-forward neural network we need to represent\n", + "the inputs as a design/feature matrix $X = (n_{inputs}, n_{features})$. Each\n", + "row represents an *input*, in this case a handwritten digit, and\n", + "each column represents a *feature*, in this case a pixel. The\n", + "correct answers, also known as *labels* or *targets* are\n", + "represented as a 1D array of integers \n", + "$Y = (n_{inputs}) = (5, 3, 1, 8,...)$.\n", + "\n", + "As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from\n", + "measurements of height (in m) \n", + "and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example: \n", + "\n", + "$$ X = \\begin{bmatrix}\n", + "1.85 & 81\\\\\n", + "1.71 & 65\\\\\n", + "1.95 & 103\\\\\n", + "1.55 & 42\\\\\n", + "1.63 & 56\n", + "\\end{bmatrix} ,$$ \n", + "\n", + "and the targets would be: \n", + "\n", + "$$ Y = (23.7, 22.2, 27.1, 17.5, 21.1) $$ \n", + "\n", + "Since each input image is a 2D matrix, we need to flatten the image\n", + "(i.e. \"unravel\" the 2D matrix into a 1D array) to turn the data into a\n", + "design/feature matrix. This means we lose all spatial information in the\n", + "image, such as locality and translational invariance. More complicated\n", + "architectures such as Convolutional Neural Networks can take advantage\n", + "of such information, and are most commonly applied when analyzing\n", + "images." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "28bb6085", + "metadata": { + "editable": true + }, + "source": [ + "## Train and test datasets\n", + "\n", + "Performing analysis before partitioning the dataset is a major error, that can lead to incorrect conclusions. 
\n", + "\n", + "We will reserve $80 \\%$ of our dataset for training and $20 \\%$ for testing. \n", + "\n", + "It is important that the train and test datasets are drawn randomly from our dataset, to ensure\n", + "no bias in the sampling. \n", + "Say you are taking measurements of weather data to predict the weather in the coming 5 days.\n", + "You don't want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data\n", + "collected from 12.00 to 24.00." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)\n", + "\n", + "# equivalently in numpy\n", + "def train_test_split_numpy(inputs, labels, train_size, test_size):\n", + " n_inputs = len(inputs)\n", + " inputs_shuffled = inputs.copy()\n", + " labels_shuffled = labels.copy()\n", + " \n", + " np.random.shuffle(inputs_shuffled)\n", + " np.random.shuffle(labels_shuffled)\n", + " \n", + " train_end = int(n_inputs*train_size)\n", + " X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]\n", + " Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]\n", + " \n", + " return X_train, X_test, Y_train, Y_test\n", + "\n", + "#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)\n", + "\n", + "print(\"Number of training images: \" + str(len(X_train)))\n", + "print(\"Number of test images: \" + str(len(X_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "c26d604d", + "metadata": { + "editable": true + }, + "source": [ + "## Define model and architecture\n", + "\n", + "Our simple feed-forward neural network will consist of an *input* layer, a single *hidden* layer and an *output* layer. The activation $y$ of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have \n", + "\n", + "$$ z = \\sum_{i=1}^n w_i a_i ,$$\n", + "\n", + "$$ y = f(z) ,$$\n", + "\n", + "where $f$ is the activation function, $a_i$ represents input from neuron $i$ in the preceding layer\n", + "and $w_i$ is the weight to input $i$. \n", + "The activation of the neurons in the input layer is just the features (e.g. a pixel value). \n", + "\n", + "The simplest activation function for a neuron is the *Heaviside* function:\n", + "\n", + "$$ f(z) = \n", + "\\begin{cases}\n", + "1, & z > 0\\\\\n", + "0, & \\text{otherwise}\n", + "\\end{cases}\n", + "$$\n", + "\n", + "A feed-forward neural network with this activation is known as a *perceptron*. \n", + "For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer. \n", + "This activation can be generalized to $k$ classes (using e.g. the *one-against-all* strategy), \n", + "and we call these architectures *multiclass perceptrons*. \n", + "\n", + "However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and \n", + "Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function. 
\n", + "\n", + "Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU). \n", + "We will be using the sigmoid function $\\sigma(x)$: \n", + "\n", + "$$ f(x) = \\sigma(x) = \\frac{1}{1 + e^{-x}} ,$$\n", + "\n", + "which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2775283b", + "metadata": { + "editable": true + }, + "source": [ + "## Layers\n", + "\n", + "* Input \n", + "\n", + "Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons. \n", + "\n", + "* Hidden layer\n", + "\n", + "We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer. \n", + "Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer. \n", + "\n", + "* Output\n", + "\n", + "If we were building a binary classifier, it would be sufficient with a single neuron in the output layer,\n", + "which could output 0 or 1 according to the Heaviside function. This would be an example of a *hard* classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a *soft* classifier, which outputs the probability of being in class 0 or 1. \n", + "\n", + "For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class. \n", + "\n", + "Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons $j = 0,1,...,9$. The activation of each output neuron $j$ will be according to the *softmax* function: \n", + "\n", + "$$ P(\\text{class $j$} \\mid \\text{input $\\boldsymbol{a}$}) = \\frac{\\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_j)}}\n", + "{\\sum_{c=0}^{9} \\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_c)}} ,$$ \n", + "\n", + "i.e. each neuron $j$ outputs the probability of being in class $j$ given an input from the hidden layer $\\boldsymbol{a}$, with $\\boldsymbol{w}_j$ the weights of neuron $j$ to the inputs. \n", + "The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1. \n", + "The exponent is just the weighted sum of inputs as before: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i+b_j.$$ \n", + "\n", + "Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500\n", + "weights to the output layer." + ] + }, + { + "cell_type": "markdown", + "id": "f7455c00", + "metadata": { + "editable": true + }, + "source": [ + "## Weights and biases\n", + "\n", + "Typically weights are initialized with small values distributed around zero, drawn from a uniform\n", + "or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless. \n", + "\n", + "Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range\n", + "of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). 
The bias unit has an output of 1, and a weight to each neuron $j$, $b_j$: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i + b_j.$$ \n", + "\n", + "The bias weights $\\boldsymbol{b}$ are often initialized to zero, but a small value like $0.01$ ensures all neurons have some output which can be backpropagated in the first training cycle." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# building our neural network\n", + "\n", + "n_inputs, n_features = X_train.shape\n", + "n_hidden_neurons = 50\n", + "n_categories = 10\n", + "\n", + "# we make the weights normally distributed using numpy.random.randn\n", + "\n", + "# weights and bias in the hidden layer\n", + "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", + "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", + "output_bias = np.zeros(n_categories) + 0.01" + ] + }, + { + "cell_type": "markdown", + "id": "a41d9acd", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward pass\n", + "\n", + "Denote $F$ the number of features, $H$ the number of hidden neurons and $C$ the number of categories. \n", + "For each input image we calculate a weighted sum of input features (pixel values) to each neuron $j$ in the hidden layer $l$: \n", + "\n", + "$$ z_{j}^{l} = \\sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},$$\n", + "\n", + "this is then passed through our activation function \n", + "\n", + "$$ a_{j}^{l} = f(z_{j}^{l}) .$$ \n", + "\n", + "We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron $j$ in the output layer: \n", + "\n", + "$$ z_{j}^{L} = \\sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.$$ \n", + "\n", + "Finally we calculate the output of neuron $j$ in the output layer using the softmax function: \n", + "\n", + "$$ a_{j}^{L} = \\frac{\\exp{(z_j^{L})}}\n", + "{\\sum_{c=0}^{C-1} \\exp{(z_c^{L})}} .$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2f64238", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplications\n", + "\n", + "Since our data has the dimensions $X = (n_{inputs}, n_{features})$ and our weights to the hidden\n", + "layer have the dimensions \n", + "$W_{hidden} = (n_{features}, n_{hidden})$,\n", + "we can easily feed the network all our training data in one go by taking the matrix product \n", + "\n", + "$$ X W^{h} = (n_{inputs}, n_{hidden}),$$ \n", + "\n", + "and obtain a matrix that holds the weighted sum of inputs to the hidden layer\n", + "for each input image and each hidden neuron. \n", + "We also add the bias to obtain a matrix of weighted sums to the hidden layer $Z^{h}$: \n", + "\n", + "$$ \\boldsymbol{z}^{l} = \\boldsymbol{X} \\boldsymbol{W}^{l} + \\boldsymbol{b}^{l} ,$$\n", + "\n", + "meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image. 
\n", + "This is then passed through the activation: \n", + "\n", + "$$ \\boldsymbol{a}^{l} = f(\\boldsymbol{z}^l) .$$ \n", + "\n", + "This is fed to the output layer: \n", + "\n", + "$$ \\boldsymbol{z}^{L} = \\boldsymbol{a}^{L} \\boldsymbol{W}^{L} + \\boldsymbol{b}^{L} .$$\n", + "\n", + "Finally we receive our output values for each image and each category by passing it through the softmax function: \n", + "\n", + "$$ output = softmax (\\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# setup the feed-forward pass, subscript h = hidden layer\n", + "\n", + "def sigmoid(x):\n", + " return 1/(1 + np.exp(-x))\n", + "\n", + "def feed_forward(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " return probabilities\n", + "\n", + "probabilities = feed_forward(X_train)\n", + "print(\"probabilities = (n_inputs, n_categories) = \" + str(probabilities.shape))\n", + "print(\"probability that image 0 is in category 0,1,2,...,9 = \\n\" + str(probabilities[0]))\n", + "print(\"probabilities sum up to: \" + str(probabilities[0].sum()))\n", + "print()\n", + "\n", + "# we obtain a prediction by taking the class with the highest likelihood\n", + "def predict(X):\n", + " probabilities = feed_forward(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + "predictions = predict(X_train)\n", + "print(\"predictions = (n_inputs) = \" + str(predictions.shape))\n", + "print(\"prediction for image 0: \" + str(predictions[0]))\n", + "print(\"correct label for image 0: \" + str(Y_train[0]))" + ] + }, + { + "cell_type": "markdown", + "id": "4518e911", + "metadata": { + "editable": true + }, + "source": [ + "## Choose cost function and optimizer\n", + "\n", + "To measure how well our neural network is doing we need to introduce a cost function. \n", + "We will call the function that gives the error of a single sample output the *loss* function, and the function\n", + "that gives the total error of our network across all samples the *cost* function.\n", + "A typical choice for multiclass classification is the *cross-entropy* loss, also known as the negative log likelihood. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$$ y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$$ \n", + "\n", + "$$ y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset. \n", + "\n", + "Let $y_{ic}$ denote the $c$-th component of the $i$-th one-hot vector. 
\n", + "We define the cost function $\\mathcal{C}$ as a sum over the cross-entropy loss for each point $\\boldsymbol{x}_i$ in the dataset.\n", + "\n", + "In the one-hot representation only one of the terms in the loss function is non-zero, namely the\n", + "probability of the correct category $c'$ \n", + "(i.e. the category $c'$ such that $y_{ic'} = 1$). This means that the cross entropy loss only punishes you for how wrong\n", + "you got the correct label. The probability of category $c$ is given by the softmax function. The vector $\\boldsymbol{\\theta}$ represents the parameters of our network, i.e. all the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "d519516b", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the cost function\n", + "\n", + "The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is *gradient descent* and its generalizations. The idea behind gradient descent\n", + "is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a *local* minimum of the cost function. \n", + "Each parameter $\\theta$ is iteratively adjusted according to the rule \n", + "\n", + "$$ \\theta_{i+1} = \\theta_i - \\eta \\nabla \\mathcal{C}(\\theta_i) ,$$\n", + "\n", + "where $\\eta$ is known as the *learning rate*, which controls how big a step we take towards the minimum. \n", + "This update can be repeated for any number of iterations, or until we are satisfied with the result. \n", + "\n", + "A simple and effective improvement is a variant called *Batch Gradient Descent*. \n", + "Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient\n", + "on a subset of the data called a *minibatch*. \n", + "If there are $N$ data points and we have a minibatch size of $M$, the total number of batches\n", + "is $N/M$. \n", + "We denote each minibatch $B_k$, with $k = 1, 2,...,N/M$. The gradient then becomes: \n", + "\n", + "$$ \\nabla \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\nabla \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{M} \\sum_{i \\in B_k} \\nabla \\mathcal{L}_i(\\theta) ,$$\n", + "\n", + "i.e. instead of averaging the loss over the entire dataset, we average over a minibatch. \n", + "\n", + "This has two important benefits: \n", + "1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minima. \n", + "\n", + "2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient. \n", + "\n", + "The various optmization methods, with codes and algorithms, are discussed in our lectures on [Gradient descent approaches](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "46b71202", + "metadata": { + "editable": true + }, + "source": [ + "## Regularization\n", + "\n", + "It is common to add an extra term to the cost function, proportional\n", + "to the size of the weights. 
This is equivalent to constraining the\n", + "size of the weights, so that they do not grow out of control.\n", + "Constraining the size of the weights means that the weights cannot\n", + "grow arbitrarily large to fit the training data, and in this way\n", + "reduces *overfitting*.\n", + "\n", + "We will measure the size of the weights using the so called *L2-norm*, meaning our cost function becomes: \n", + "\n", + "$$ \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) + \\lambda \\lvert \\lvert \\boldsymbol{w} \\rvert \\rvert_2^2 \n", + "= \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}(\\theta) + \\lambda \\sum_{ij} w_{ij}^2,$$ \n", + "\n", + "i.e. we sum up all the weights squared. The factor $\\lambda$ is known as a regularization parameter.\n", + "\n", + "In order to train the model, we need to calculate the derivative of\n", + "the cost function with respect to every bias and weight in the\n", + "network. In total our network has $(64 + 1)\\times 50=3250$ weights in\n", + "the hidden layer and $(50 + 1)\\times 10=510$ weights to the output\n", + "layer ($+1$ for the bias), and the gradient must be calculated for\n", + "every parameter. We use the *backpropagation* algorithm discussed\n", + "above. This is a clever use of the chain rule that allows us to\n", + "calculate the gradient efficently." + ] + }, + { + "cell_type": "markdown", + "id": "129c39d3", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplication\n", + "\n", + "To more efficently train our network these equations are implemented using matrix operations. \n", + "The error in the output layer is calculated simply as, with $\\boldsymbol{t}$ being our targets, \n", + "\n", + "$$ \\delta_L = \\boldsymbol{t} - \\boldsymbol{y} = (n_{inputs}, n_{categories}) .$$ \n", + "\n", + "The gradient for the output weights is calculated as \n", + "\n", + "$$ \\nabla W_{L} = \\boldsymbol{a}^T \\delta_L = (n_{hidden}, n_{categories}) ,$$\n", + "\n", + "where $\\boldsymbol{a} = (n_{inputs}, n_{hidden})$. This simply means that we are summing up the gradients for each input. \n", + "Since we are going backwards we have to transpose the activation matrix. \n", + "\n", + "The gradient with respect to the output bias is then \n", + "\n", + "$$ \\nabla \\boldsymbol{b}_{L} = \\sum_{i=1}^{n_{inputs}} \\delta_L = (n_{categories}) .$$ \n", + "\n", + "The error in the hidden layer is \n", + "\n", + "$$ \\Delta_h = \\delta_L W_{L}^T \\circ f'(z_{h}) = \\delta_L W_{L}^T \\circ a_{h} \\circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) ,$$ \n", + "\n", + "where $f'(a_{h})$ is the derivative of the activation in the hidden layer. The matrix products mean\n", + "that we are summing up the products for each neuron in the output layer. The symbol $\\circ$ denotes\n", + "the *Hadamard product*, meaning element-wise multiplication. 
\n", + "\n", + "This again gives us the gradients in the hidden layer: \n", + "\n", + "$$ \\nabla W_{h} = X^T \\delta_h = (n_{features}, n_{hidden}) ,$$ \n", + "\n", + "$$ \\nabla b_{h} = \\sum_{i=1}^{n_{inputs}} \\delta_h = (n_{hidden}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# to categorical turns our integer vector into a onehot representation\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "# one-hot in numpy\n", + "def to_categorical_numpy(integer_vector):\n", + " n_inputs = len(integer_vector)\n", + " n_categories = np.max(integer_vector) + 1\n", + " onehot_vector = np.zeros((n_inputs, n_categories))\n", + " onehot_vector[range(n_inputs), integer_vector] = 1\n", + " \n", + " return onehot_vector\n", + "\n", + "#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)\n", + "Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)\n", + "\n", + "def feed_forward_train(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " # for backpropagation need activations in hidden and output layers\n", + " return a_h, probabilities\n", + "\n", + "def backpropagation(X, Y):\n", + " a_h, probabilities = feed_forward_train(X)\n", + " \n", + " # error in the output layer\n", + " error_output = probabilities - Y\n", + " # error in the hidden layer\n", + " error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)\n", + " \n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_h.T, error_output)\n", + " output_bias_gradient = np.sum(error_output, axis=0)\n", + " \n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(X.T, error_hidden)\n", + " hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "print(\"Old accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))\n", + "\n", + "eta = 0.01\n", + "lmbd = 0.01\n", + "for i in range(1000):\n", + " # calculate gradients\n", + " dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)\n", + " \n", + " # regularization term gradients\n", + " dWo += lmbd * output_weights\n", + " dWh += lmbd * hidden_weights\n", + " \n", + " # update weights and biases\n", + " output_weights -= eta * dWo\n", + " output_bias -= eta * dBo\n", + " hidden_weights -= eta * dWh\n", + " hidden_bias -= eta * dBh\n", + "\n", + "print(\"New accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))" + ] + }, + { + "cell_type": "markdown", + "id": "e95c7166", + "metadata": { + "editable": true + }, + "source": [ + "## Improving performance\n", + "\n", + "As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image. 
\n", + "In order to obtain a network that does something useful, we will have to do a bit more work. \n", + "\n", + "The choice of *hyperparameters* such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a *grid-search* is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates $\\eta = 10^{-6}, 10^{-5},...,10^{-1}$ with different regularization parameters $\\lambda = 10^{-6},...,10^{-0}$. \n", + "\n", + "Next, we haven't implemented minibatching yet, which introduces stochasticity and is though to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an *iteration*, and a full training period\n", + "going through the entire dataset ($n/M$ batches) an *epoch*.\n", + "\n", + "If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers. \n", + "Andrew Ng goes through some of these considerations in this [video](https://youtu.be/F1ka6a13S9I). You can find a summary of the video [here](https://kevinzakka.github.io/2016/09/26/applying-deep-learning/)." + ] + }, + { + "cell_type": "markdown", + "id": "b4365471", + "metadata": { + "editable": true + }, + "source": [ + "## Full object-oriented implementation\n", + "\n", + "It is very natural to think of the network as an object, with specific instances of the network\n", + "being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " X_data,\n", + " Y_data,\n", + " n_hidden_neurons=50,\n", + " n_categories=10,\n", + " epochs=10,\n", + " batch_size=100,\n", + " eta=0.1,\n", + " lmbd=0.0):\n", + "\n", + " self.X_data_full = X_data\n", + " self.Y_data_full = Y_data\n", + "\n", + " self.n_inputs = X_data.shape[0]\n", + " self.n_features = X_data.shape[1]\n", + " self.n_hidden_neurons = n_hidden_neurons\n", + " self.n_categories = n_categories\n", + "\n", + " self.epochs = epochs\n", + " self.batch_size = batch_size\n", + " self.iterations = self.n_inputs // self.batch_size\n", + " self.eta = eta\n", + " self.lmbd = lmbd\n", + "\n", + " self.create_biases_and_weights()\n", + "\n", + " def create_biases_and_weights(self):\n", + " self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)\n", + " self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01\n", + "\n", + " self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)\n", + " self.output_bias = np.zeros(self.n_categories) + 0.01\n", + "\n", + " def feed_forward(self):\n", + " # feed-forward for training\n", + " self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias\n", + " self.a_h = sigmoid(self.z_h)\n", + "\n", + " self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias\n", + "\n", + " exp_term = np.exp(self.z_o)\n", + " self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + "\n", + " def feed_forward_out(self, X):\n", + " # feed-forward for output\n", + " z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias\n", + " a_h = sigmoid(z_h)\n", 
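+ "        # unlike feed_forward above, nothing is stored on self here; this pass is only used for predictions\n",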
+ "\n", + " z_o = np.matmul(a_h, self.output_weights) + self.output_bias\n", + " \n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " return probabilities\n", + "\n", + " def backpropagation(self):\n", + " error_output = self.probabilities - self.Y_data\n", + " error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)\n", + "\n", + " self.output_weights_gradient = np.matmul(self.a_h.T, error_output)\n", + " self.output_bias_gradient = np.sum(error_output, axis=0)\n", + "\n", + " self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)\n", + " self.hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " if self.lmbd > 0.0:\n", + " self.output_weights_gradient += self.lmbd * self.output_weights\n", + " self.hidden_weights_gradient += self.lmbd * self.hidden_weights\n", + "\n", + " self.output_weights -= self.eta * self.output_weights_gradient\n", + " self.output_bias -= self.eta * self.output_bias_gradient\n", + " self.hidden_weights -= self.eta * self.hidden_weights_gradient\n", + " self.hidden_bias -= self.eta * self.hidden_bias_gradient\n", + "\n", + " def predict(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + " def predict_probabilities(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return probabilities\n", + "\n", + " def train(self):\n", + " data_indices = np.arange(self.n_inputs)\n", + "\n", + " for i in range(self.epochs):\n", + " for j in range(self.iterations):\n", + " # pick datapoints with replacement\n", + " chosen_datapoints = np.random.choice(\n", + " data_indices, size=self.batch_size, replace=False\n", + " )\n", + "\n", + " # minibatch training data\n", + " self.X_data = self.X_data_full[chosen_datapoints]\n", + " self.Y_data = self.Y_data_full[chosen_datapoints]\n", + "\n", + " self.feed_forward()\n", + " self.backpropagation()" + ] + }, + { + "cell_type": "markdown", + "id": "a417307d", + "metadata": { + "editable": true + }, + "source": [ + "## Evaluate model performance on test data\n", + "\n", + "To measure the performance of our network we evaluate how well it does it data it has never seen before, i.e. the test data. \n", + "We measure the performance of the network using the *accuracy* score. \n", + "The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of $1$. \n", + "\n", + "$$ \\text{Accuracy} = \\frac{\\sum_{i=1}^n I(\\tilde{y}_i = y_i)}{n} ,$$ \n", + "\n", + "where $I$ is the indicator function, $1$ if $\\tilde{y}_i = y_i$ and $0$ otherwise." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "epochs = 100\n", + "batch_size = 100\n", + "\n", + "dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + "dnn.train()\n", + "test_predict = dnn.predict(X_test)\n", + "\n", + "# accuracy score from scikit library\n", + "print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + "\n", + "# equivalent in numpy\n", + "def accuracy_score_numpy(Y_test, Y_pred):\n", + " return np.sum(Y_test == Y_pred) / len(Y_test)\n", + "\n", + "#print(\"Accuracy score on test set: \", accuracy_score_numpy(Y_test, test_predict))" + ] + }, + { + "cell_type": "markdown", + "id": "efcbd954", + "metadata": { + "editable": true + }, + "source": [ + "## Adjust hyperparameters\n", + "\n", + "We now perform a grid search to find the optimal hyperparameters for the network. \n", + "Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around $98\\%$ ($2\\%$ error rate)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "# store the models for later use\n", + "DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "# grid search\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + " dnn.train()\n", + " \n", + " DNN_numpy[i][j] = dnn\n", + " \n", + " test_predict = dnn.predict(X_test)\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "d282951d", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, you can also do this with matplotlib imshow\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_numpy[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + 
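+ "# in both heatmaps the rows follow eta_vals (learning rate) and the columns follow lmbd_vals (regularization)\n",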
"ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "99f5058c", + "metadata": { + "editable": true + }, + "source": [ + "## scikit-learn implementation\n", + "\n", + "**scikit-learn** focuses more\n", + "on traditional machine learning methods, such as regression,\n", + "clustering, decision trees, etc. As such, it has only two types of\n", + "neural networks: Multi Layer Perceptron outputting continuous values,\n", + "*MPLRegressor*, and Multi Layer Perceptron outputting labels,\n", + "*MLPClassifier*. We will see how simple it is to use these classes.\n", + "\n", + "**scikit-learn** implements a few improvements from our neural network,\n", + "such as early stopping, a varying learning rate, different\n", + "optimization methods, etc. We would therefore expect a better\n", + "performance overall." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.neural_network import MLPClassifier\n", + "# store models for later use\n", + "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", + " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", + " dnn.fit(X_train, Y_train)\n", + " \n", + " DNN_scikit[i][j] = dnn\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", dnn.score(X_test, Y_test))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "7ceec918", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_scikit[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba07c374", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and 
scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "1cf09819", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. 
We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "39d013b1", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "94e66380", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "40470dbd", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "636940c6", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "d9f47b57", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " 
model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53f9be79", + "metadata": { + "editable": true + }, + "source": [ + "## Building a neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "39bd1718", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. 
Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " 
self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "532aecc2", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "32a25c0b", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d34cd45c", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "baaaff79", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "05285af5", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "873e7caa", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3dc2175e", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : 
validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", + " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " 
train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < 
len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real 
numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "9596ae53", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "0fc39e40", + "metadata": { + "editable": true + }, + "source": [ 
+ "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3add8665", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "4dff1871", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "43cd1e22", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "e3e6fa31", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "622015f0", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "317e6e5c", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "82d61377", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "2d892009", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. Try different learning rates." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week43.ipynb b/doc/LectureNotes/week43.ipynb new file mode 100644 index 000000000..b190102b6 --- /dev/null +++ b/doc/LectureNotes/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. Coding examples using Tensorflow/Keras and Pytorch examples. The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. 
\n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
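To make the algorithm above concrete, here is a minimal NumPy sketch of one feed-forward and one back-propagation pass for a network with a single hidden layer, assuming a sigmoid activation in both layers and the mean squared error as cost function (all array names and sizes are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sizes: n samples, p inputs, m hidden nodes, one output node
n, p, m = 10, 3, 4
rng = np.random.default_rng(2023)
X = rng.normal(size=(n, p))
t = rng.normal(size=(n, 1))

W1, b1 = rng.normal(size=(p, m)), np.zeros((1, m))
W2, b2 = rng.normal(size=(m, 1)), np.zeros((1, 1))
eta = 0.1

# feed forward
z1 = X @ W1 + b1
a1 = sigmoid(z1)
z2 = a1 @ W2 + b2
a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L, with C the mean squared error
delta2 = a2 * (1 - a2) * (2.0 / n) * (a2 - t)
# back-propagated error for the hidden layer
delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

# gradient-descent updates: w^l <- w^l - eta * (a^{l-1})^T delta^l
W2 -= eta * a1.T @ delta2
b2 -= eta * delta2.sum(axis=0)
W1 -= eta * X.T @ delta1
b1 -= eta * delta1.sum(axis=0)
```

Repeating these updates over many iterations and mini-batches is what the fit() method of the neural network code further below automates.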
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
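The difference is easy to see numerically. Here is a small sketch comparing ReLU with a leaky ReLU (negative-side slope $\alpha=0.01$, a common default) on a few input values:

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.5])

relu  = np.where(z > 0, z, 0.0)
lrelu = np.where(z > 0, z, 0.01 * z)   # leaky ReLU keeps a small slope for negative inputs

# gradients: zero for z < 0 with ReLU (the "dead" region), 0.01 with leaky ReLU
d_relu  = np.where(z > 0, 1.0, 0.0)
d_lrelu = np.where(z > 0, 1.0, 0.01)
print(d_relu, d_lrelu)
```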
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
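As a reference for the Adam scheduler implemented below, the change returned by its update_change() corresponds to the standard Adam update (a sketch; $g_t$ is the gradient and $\delta$ a small constant to avoid division by zero),

$$
m_t = \rho_1 m_{t-1} + (1-\rho_1) g_t, \qquad s_t = \rho_2 s_{t-1} + (1-\rho_2) g_t^2,
$$

$$
\hat{m}_t = \frac{m_t}{1-\rho_1^t}, \qquad \hat{s}_t = \frac{s_t}{1-\rho_2^t}, \qquad \Delta = \frac{\eta\, \hat{m}_t}{\sqrt{\hat{s}_t + \delta}},
$$

with $\Delta$ being the change subtracted from the weights. Note that the implementation below uses the epoch counter, incremented in reset(), as the exponent $t$ in the bias corrections.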
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
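As a check on the automatic derivative computed in the next cell, the analytical gradient of the cross-entropy cost defined above is (ignoring the small constant added for numerical stability)

$$
\frac{\partial}{\partial X_i}\left(-\frac{1}{N}\sum_j t_j \ln X_j\right) = -\frac{t_i}{N X_i},
$$

with $N$ the number of elements in the target vector.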
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
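For the sigmoid used in the demonstration below, the derivative also has the simple closed form $\sigma'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr)$, which is handy for checking the output of derivate() by hand.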
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
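Once the fit call in the next cell has finished, it is natural to also check the error on the held-out data from the split above. A minimal sketch, reusing the CostOLS function and the predict() method (run after fitting):

```python
# evaluate the trained regressor on the held-out split (sketch; run after the fit call below)
pred_test = linear_regression.predict(X_test)
test_mse = CostOLS(t_test)(pred_test)
print(f"Test MSE: {test_mse:.4f}")
```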
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates." + ] + }, + { + "cell_type": "markdown", + "id": "40ffa1fb", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." + ] + }, + { + "cell_type": "markdown", + "id": "191ba3eb", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "a0be312a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000663cf", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "f5b87995", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "a166c0b6", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
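+    ,
+    "\n",
+    "\n",
+    "In code, this amounts to storing one parameter matrix per layer, with the biases in the first column and the weights in the remaining columns, exactly as the program further down does. A small illustrative sketch (the layer sizes here are arbitrary choices):\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy.random as npr\n",
+    "\n",
+    "N_input, N_hidden, N_output = 1, 10, 1\n",
+    "\n",
+    "# One parameter matrix per layer: column 0 holds the biases,\n",
+    "# the remaining columns hold the weights.\n",
+    "P_hidden = npr.randn(N_hidden, 1 + N_input)\n",
+    "P_output = npr.randn(N_output, 1 + N_hidden)\n",
+    "P = [P_hidden, P_output]\n",
+    "```"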
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
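+    ,
+    "\n",
+    "\n",
+    "A minimal sketch of this hidden-layer step, with illustrative sizes and the same bias-row trick as in the full program below (prepend a row of ones so that the first column of the parameter matrix acts as the bias):\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "import autograd.numpy.random as npr\n",
+    "\n",
+    "def sigmoid(z):\n",
+    "    return 1 / (1 + np.exp(-z))\n",
+    "\n",
+    "N, N_hidden = 10, 10                     # illustrative sizes\n",
+    "x = np.linspace(0, 1, N).reshape(1, N)   # the N input values as a row vector\n",
+    "\n",
+    "X = np.concatenate((np.ones((1, N)), x), axis=0)   # bias row on top of x\n",
+    "P_hidden = npr.randn(N_hidden, 2)                  # column 0: biases, column 1: weights\n",
+    "\n",
+    "z_hidden = np.matmul(P_hidden, X)   # row i holds z_i^hidden at every x_j\n",
+    "x_hidden = sigmoid(z_hidden)        # hidden-layer output, shape (N_hidden, N)\n",
+    "```"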
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
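+    ,
+    "\n",
+    "\n",
+    "Stripped of the surrounding bookkeeping, the update loop in `solve_ode_neural_network` further down boils down to the sketch below (a hypothetical helper `gradient_descent`): Autograd supplies the gradient of the cost function with respect to the parameters `P`, one array per layer, and each array is moved a constant step `lmb` against its gradient.\n",
+    "\n",
+    "```python\n",
+    "from autograd import grad\n",
+    "\n",
+    "def gradient_descent(cost_function, P, x, num_iter, lmb):\n",
+    "    # Differentiate the cost function w.r.t. its first argument, the parameters P\n",
+    "    cost_function_grad = grad(cost_function, 0)\n",
+    "    for _ in range(num_iter):\n",
+    "        cost_grad = cost_function_grad(P, x)              # one gradient array per layer\n",
+    "        P = [p - lmb * g for p, g in zip(P, cost_grad)]   # constant step size\n",
+    "    return P\n",
+    "```"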
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
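+    "    # The concatenation below gives x_input shape (2, num_values): a row of ones\n",
+    "    # stacked on top of the row of x-values. The first column of w_hidden then\n",
+    "    # multiplies the ones and acts as the bias, while the second column acts as\n",
+    "    # the weight, so a single matmul gives b_i + w_i*x_j for every hidden neuron i.\n",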
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output 
= z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\Delta x$ is a small step size and $E_{\Delta x}(x)$ is the error term.\n", + "\n", + "Neglecting the error term gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
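+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c301", + "metadata": { + "editable": true + }, + "source": [ + "As for the ODE and the Poisson equation above, it can be instructive to compare the network with a standard numerical scheme. The sketch below (an aside, not part of the original program) solves the same problem with the explicit forward-Euler/FTCS finite-difference scheme. The initial condition $u(x) = \sin(\pi x)$ and the grid and time-step sizes are choices made here for illustration; the time step respects the stability requirement $\Delta t \le \Delta x^2/2$ for this explicit scheme." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c302", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Explicit (FTCS) finite-difference sketch for dg/dt = d2g/dx2 on x in [0,1]\n", + "# with g(0,t) = g(1,t) = 0 and g(x,0) = u(x).\n", + "# Assumption for illustration: u(x) = sin(pi x), the choice used later in this example.\n", + "def u(x):\n", + "    return np.sin(np.pi*x)\n", + "\n", + "Nx = 11                 # number of spatial grid points\n", + "dx = 1/(Nx - 1)\n", + "dt = 0.5*dx**2          # largest stable step for this explicit scheme\n", + "T = 0.1                 # final time, arbitrary choice\n", + "Nt = int(T/dt)\n", + "\n", + "x = np.linspace(0, 1, Nx)\n", + "g = u(x)                # initial condition\n", + "g[0] = g[-1] = 0.0      # boundary conditions\n", + "\n", + "for n in range(Nt):\n", + "    # Update the interior points; the boundary values stay at zero\n", + "    g[1:-1] = g[1:-1] + dt/dx**2*(g[2:] - 2*g[1:-1] + g[:-2])\n", + "\n", + "# For u(x) = sin(pi x) the exact solution is exp(-pi^2 t)sin(pi x)\n", + "g_exact = np.exp(-np.pi**2*Nt*dt)*np.sin(np.pi*x)\n", + "print('Max absolute deviation from the exact solution: %g'%np.max(np.abs(g - g_exact)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c303", + "metadata": { + "editable": true + }, + "source": [ + "This classical scheme can later be compared with the deep neural network solution of the same problem."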
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n", + "\n", + "The cost function must then iterate through the given arrays\n", + "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n", + "neural network and the trial solution are evaluated, and then find\n", + "the Jacobian of the trial solution.\n", + "\n", + "A possible trial solution for this PDE is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n", + "$$\n", + "\n", + "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n", + "\n", + "To fulfill the conditions, $h_1(x,t)$ could be:\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n", + "$$\n", + "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "533f4e84", + "metadata": { + "editable": true + }, + "source": [ + "## Why the Jacobian?\n", + "\n", + "The Jacobian is used because the program must find the derivative of\n", + "the trial solution with respect to $x$ and $t$.\n", + "\n", + "This gives the necessity of computing the Jacobian matrix, as we want\n", + "to evaluate the gradient with respect to $x$ and $t$ (note that the\n", + "Jacobian of a scalar-valued multivariate function is simply its\n", + "gradient).\n", + "\n", + "In Autograd, the differentiation is by default done with respect to\n", + "the first input argument of your Python function. Since the point is\n", + "an array representing $x$ and $t$, the Jacobian is calculated using\n", + "the values of $x$ and $t$.\n", + "\n", + "To find the second derivative with respect to $x$ and $t$, the\n", + "Jacobian can be computed a second time. The result is a Hessian\n", + "matrix, which is the matrix containing all the possible second order\n", + "mixed derivatives of $g(x,t)$."
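+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c311", + "metadata": { + "editable": true + }, + "source": [ + "To make this concrete, the small sketch below (an illustration added here, not part of the original program) applies Autograd's jacobian and hessian to a simple closed-form function of a point $(x,t)$, namely $q(x,t) = \sin(\pi x)e^{-t}$. The first component of the Jacobian is $\partial q/\partial x$ and the second is $\partial q/\partial t$, while the $(0,0)$ and $(1,1)$ entries of the Hessian are $\partial^2 q/\partial x^2$ and $\partial^2 q/\partial t^2$; this is exactly how the cost function below extracts $\partial g_t/\partial t$ and $\partial^2 g_t/\partial x^2$. The test point $(0.3, 0.5)$ is an arbitrary choice." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c312", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian, hessian\n", + "\n", + "# A scalar function taking a point (x,t), mimicking how g_trial takes a point\n", + "def q(point):\n", + "    x,t = point\n", + "    return np.sin(np.pi*x)*np.exp(-t)\n", + "\n", + "point = np.array([0.3, 0.5])\n", + "\n", + "q_jac = jacobian(q)(point)    # the gradient [dq/dx, dq/dt]\n", + "q_hess = hessian(q)(point)    # the 2 x 2 matrix of second derivatives\n", + "\n", + "# Compare with the derivatives computed by hand\n", + "x_, t_ = point\n", + "print('dq/dx   : %g  (exact %g)'%(q_jac[0], np.pi*np.cos(np.pi*x_)*np.exp(-t_)))\n", + "print('dq/dt   : %g  (exact %g)'%(q_jac[1], -np.sin(np.pi*x_)*np.exp(-t_)))\n", + "print('d2q/dx2 : %g  (exact %g)'%(q_hess[0][0], -np.pi**2*np.sin(np.pi*x_)*np.exp(-t_)))\n", + "print('d2q/dt2 : %g  (exact %g)'%(q_hess[1][1], np.sin(np.pi*x_)*np.exp(-t_)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c313", + "metadata": { + "editable": true + }, + "source": [ + "With this in mind, the trial solution and the cost function for the diffusion equation can be set up as in the following cell."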
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
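+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c321", + "metadata": { + "editable": true + }, + "source": [ + "Before running the full program, a quick symbolic check (an aside using SymPy, which is not used elsewhere in these notes) confirms that the stated analytical solution satisfies both the PDE and the boundary and initial conditions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c322", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import sympy as sp\n", + "\n", + "x, t = sp.symbols('x t')\n", + "g = sp.exp(-sp.pi**2*t)*sp.sin(sp.pi*x)\n", + "\n", + "# The residual dg/dt - d2g/dx2 should simplify to zero\n", + "print('PDE residual :', sp.simplify(sp.diff(g, t) - sp.diff(g, x, 2)))\n", + "\n", + "# Boundary and initial conditions\n", + "print('g(0,t)       :', sp.simplify(g.subs(x, 0)))\n", + "print('g(1,t)       :', sp.simplify(g.subs(x, 1)))\n", + "print('g(x,0) - u(x):', sp.simplify(g.subs(t, 0) - sp.sin(sp.pi*x)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c323", + "metadata": { + "editable": true + }, + "source": [ + "With the analytical solution confirmed, the full Autograd implementation follows."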
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
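"    # The input to the first layer is the pair (x,t); the extra column in every weight matrix multiplies the row of ones concatenated to the input and acts as the bias\n", + 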
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + 
"\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = res[:,indx1]\n", + " res2 = res[:,indx2]\n", + " res3 = res[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = res_analytical[:,indx1]\n", + " res_analytical2 = res_analytical[:,indx2]\n", + " res_analytical3 = res_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "95dea76f", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week44.ipynb b/doc/LectureNotes/week44.ipynb new file mode 100644 index 000000000..6193b11ee --- /dev/null +++ b/doc/LectureNotes/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
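+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c331", + "metadata": { + "editable": true + }, + "source": [ + "As a small illustration of the approximation property stated above (an aside, not part of the original notes), the sketch below fits a network with a single hidden layer to $\sin(\pi x)$ using scikit-learn's MLPRegressor. The number of hidden neurons and the other hyperparameters are arbitrary choices for this demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c332", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from sklearn.neural_network import MLPRegressor\n", + "\n", + "# Target function sampled on [0,1]\n", + "x = np.linspace(0, 1, 200).reshape(-1, 1)\n", + "y = np.sin(np.pi*x).ravel()\n", + "\n", + "# One hidden layer with 50 neurons; all settings are arbitrary choices for this illustration\n", + "nn = MLPRegressor(hidden_layer_sizes=(50,), activation='tanh', solver='lbfgs',\n", + "                  max_iter=5000, random_state=4155)\n", + "nn.fit(x, y)\n", + "\n", + "print('Max absolute deviation from sin(pi x): %g'%np.max(np.abs(nn.predict(x) - y)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c333", + "metadata": { + "editable": true + }, + "source": [ + "We now look at how such networks can be used to solve differential equations, starting with ordinary differential equations."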
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "64a97256", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "94bb8aaa", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "29ead54b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "2cea098e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
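+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "For the diffusion equation above, the $M$ different input configurations entering the cost function are simply all pairs $(x_i,t_j)$ on a space-time grid.\n",
+    "A minimal sketch (plain NumPy, grid sizes chosen arbitrarily for illustration) of how these points can be collected as the rows of the matrix $X$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Grid sizes chosen arbitrarily for illustration\n",
+    "Nx, Nt = 4, 3\n",
+    "x = np.linspace(0, 1, Nx)\n",
+    "t = np.linspace(0, 1, Nt)\n",
+    "\n",
+    "# All pairs (x_i, t_j) as rows of X, one row per input configuration\n",
+    "X = np.array([[x_, t_] for x_ in x for t_ in t])\n",
+    "print(X.shape)  # (M, 2) with M = Nx*Nt = 12"
+   ]
+  },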
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n", + "\n", + "The cost function must then iterate through the given arrays\n", + "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n", + "neural network and the trial solution are evaluated, and then find\n", + "the Jacobian of the trial solution.\n", + "\n", + "A possible trial solution for this PDE is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n", + "$$\n", + "\n", + "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n", + "\n", + "To fulfill the conditions, $h_1(x,t)$ could be:\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n", + "$$\n", + "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "50cfea81", + "metadata": { + "editable": true + }, + "source": [ + "## Why the Jacobian?\n", + "\n", + "The Jacobian is used because the program must find the derivative of\n", + "the trial solution with respect to $x$ and $t$.\n", + "\n", + "This makes it necessary to compute the Jacobian matrix, as we want\n", + "to evaluate the gradient with respect to $x$ and $t$ (note that the\n", + "Jacobian of a scalar-valued multivariate function is simply its\n", + "gradient).\n", + "\n", + "In Autograd, the differentiation is by default done with respect to\n", + "the first input argument of your Python function. Since the point is\n", + "an array representing $x$ and $t$, the Jacobian is calculated using\n", + "the values of $x$ and $t$.\n", + "\n", + "To find the second derivative with respect to $x$ and $t$, the\n", + "Jacobian can be computed a second time. The result is a Hessian\n", + "matrix, which is the matrix containing all the possible second order\n", + "mixed derivatives of $g(x,t)$."
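+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before assembling the cost function, here is a minimal sketch (using Autograd on a simple, hypothetical test function $h(x,t)=\sin(\pi x)e^{-t}$) of how the Jacobian and the Hessian are indexed to extract $\partial h/\partial t$ and $\partial^2 h/\partial x^2$ at a point $(x,t)$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "# A simple, hypothetical test function of a point (x,t)\n",
+    "def h(point):\n",
+    "    x, t = point\n",
+    "    return np.sin(np.pi*x)*np.exp(-t)\n",
+    "\n",
+    "point = np.array([0.3, 0.5])\n",
+    "\n",
+    "dh = jacobian(h)(point)    # gradient of h: [dh/dx, dh/dt]\n",
+    "d2h = hessian(h)(point)    # all second derivatives, shape (2,2)\n",
+    "\n",
+    "print('dh/dt from jacobian[1]    :', dh[1])\n",
+    "print('analytical dh/dt          :', -np.sin(np.pi*0.3)*np.exp(-0.5))\n",
+    "print('d2h/dx2 from hessian[0][0]:', d2h[0][0])\n",
+    "print('analytical d2h/dx2        :', -np.pi**2*np.sin(np.pi*0.3)*np.exp(-0.5))"
+   ]
+  },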
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
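+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before running the full program, a quick sanity check (a minimal Autograd sketch) that the analytical solution quoted above indeed satisfies the diffusion equation and the given conditions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad\n",
+    "\n",
+    "# The analytical solution quoted above\n",
+    "def g(x, t):\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "dg_dt = grad(g, 1)             # derivative w.r.t. t\n",
+    "d2g_dx2 = grad(grad(g, 0), 0)  # second derivative w.r.t. x\n",
+    "\n",
+    "x, t = 0.3, 0.2\n",
+    "print('dg/dt - d2g/dx2   :', dg_dt(x, t) - d2g_dx2(x, t))  # should vanish\n",
+    "print('g(0,t), g(1,t)    :', g(0.0, t), g(1.0, t))         # zero up to round-off\n",
+    "print('g(x,0) - sin(pi x):', g(x, 0.0) - np.sin(np.pi*x))  # initial condition"
+   ]
+  },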
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two coordinates (x,t), +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the values of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the max difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", +
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A regular 3-layer Neural Network.
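+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The weight counts quoted above are easy to reproduce. A minimal sketch (the 1000-neuron hidden layer is a hypothetical choice, used only for illustration):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# Weights feeding a single fully-connected neuron for two image sizes\n",
+    "for shape in [(32, 32, 3), (200, 200, 3)]:\n",
+    "    n_inputs = shape[0]*shape[1]*shape[2]\n",
+    "    print(shape, '->', n_inputs, 'weights per neuron (plus one bias)')\n",
+    "\n",
+    "# A full dense hidden layer multiplies this by the number of neurons\n",
+    "n_hidden = 1000  # hypothetical layer width\n",
+    "print('dense layer on a (200,200,3) image with', n_hidden, 'neurons:',\n",
+    "      200*200*3*n_hidden + n_hidden, 'parameters')"
+   ]
+  },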

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
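+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The INPUT-CONV-RELU-POOL-FC pipeline listed above can be written down in a few lines.\n",
+    "Below is a minimal sketch, assuming TensorFlow/Keras is installed, with the layer sizes following the $32\times 32\times 3$ example with 12 filters and 10 class scores:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "from tensorflow.keras import layers, models\n",
+    "\n",
+    "# INPUT (32x32x3) -> CONV + RELU (12 filters) -> POOL -> FC class scores\n",
+    "model = models.Sequential([\n",
+    "    layers.Input(shape=(32, 32, 3)),\n",
+    "    layers.Conv2D(12, kernel_size=3, padding='same', activation='relu'),\n",
+    "    layers.MaxPooling2D(pool_size=2),  # 32x32x12 -> 16x16x12\n",
+    "    layers.Flatten(),\n",
+    "    layers.Dense(10, activation='softmax')  # 10 class scores\n",
+    "])\n",
+    "model.summary()"
+   ]
+  },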

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
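As a quick numerical check of the polynomial example above, the short sketch below (our own addition, with arbitrary coefficient values) verifies that NumPy's `np.convolve` reproduces the coefficients $\delta_j$, and that the Toeplitz-style matrix-vector product gives the same numbers.

```python
import numpy as np

# Coefficients of p(t) = alpha_0 + alpha_1*t + alpha_2*t^2 (arbitrary values)
alpha = np.array([1.0, 2.0, 3.0])
# Coefficients of s(t) = beta_0 + beta_1*t + beta_2*t^2 + beta_3*t^3
beta = np.array([4.0, 5.0, 6.0, 7.0])

# Discrete convolution gives the coefficients delta_j of the product polynomial
delta = np.convolve(alpha, beta)     # length m + n - 1 = 6
print(delta)                         # [ 4. 13. 28. 34. 32. 21.]

# The same numbers from the Toeplitz-style matrix-vector product shown above:
# column j of A contains alpha shifted down by j rows
m, n = len(alpha), len(beta)
A = np.zeros((m + n - 1, n))
for j in range(n):
    A[j:j + m, j] = alpha
print(A @ beta)                      # identical to delta
```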
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
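Before writing out the algebra, the following minimal sketch (our own, with arbitrary entries for $X$ and $W$) slides the $2\times 2$ filter over the $3\times 3$ input with stride $S=1$ and sums the elementwise products of each patch; this is exactly the operation spelled out below.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[1., 2.],
              [3., 4.]])

S = 1
size = (X.shape[0] - W.shape[0]) // S + 1      # (3 - 2)/1 + 1 = 2
Y = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        patch = X[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
        Y[i, j] = np.sum(patch * W)            # elementwise product, then sum
print(Y)                                       # [[37. 47.] [67. 77.]]
```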
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input an input volume, and it is defined\n", + "by its width $W_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution thus involves for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1\\right) \\times K+(K\\,\\,\\mathrm{biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten feature maps,\n", + "each of dimensionality $28\\times 28$, that is, an output volume of dimension $28\\times 28\\times 10$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
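The output-size and parameter-count relations above are easy to check numerically. The helpers below are a small sketch (the function names are ours), applied to the $32\times 32\times 3$ example with $K=10$ filters of spatial extent $F=5$, stride $S=1$ and padding $P=0$; the same helpers can be used for the $3\times 3$ filter question above.

```python
def conv_output_size(W1, F, P, S):
    """Width (or height) of the output volume, W2 = (W1 - F + 2P)/S + 1."""
    return (W1 - F + 2 * P) // S + 1

def conv_parameters(F, D1, K):
    """F*F*D1 weights plus one bias per filter, for K filters."""
    return (F * F * D1 + 1) * K

W1, D1 = 32, 3            # 32x32 RGB input volume
F, K, S, P = 5, 10, 1, 0
print(conv_output_size(W1, F, P, S))   # 28, so the output volume is 28 x 28 x 10
print(conv_parameters(F, D1, K))       # 760 trainable parameters in total
```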

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
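As a concrete illustration of the pooling operations just described, here is a minimal sketch (our own, with arbitrary feature-map values) of $2\times 2$ max pooling with stride $2$; replacing `max` by `mean` gives average pooling.

```python
import numpy as np

X = np.array([[1., 3., 2., 0.],
              [4., 6., 5., 1.],
              [7., 2., 9., 8.],
              [0., 1., 3., 4.]])

pool, stride = 2, 2
size = X.shape[0] // stride
Y = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        patch = X[i*stride:i*stride + pool, j*stride:j*stride + pool]
        Y[i, j] = patch.max()      # keep the maximum of each 2x2 patch
print(Y)                           # [[6. 5.] [7. 9.]]
```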

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week45.ipynb b/doc/LectureNotes/week45.ipynb new file mode 100644 index 000000000..c5336e2ab --- /dev/null +++ b/doc/LectureNotes/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
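Both properties are easy to see if we write a small one-dimensional convolution as a matrix acting on the input. The sketch below is our own (input length, kernel length and weights are arbitrary): the resulting matrix is mostly zeros, and every row reuses the same $m$ weights, whereas a dense layer of the same shape would have an independent weight for every entry.

```python
import numpy as np

n, m = 8, 3                      # input length and kernel length
w = np.array([1.0, -2.0, 0.5])   # the m shared weights

# "Valid" convolution written as an (n-m+1) x n matrix
C = np.zeros((n - m + 1, n))
for i in range(n - m + 1):
    C[i, i:i + m] = w[::-1]      # flipped kernel, so that C @ x = np.convolve(x, w, 'valid')

x = np.arange(1.0, n + 1)        # an arbitrary input signal
print(np.allclose(C @ x, np.convolve(x, w, mode="valid")))    # True

print("independent parameters, convolution:", m)
print("independent parameters, dense layer:", C.size)
print("fraction of zero entries in C:", np.mean(C == 0))
```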
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A regular 3-layer Neural Network.
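The weight counts quoted above are quick to verify. The snippet below also adds a hypothetical hidden layer of $100$ such neurons (that number is ours, purely for illustration) to show how quickly the parameters add up.

```python
print(32 * 32 * 3)       # 3072 weights for one neuron connected to a 32x32x3 image
print(200 * 200 * 3)     # 120000 weights for one neuron and a 200x200x3 image

n_hidden = 100           # hypothetical hidden-layer size
print(n_hidden * 200 * 200 * 3 + n_hidden)   # weights plus biases for the whole layer
```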

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
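A layer stack of the kind sketched in the figure can be written down in a few lines. The following is a minimal PyTorch sketch (assuming `torch` is installed; the layer sizes are arbitrary and only meant to mirror the INPUT, CONV, RELU, POOL and FC stages listed above), mainly to show how the shapes propagate from a $32\times 32$ RGB input to ten class scores.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1),  # CONV
    nn.ReLU(),                                                            # RELU
    nn.MaxPool2d(kernel_size=2),                                          # POOL
    nn.Flatten(),
    nn.Linear(12 * 16 * 16, 10),                                          # FC, ten class scores
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(model(x).shape)            # torch.Size([1, 10])
```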

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Programs/Regression/binary_results.csv b/doc/Programs/Regression/binary_results.csv new file mode 100644 index 000000000..1a5f8e043 --- /dev/null +++ b/doc/Programs/Regression/binary_results.csv @@ -0,0 +1,201 @@ +TrueLabel,PredictedLabel +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 
+0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 diff --git a/doc/Programs/Regression/multiclass_results.csv b/doc/Programs/Regression/multiclass_results.csv new file mode 100644 index 000000000..9ffbe5203 --- /dev/null +++ b/doc/Programs/Regression/multiclass_results.csv @@ -0,0 +1,301 @@ +TrueLabel,PredictedLabel +0,0 +0,1 +0,0 +0,0 +0,1 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,1 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,1 +0,0 +0,2 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 diff --git a/doc/Projects/2025/Project2/html/._Project2-bs000.html b/doc/Projects/2025/Project2/html/._Project2-bs000.html new file mode 100644 index 000000000..f84073612 --- /dev/null +++ b/doc/Projects/2025/Project2/html/._Project2-bs000.html @@ -0,0 +1,640 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + + + + + + + +
    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    + + +
    +

    Deliverables

    + +

First, join a group in canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

    In canvas, deliver as a group and include:

    + +
      +
    • A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • +
        +
• It should be around 5000 words (use the word counter in Overleaf for this), which often corresponds to 10-12 pages. References and appendices are excluded from the word count
      • +
      • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
      • +
      +
    • A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include
    • +
    +

• A PDF file of the report

    +
      +
    • A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • +
• A README file with the names of the group members
    • +
    • a short description of the project
    • +
• a description of how to install the packages required to run your code, from a requirements.txt file or similar (such as a plain text description), and names and descriptions of the various notebooks in the Code folder together with the results they produce
    • +
    +

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

When using codes and material from other sources, you should refer to these in the bibliography of your report, indicating where you got the code from, whether this is from the lecture notes, software libraries like Scikit-Learn, TensorFlow, PyTorch or other sources. These sources should always be cited correctly. How to cite some of the libraries is often indicated on their corresponding GitHub sites or websites; see for example how to cite Scikit-Learn at https://scikit-learn.org/dev/about.html.

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to, for example, your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel free, however, to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

    +
      +
    1. The mean-squared error (MSE) with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
    2. +
    3. The binary cross entropy (aka log loss) for binary classification problems with and without \( L_1 \) and \( L_2 \) norms
    4. +
    5. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)
    6. +
    +

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.
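To make this concrete, here is a minimal NumPy sketch of the MSE and the Softmax cross entropy together with their gradients with respect to the model outputs. The function names and the 1/n normalization are our own choices for illustration, not a required interface.

import numpy as np

def mse(y_pred, y_true):
    # mean-squared error, C = (1/n) sum_i (y_pred_i - t_i)^2
    return np.mean((y_pred - y_true) ** 2)

def mse_gradient(y_pred, y_true):
    # dC/dy_pred = 2 (y_pred - t) / n
    return 2.0 * (y_pred - y_true) / y_true.shape[0]

def softmax(z):
    # z has shape (n_samples, n_classes); subtract the max for numerical stability
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_cross_entropy(z, onehot_targets):
    p = softmax(z)
    return -np.mean(np.sum(onehot_targets * np.log(p + 1e-12), axis=1))

def softmax_cross_entropy_gradient(z, onehot_targets):
    # the gradient with respect to the logits z simplifies to (softmax(z) - targets) / n
    return (softmax(z) - onehot_targets) / z.shape[0]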

    + +

    We will test three activation functions for our neural network setup, these are the

    +
      +
    1. The Sigmoid (aka logit) function,
    2. +
    3. the RELU function and
    4. +
    5. the Leaky RELU function
    6. +
    +

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +
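As a small hint, the three activation functions and their first derivatives can be written compactly in NumPy. This is only a sketch, and the leak parameter alpha=0.01 below is an arbitrary choice that you are free to tune.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_derivative(z):
    # the derivative at z = 0 is set to 0 by convention
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)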

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +
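If you did not already organize your project 1 code this way, a single ADAM update can be sketched as below. This is only an illustration with our own parameter names; t is the update counter and starts at 1.

import numpy as np

def adam_update(theta, grad, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v are running estimates of the first and second moments of the gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    m_hat = m / (1.0 - beta1**t)      # bias correction
    v_hat = v / (1.0 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v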

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

You will find the Python Seaborn package useful when plotting the results as functions of the learning rate \( \eta \) and the hyper-parameter \( \lambda \).

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

    +$$ +f(x) = \frac{1}{1+25x^2}, +$$ + +

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +
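A minimal sketch of how such a network could be initialized and evaluated is shown below. It assumes an input X of shape (n_samples, n_features); the zero biases and the linear output layer are common choices for regression, but you should motivate (or replace) them yourself.

import numpy as np
rng = np.random.default_rng(2025)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize_layers(sizes):
    # sizes = [n_features, n_hidden_1, ..., n_outputs]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0, size=(n_in, n_out))   # normally distributed weights
        b = np.zeros(n_out)                            # one common choice for the biases
        layers.append((W, b))
    return layers

def feed_forward(X, layers):
    a = X
    for W, b in layers[:-1]:
        a = sigmoid(a @ W + b)        # Sigmoid activation in the hidden layers
    W, b = layers[-1]
    return a @ W + b                  # linear output layer for regression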

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +
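As an illustration of such a test, here is a small check with Autograd on a plain linear model rather than your full network; the variable names are ours, and the same idea carries over to the gradients of your network.

import autograd.numpy as np
from autograd import grad

def mse_loss(w, X, y):
    residual = np.dot(X, w) - y
    return np.mean(residual ** 2)

# let Autograd differentiate the loss with respect to the first argument (the weights)
auto_gradient = grad(mse_loss, 0)

# compare with the analytical gradient 2 X^T (X w - y) / n on random data
X = np.random.randn(10, 3)
y = np.random.randn(10)
w = np.random.randn(3)
print(np.allclose(auto_gradient(w, X, y), 2 * np.dot(X.T, np.dot(X, w) - y) / len(y)))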

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +
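Note that the penalty terms only modify the gradients in a simple way. With the convention that the penalty is \( \lambda \) times the squared \( L_2 \) norm (or the \( L_1 \) norm) of the weights, the extra contributions can be sketched as below; conventions with an additional factor 1/2 are also common, so state yours clearly.

import numpy as np

def l2_penalty_gradient(W, lmbd):
    # gradient contribution from the penalty lmbd * ||W||_2^2
    return 2.0 * lmbd * W

def l1_penalty_gradient(W, lmbd):
    # (sub)gradient contribution from the penalty lmbd * ||W||_1
    return lmbd * np.sign(W)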

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +

    + +

    Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist. +

    + +

    To set up the data set, the following python programs may be useful

    + + +
    from sklearn.datasets import fetch_openml
    +
    +# Fetch the MNIST dataset
    +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')
    +
    +# Extract data (features) and target (labels)
    +X = mnist.data
    +y = mnist.target
    +

You should consider scaling the data. The pixel values in MNIST range from 0 to 255; scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling

    + + +
    X = X / 255.0
    +

    And then perform the standard train-test splitting

    + + +
    from sklearn.model_selection import train_test_split
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    +

To measure the performance of our classification problem we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

    + +$$ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +$$ + +

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) is the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).
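In code the accuracy is a one-liner; a small sketch, assuming integer class labels for both targets and predictions:

import numpy as np

def accuracy(targets, predictions):
    # fraction of correctly predicted labels
    return np.mean(targets == predictions)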

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

    + +

    Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using Scikit-Learn or tensorflow/keras or pytorch. +

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from Logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and FFNNs using the so-called MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers.

    + +

If you wish to compare with, say, Logistic Regression from scikit-learn, the following code uses the above data set

    + + +
    from sklearn.linear_model import LogisticRegression
    +# Initialize the model
    +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
    +# Train the model
    +model.fit(X_train, y_train)
    +from sklearn.metrics import accuracy_score
    +# Make predictions on the test set
    +y_pred = model.predict(X_test)
    +# Calculate accuracy
    +accuracy = accuracy_score(y_test, y_pred)
    +print(f"Model Accuracy: {accuracy:.4f}")
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

Required Implementation:

1. Reuse the regression code and results from project 1; these will act as a benchmark for seeing how suited a neural network is for this regression task.
2. Implement a neural network with
    • A flexible number of layers
    • A flexible number of nodes in each layer
    • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
    • A changeable cost function, which will be set to MSE for regression and cross-entropy for multi-class classification
    • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not as an interpretable metric); a small sketch of how such a penalty enters the gradients follows after this list
3. Implement the back-propagation algorithm to compute the gradient of your neural network
4. Reuse the implementation of plain and stochastic gradient descent from project 1 (and adapt the code to work with your neural network)
    • With no optimization algorithm
    • With RMSProp
    • With ADAM
5. Implement scaling and train-test splitting of your data, preferably using sklearn
6. Implement and compute metrics like the MSE and accuracy
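As a small illustration of the regularization point above, here is a sketch (with hypothetical names, not required code) of how an optional L2 penalty \( \lambda \sum_j w_j^2 \) only adds an extra term to the gradients already produced by back-propagation:

import numpy as np

def add_l2_penalty_gradient(grad_w, grad_b, weights, biases, lam):
    """Add the gradient of an L2 penalty lam * sum(w**2) (and similarly for b)
    to the gradients already computed by back-propagation."""
    grad_w = [gw + 2.0 * lam * W for gw, W in zip(grad_w, weights)]
    grad_b = [gb + 2.0 * lam * b for gb, b in zip(grad_b, biases)]
    # For an L1 penalty, the extra term would instead be lam * np.sign(W)
    return grad_w, grad_b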

    Required Analysis:

1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D heatmaps are well suited for this: start by finding a well-performing set of hyper-parameters, then change two at a time over a range that shows both good and bad performance.
3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data.
4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data.
5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network.

    Optional (Note that you should include at least two of these in the report):

1. Implement logistic regression as a simple classification model (equivalent to a neural network with just the output layer).
2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
3. Compare your results with results from a machine-learning library like PyTorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).
4. Use a more complex classification dataset instead, like Fashion-MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist).
5. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
6. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607).

    Background literature

1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
2. Goodfellow, Bengio and Courville, Deep Learning, at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8.
3. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.

    Introduction to numerical projects

    + +

Here follows a brief recipe and recommendation on how to write the report for each project.

• Give a short description of the nature of the problem and the numerical methods you have used.
• Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocode. In many cases you can describe the algorithm in the program itself.
• Include the source code of your program. Comment your program properly.
• If possible, try to find analytic solutions, or known limits, in order to test your program when developing the code.
• Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
• Try to give an interpretation of your results in your answers to the problems.
• Critique: if possible, include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
• Try to establish a practice where you log your work at the computer lab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.

    Format for electronic delivery of report and programs

    + +

The preferred format for the report is a PDF file. You can also use DOC or PostScript formats, or an IPython/Jupyter notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

• Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
• Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
• In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.

Finally, we encourage you to collaborate. Optimal working groups consist of 2-3 students. You can then hand in a common report.

    + © 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license +
    + + + diff --git a/doc/Projects/2025/Project2/html/Project2-bs.html b/doc/Projects/2025/Project2/html/Project2-bs.html new file mode 100644 index 000000000..f84073612 --- /dev/null +++ b/doc/Projects/2025/Project2/html/Project2-bs.html @@ -0,0 +1,640 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +
    +
    +

    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    + + +
    +

    Deliverables

    + +

First, join a group in Canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

In Canvas, deliver as a group and include:

• A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • It should be around 5000 words; use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count.
    • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
• A comment linking to your GitHub repository (or a folder in one of your GitHub repositories) for this project. The repository must include
    • A PDF file of the report
    • A folder named Code, where you put Python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • A README file with the names of the group members
    • a short description of the project
    • a description of how to install the required packages to run your code, from a requirements.txt file or similar (such as a plain text description), and names and descriptions of the various notebooks in the Code folder and the results they produce

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

    When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. +

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

1. The mean-squared error (MSE), with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
2. The binary cross entropy (aka log loss) for binary classification problems, with and without the \( L_1 \) and \( L_2 \) norms
3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.
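For reference, a minimal NumPy sketch of the two cost functions you will actually use is given below; this is our own illustration (with one-hot encoded targets T and logits Z as assumed inputs), not a required implementation:

import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def softmax(Z):
    # Subtract the row-wise maximum for numerical stability
    expZ = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    return expZ / np.sum(expZ, axis=1, keepdims=True)

def softmax_cross_entropy(Z, T):
    """Mean cross entropy between one-hot targets T and softmax(Z)."""
    P = softmax(Z)
    return -np.mean(np.sum(T * np.log(P + 1e-12), axis=1))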

    + +

We will test three activation functions for our neural network setup; these are

1. the Sigmoid (aka logistic) function,
2. the ReLU function, and
3. the Leaky ReLU function.

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +
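A minimal sketch of these activation functions and their first derivatives, written as simple NumPy functions, could look as follows (an illustration only; your own implementation may differ):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_prime(z):
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_prime(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)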

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +
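As an illustration of what such an update could look like, here is a minimal sketch of a single ADAM step for one parameter array (hypothetical function and variable names; see the lecture notes and Goodfellow et al. for the full algorithm):

import numpy as np

def adam_update(theta, grad, state, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM step for a parameter array theta given its gradient."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Usage sketch: the state starts as (np.zeros_like(theta), np.zeros_like(theta), 0)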

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

You will find the Python Seaborn package useful when plotting the results as functions of the learning rate \( \eta \) and the hyper-parameter \( \lambda \).
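A minimal sketch of such a plot, assuming you have filled a 2D array of test MSE values over hypothetical grids of \( \eta \) and \( \lambda \), could be:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical grids of learning rates and regularization parameters
etas = [0.001, 0.01, 0.1]
lambdas = [0.0001, 0.001, 0.01, 0.1]
# mse_scores[i, j] would hold the test MSE for etas[i] and lambdas[j]
mse_scores = np.random.rand(len(etas), len(lambdas))  # placeholder values only

sns.heatmap(mse_scores, annot=True, xticklabels=lambdas, yticklabels=etas, cmap="viridis")
plt.xlabel(r"$\lambda$")
plt.ylabel(r"$\eta$")
plt.title("Test MSE")
plt.show()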

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

$$
f(x) = \frac{1}{1+25x^2},
$$

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +
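As an illustration only (a sketch under our own assumptions, not the required design), initializing normally distributed weights and performing a forward pass with Sigmoid hidden layers and a linear output could look like this:

import numpy as np

def init_params(layer_sizes, rng=np.random.default_rng(42)):
    """layer_sizes, e.g. [1, 50, 1]: input dim, hidden nodes, output dim."""
    # Normally distributed weights, scaled down; zero biases as one possible choice
    weights = [rng.normal(0, 1, size=(n_in, n_out)) * 0.1
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases

def forward(X, weights, biases):
    """Sigmoid activations in the hidden layers, linear output for regression."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a @ weights[-1] + biases[-1]

# Example: one input feature, one hidden layer with 50 nodes, one output
W, b = init_params([1, 50, 1])
x = np.linspace(-1, 1, 100).reshape(-1, 1)
y_pred = forward(x, W, b)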

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +
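As an example of such a check (a sketch with hypothetical names; your own cost functions and gradients will differ), one could compare a hand-written gradient of a simple MSE cost against Autograd:

import autograd.numpy as anp
import numpy as np
from autograd import grad

def mse(w, X, y):
    return anp.mean((anp.dot(X, w) - y) ** 2)

def mse_grad_analytic(w, X, y):
    # Hand-written gradient of the MSE with respect to w
    n = X.shape[0]
    return 2.0 / n * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
w = rng.standard_normal(3)

auto = grad(mse)(w, X, y)   # Autograd differentiates with respect to the first argument
print(np.allclose(auto, mse_grad_analytic(w, X, y)))  # expect True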

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +
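Note that fetch_openml (used in the code below) returns the MNIST labels as strings; for the Softmax cross entropy you will typically want integer class labels and a one-hot encoded target matrix. A minimal sketch, assuming y holds the fetched labels:

import numpy as np

y_int = y.astype(int)                        # labels '0'-'9' converted to integers
onehot = np.zeros((y_int.size, 10))
onehot[np.arange(y_int.size), y_int] = 1.0   # one-hot encoded targets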

    + +

Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, you can try the Fashion-MNIST data set, available at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist.

To set up the data set, the following Python program may be useful:

from sklearn.datasets import fetch_openml

# Fetch the MNIST dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')

# Extract data (features) and target (labels)
X = mnist.data
y = mnist.target

You should consider scaling the data. The pixel values in MNIST range from 0 to 255. Scaling them to the range 0-1 can improve the performance of some models. That is, you could implement the following scaling:

X = X / 255.0

    And then perform the standard train-test splitting

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

To measure the performance of our classification model we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

$$
\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n},
$$

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rate and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

Again, we strongly recommend that you compare your own neural network code for classification, and its pertinent results, against a similar code using Scikit-Learn, TensorFlow/Keras or PyTorch.

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and an FFNN on the MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and output layers.

    + +

If you wish to compare with, say, logistic regression from scikit-learn, the following code uses the above data set:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model
model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

    + +Required Implementation: +
      +
    1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.
    2. +
    3. Implement a neural network with
    4. +
        +
      • A flexible number of layers
      • +
      • A flexible number of nodes in each layer
      • +
      • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
      • +
      • A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification
      • +
      • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)
      • +
      +
    5. Implement the back-propagation algorithm to compute the gradient of your neural network
    6. +
7. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with your neural network)
    8. +
        +
      • With no optimization algorithm
      • +
      • With RMS Prop
      • +
      • With ADAM
      • +
      +
    9. Implement scaling and train-test splitting of your data, preferably using sklearn
    10. +
    11. Implement and compute metrics like the MSE and Accuracy
    12. +
    +

    Required Analysis:

    +
      +
    1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
    2. +
    3. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.
    4. +
    5. Show and argue for the advantages and disadvantages of using a neural network for regression on your data
    6. +
    7. Show and argue for the advantages and disadvantages of using a neural network for classification on your data
    8. +
    9. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network
    10. +
    +

    Optional (Note that you should include at least two of these in the report):

    +
      +
    1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)
    2. +
    3. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
    4. +
    5. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
    6. +
    7. Use a more complex classification dataset instead, like the fashion MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist)
    8. +
    9. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
    10. +
    11. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607)
    12. +
    +

    Background literature

    + +
      +
    1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
    2. +
    3. Goodfellow, Bengio and Courville, Deep Learning at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8
    4. +
    5. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.
    6. +
    +

    Introduction to numerical projects

    + +

    Here follows a brief recipe and recommendation on how to write a report for each +project. +

    + +
      +
    • Give a short description of the nature of the problem and the eventual numerical methods you have used.
    • +
    • Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.
    • +
    • Include the source code of your program. Comment your program properly.
    • +
    • If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.
    • +
    • Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
    • +
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
    • +
• Try to give an interpretation of your results in your answers to the problems.
    • +
    • Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
    • +
    • Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.
    • +
    +

    Format for electronic delivery of report and programs

    + +

    The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

    + +
      +
    • Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
    • +
• Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
    • +
    • In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.
    • +
    +

    Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report. +

    + © 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license +
    + + + diff --git a/doc/Projects/2025/Project2/html/Project2.html b/doc/Projects/2025/Project2/html/Project2.html new file mode 100644 index 000000000..71e58c711 --- /dev/null +++ b/doc/Projects/2025/Project2/html/Project2.html @@ -0,0 +1,658 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + +
    +

    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    +

    Deliverables

    + +

First, join a group in Canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

    In canvas, deliver as a group and include:

    + +
      +
    • A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • +
        +
      • It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count
      • +
      • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
      • +
      +
    • A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include
    • +
    +

    A PDF file of the report

    +
      +
    • A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • +
    • A README file with the name of the group members
    • +
    • a short description of the project
    • +
    • a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce
    • +
    +

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

    When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. +

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

    +
      +
    1. The mean-squared error (MSE) with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
    2. +
    3. The binary cross entropy (aka log loss) for binary classification problems with and without \( L_1 \) and \( L_2 \) norms
    4. +
    5. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)
    6. +
    +

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.

    + +

    We will test three activation functions for our neural network setup, these are the

    +
      +
    1. The Sigmoid (aka logit) function,
    2. +
    3. the RELU function and
    4. +
    5. the Leaky RELU function
    6. +
    +

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

    You will find the Python Seaborn +package +useful when plotting the results as function of the learning rate +\( \eta \) and the hyper-parameter \( \lambda \) . +

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

$$
f(x) = \frac{1}{1+25x^2},
$$

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +

    + +

Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, you can try the Fashion-MNIST data set, available at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist.

To set up the data set, the following Python program may be useful:

from sklearn.datasets import fetch_openml

# Fetch the MNIST dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')

# Extract data (features) and target (labels)
X = mnist.data
y = mnist.target

You should consider scaling the data. The pixel values in MNIST range from 0 to 255. Scaling them to the range 0-1 can improve the performance of some models. That is, you could implement the following scaling:

X = X / 255.0

    And then perform the standard train-test splitting

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

To measure the performance of our classification model we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

$$
\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n},
$$

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rate and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

Again, we strongly recommend that you compare your own neural network code for classification, and its pertinent results, against a similar code using Scikit-Learn, TensorFlow/Keras or PyTorch.

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and an FFNN on the MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and output layers.

    + +

If you wish to compare with, say, logistic regression from scikit-learn, the following code uses the above data set:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model
model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

    + +Required Implementation: +
      +
    1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.
    2. +
    3. Implement a neural network with
    4. +
        +
      • A flexible number of layers
      • +
      • A flexible number of nodes in each layer
      • +
      • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
      • +
      • A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification
      • +
      • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)
      • +
      +
    5. Implement the back-propagation algorithm to compute the gradient of your neural network
    6. +
7. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with your neural network)
    8. +
        +
      • With no optimization algorithm
      • +
      • With RMS Prop
      • +
      • With ADAM
      • +
      +
    9. Implement scaling and train-test splitting of your data, preferably using sklearn
    10. +
    11. Implement and compute metrics like the MSE and Accuracy
    12. +
    +

    Required Analysis:

    +
      +
    1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
    2. +
    3. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.
    4. +
    5. Show and argue for the advantages and disadvantages of using a neural network for regression on your data
    6. +
    7. Show and argue for the advantages and disadvantages of using a neural network for classification on your data
    8. +
    9. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network
    10. +
    +

    Optional (Note that you should include at least two of these in the report):

    +
      +
    1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)
    2. +
    3. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
    4. +
    5. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
    6. +
    7. Use a more complex classification dataset instead, like the fashion MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist)
    8. +
    9. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
    10. +
    11. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607)
    12. +
    +

    Background literature

    + +
      +
    1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
    2. +
    3. Goodfellow, Bengio and Courville, Deep Learning at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8
    4. +
    5. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.
    6. +
    +

    Introduction to numerical projects

    + +

    Here follows a brief recipe and recommendation on how to write a report for each +project. +

    + +
      +
    • Give a short description of the nature of the problem and the eventual numerical methods you have used.
    • +
    • Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.
    • +
    • Include the source code of your program. Comment your program properly.
    • +
    • If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.
    • +
    • Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
    • +
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
    • +
• Try to give an interpretation of your results in your answers to the problems.
    • +
    • Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
    • +
    • Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.
    • +
    +

    Format for electronic delivery of report and programs

    + +

    The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

    • Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
    • Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
    • In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.

    Finally, we encourage you to collaborate. Optimal working groups consist of 2-3 students. You can then hand in a common report.

    © 1999-2025, Data Analysis and Machine Learning FYS-STK3155/FYS4155 (http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html). Released under CC Attribution-NonCommercial 4.0 license
    + + + diff --git a/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb b/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb new file mode 100644 index 000000000..90ca0ae29 --- /dev/null +++ b/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d724df6f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "8c1bfdba", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "42f6cef9", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "6b088eeb", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. 
These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "1f51c6be", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "5428a6da", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. 
The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "d56ea8d6", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "87464bce", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "fc102ae5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cec503de", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? 
\n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "bbf4879f", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "307035d6", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "a6d69596", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "b4073806", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. 
The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "97f27c66", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "9525e347", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a9919b5f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "c794dffb", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ea0aa772", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "b960fb33", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "47b8fa51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e5a1100", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. 
\n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero. \n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "94699ffc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "5a842d68", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "b57aadc2", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. 
Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "ae2d8c77", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "97736190", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with one layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "8f4d4afc", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "404319bc", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. 
Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "a23505fa", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Projects/2025/Project2/ipynb/Project2.ipynb b/doc/Projects/2025/Project2/ipynb/Project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/Projects/2025/Project2/ipynb/Project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. 
How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. 
The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? 
\n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. 
The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. 
\n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. \n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. 
Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. 
Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz b/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz new file mode 100644 index 000000000..d9ea3457e Binary files /dev/null and b/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz differ diff --git a/doc/Projects/2025/Project2/pdf/Project2.p.tex b/doc/Projects/2025/Project2/pdf/Project2.p.tex new file mode 100644 index 000000000..b2d52b1bb --- /dev/null +++ b/doc/Projects/2025/Project2/pdf/Project2.p.tex @@ -0,0 +1,614 @@ +%% +%% Automatically generated file from DocOnce source +%% (https://github.com/doconce/doconce/) +%% doconce format latex Project2.do.txt --print_latex_style=trac --latex_admon=paragraph +%% +% #ifdef PTEX2TEX_EXPLANATION +%% +%% The file follows the ptex2tex extended LaTeX format, see +%% ptex2tex: https://code.google.com/p/ptex2tex/ +%% +%% Run +%% ptex2tex myfile +%% or +%% doconce ptex2tex myfile +%% +%% to turn myfile.p.tex into an ordinary LaTeX file myfile.tex. +%% (The ptex2tex program: https://code.google.com/p/ptex2tex) +%% Many preprocess options can be added to ptex2tex or doconce ptex2tex +%% +%% ptex2tex -DMINTED myfile +%% doconce ptex2tex myfile envir=minted +%% +%% ptex2tex will typeset code environments according to a global or local +%% .ptex2tex.cfg configure file. doconce ptex2tex will typeset code +%% according to options on the command line (just type doconce ptex2tex to +%% see examples). If doconce ptex2tex has envir=minted, it enables the +%% minted style without needing -DMINTED. +% #endif + +% #define PREAMBLE + +% #ifdef PREAMBLE +%-------------------- begin preamble ---------------------- + +\documentclass[% +oneside, % oneside: electronic viewing, twoside: printing +final, % draft: marks overfull hboxes, figures with paths +10pt]{article} + +\listfiles % print all files needed to compile this document + +\usepackage{relsize,makeidx,color,setspace,amsmath,amsfonts,amssymb} +\usepackage[table]{xcolor} +\usepackage{bm,ltablex,microtype} + +\usepackage[pdftex]{graphicx} + +\usepackage{ptex2tex} +% #ifdef MINTED +\usepackage{minted} +\usemintedstyle{default} +% #endif + +\usepackage[T1]{fontenc} +%\usepackage[latin1]{inputenc} +\usepackage{ucs} +\usepackage[utf8x]{inputenc} + +\usepackage{lmodern} % Latin Modern fonts derived from Computer Modern + +% Hyperlinks in PDF: +\definecolor{linkcolor}{rgb}{0,0,0.4} +\usepackage{hyperref} +\hypersetup{ + breaklinks=true, + colorlinks=true, + linkcolor=linkcolor, + urlcolor=linkcolor, + citecolor=black, + filecolor=black, + %filecolor=blue, + pdfmenubar=true, + pdftoolbar=true, + bookmarksdepth=3 % Uncomment (and tweak) for PDF bookmarks with more levels than the TOC + } +%\hyperbaseurl{} % hyperlinks are relative to this root + +\setcounter{tocdepth}{2} % levels in table of contents + +% --- fancyhdr package for fancy headers --- +\usepackage{fancyhdr} +\fancyhf{} % sets both header and footer to nothing +\renewcommand{\headrulewidth}{0pt} +\fancyfoot[LE,RO]{\thepage} +% Ensure copyright on titlepage (article style) and chapter pages (book style) +\fancypagestyle{plain}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". 
Released under CC Attribution-NonCommercial 4.0 license}} +% \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} +% Ensure copyright on titlepages with \thispagestyle{empty} +\fancypagestyle{empty}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license}} + \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} + +\pagestyle{fancy} + + +% prevent orhpans and widows +\clubpenalty = 10000 +\widowpenalty = 10000 + +% --- end of standard preamble for documents --- + + +% insert custom LaTeX commands... + +\raggedbottom +\makeindex +\usepackage[totoc]{idxlayout} % for index in the toc +\usepackage[nottoc]{tocbibind} % for references/bibliography in the toc + +%-------------------- end preamble ---------------------- + +\begin{document} + +% matching end for #ifdef PREAMBLE +% #endif + +\newcommand{\exercisesection}[1]{\subsection*{#1}} + + +% ------------------- main content ---------------------- + + + +% ----------------- title ------------------------- + +\thispagestyle{empty} + +\begin{center} +{\LARGE\bf +\begin{spacing}{1.25} +Project 2 on Machine Learning, deadline November 10 (Midnight) +\end{spacing} +} +\end{center} + +% ----------------- author(s) ------------------------- + +\begin{center} +{\bf \href{{http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html}}{Data Analysis and Machine Learning FYS-STK3155/FYS4155}} +\end{center} + + \begin{center} +% List of all institutions: +\centerline{{\small University of Oslo, Norway}} +\end{center} + +% ----------------- end author(s) ------------------------- + +% --- begin date --- +\begin{center} +October 14, 2025 +\end{center} +% --- end date --- + +\vspace{1cm} + + +\subsection{Deliverables} + +First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the \textbf{People} page. + +In canvas, deliver as a group and include: + +\begin{itemize} +\item A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include: +\begin{itemize} + + \item It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count + + \item It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository. + +\end{itemize} + +\noindent +\item A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include +\end{itemize} + +\noindent +A PDF file of the report +\begin{itemize} + \item A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results. 
+ + \item A README file with the name of the group members + + \item a short description of the project + + \item a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce +\end{itemize} + +\noindent +\paragraph{Preamble: Note on writing reports, using reference material, AI and other tools.} +We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. + +When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. + +We enocurage you to use tools like ChatGPT or similar in writing the +report. If you use for example ChatGPT, please do cite it properly and +include (if possible) your questions and answers as an addition to the +report. This can be uploaded to for example your website, +GitHub/GitLab or similar as supplemental material. + +If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. + +\subsection{Classification and Regression, writing our own neural network code} + +The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}) as well as the lecture material from the same weeks (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}) should contain enough information for you to get started with writing your own code. + +We will also reuse our codes on gradient descent methods from project 1. + +The data sets that we propose here are (the default sets) + +\begin{itemize} +\item Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be +\begin{itemize} + + \item The simple one-dimensional function Runge function from project 1, that is $f(x) = \frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\end{itemize} + +\noindent +\item Classification. +\begin{itemize} + + \item We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at \href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. +\end{itemize} + +\noindent +\end{itemize} + +\noindent +We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1. + +\paragraph{Part a): Analytical warm-up.} +When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +\begin{enumerate} +\item The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems) + +\item The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms + +\item The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function) +\end{enumerate} + +\noindent +Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy. 
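+
+To make the notation concrete, the sketch below shows one possible NumPy implementation of the two cost functions actually used in this project, the MSE with an optional $L_2$ penalty and the Softmax cross entropy, together with their gradients with respect to the model outputs. The function names and the penalty parameter \texttt{lam} are illustrative choices only, not a prescribed interface for your own code.
+
+\bpycod
+import numpy as np
+
+def mse(y, y_pred, theta=None, lam=0.0):
+    # Mean-squared error, optionally with an L2 penalty on the parameters theta
+    loss = np.mean((y - y_pred) ** 2)
+    if theta is not None and lam > 0.0:
+        loss += lam * np.sum(theta ** 2)
+    return loss
+
+def mse_grad(y, y_pred):
+    # Gradient of the (unpenalized) MSE with respect to the predictions
+    return 2.0 * (y_pred - y) / y.shape[0]
+
+def softmax(z):
+    # Numerically stable softmax over the last axis
+    z = z - np.max(z, axis=-1, keepdims=True)
+    e = np.exp(z)
+    return e / np.sum(e, axis=-1, keepdims=True)
+
+def softmax_cross_entropy(logits, targets_onehot):
+    # Multiclass cross entropy (Softmax loss), averaged over the samples
+    p = softmax(logits)
+    return -np.mean(np.sum(targets_onehot * np.log(p + 1e-12), axis=1))
+
+def softmax_cross_entropy_grad(logits, targets_onehot):
+    # Gradient with respect to the logits: (softmax(z) - t) / n
+    return (softmax(logits) - targets_onehot) / logits.shape[0]
+
+\epycod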
+ +We will test three activation functions for our neural network setup, these are the +\begin{enumerate} +\item The Sigmoid (aka \textbf{logit}) function, + +\item the RELU function and + +\item the Leaky RELU function +\end{enumerate} + +\noindent +Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +\paragraph{Reminder about the gradient machinery from project 1.} +In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. + +We recommend reading chapter 8 on optimization from the textbook of +Goodfellow, Bengio and Courville at +\href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. This chapter contains many +useful insights and discussions on the optimization part of machine +learning. A useful reference on the back progagation algorithm is +Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. + +You will find the Python \href{{https://seaborn.pydata.org/generated/seaborn.heatmap.html}}{Seaborn +package} +useful when plotting the results as function of the learning rate +$\eta$ and the hyper-parameter $\lambda$ . + +\paragraph{Part b): Writing your own Neural Network code.} +Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +We will focus on a regression problem first, using the one-dimensional Runge function +\[ +f(x) = \frac{1}{1+25x^2}, +\] +from project 1. + +Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? + +Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using $50$ and $100$ hidden nodes, respectively. 
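+
+To give an idea of the structure such a code can have, here is a minimal sketch of an FFNN for regression with sigmoid hidden layers, a linear output layer and the MSE as cost function, trained with plain full-batch gradient descent on the one-dimensional Runge function. The class name \texttt{SimpleFFNN}, the zero initialization of the biases and all hyper-parameter values are illustrative choices only; your own implementation should be more flexible (activation functions, minibatches, RMSprop/ADAM) as described above.
+
+\bpycod
+import numpy as np
+
+rng = np.random.default_rng(2025)
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+class SimpleFFNN:
+    """Minimal FFNN for regression: sigmoid hidden layers, linear output, MSE cost."""
+    def __init__(self, layer_sizes):
+        # layer_sizes, e.g. [1, 50, 1]: input dim, hidden nodes, output dim
+        self.W = [rng.normal(0.0, 1.0, (m, n))
+                  for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
+        self.b = [np.zeros(n) for n in layer_sizes[1:]]
+
+    def forward(self, X):
+        # Return all layer activations; the last entry is the (linear) output
+        a, activations = X, [X]
+        for W, b in zip(self.W[:-1], self.b[:-1]):
+            a = sigmoid(a @ W + b)
+            activations.append(a)
+        activations.append(a @ self.W[-1] + self.b[-1])
+        return activations
+
+    def backprop(self, X, y):
+        activations = self.forward(X)
+        n = X.shape[0]
+        delta = 2.0 * (activations[-1] - y) / n      # dMSE/d(output)
+        grads_W, grads_b = [], []
+        for l in range(len(self.W) - 1, -1, -1):
+            grads_W.insert(0, activations[l].T @ delta)
+            grads_b.insert(0, delta.sum(axis=0))
+            if l > 0:
+                a = activations[l]
+                delta = (delta @ self.W[l].T) * a * (1.0 - a)   # sigmoid derivative
+        return grads_W, grads_b
+
+    def train(self, X, y, eta=0.1, epochs=2000):
+        # Plain full-batch gradient descent; swap in your SGD/RMSprop/ADAM code here
+        for _ in range(epochs):
+            gW, gb = self.backprop(X, y)
+            self.W = [W - eta * g for W, g in zip(self.W, gW)]
+            self.b = [b - eta * g for b, g in zip(self.b, gb)]
+
+# Small usage example on the one-dimensional Runge function
+x = np.linspace(-1, 1, 200).reshape(-1, 1)
+y = 1.0 / (1.0 + 25.0 * x**2)
+net = SimpleFFNN([1, 50, 1])
+net.train(x, y)
+print("Training MSE:", np.mean((net.forward(x)[-1] - y) ** 2))
+
+\epycod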
+ +Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. + +You should, as you did in project 1, scale your data. + +\paragraph{Part c): Testing against other software libraries.} +You should test your results against a similar code using \textbf{Scikit-Learn} (see the examples in the above lecture notes from weeks 41 and 42) or \textbf{tensorflow/keras} or \textbf{Pytorch} (for Pytorch, see Raschka et al.'s text chapters 12 and 13). + +Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +\textbf{Autograd} library or the \textbf{JAX} library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. + +\paragraph{Part d): Testing different activation functions and depths of the neural network.} +You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. + +\paragraph{Part e): Testing different norms.} +Finally, still using the one-dimensional Runge function, add now the +hyperparameters $\lambda$ with the $L_2$ and $L_1$ norms. Find the +optimal results for the hyperparameters $\lambda$ and the learning +rates $\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from +project 1 and the $L_1$ results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. + +\paragraph{Part f): Classification analysis using neural networks.} +With a well-written code it should now be easy to change the +activation function for the output layer. + +Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +\href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. + +Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}. + +To set up the data set, the following python programs may be useful + + + + + + + + + +\bpycod +from sklearn.datasets import fetch_openml + +# Fetch the MNIST dataset +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto') + +# Extract data (features) and target (labels) +X = mnist.data +y = mnist.target + +\epycod + +You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. 
Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling + + +\bpycod +X = X / 255.0 + +\epycod + +And then perform the standard train-test splitting + + + +\bpycod +from sklearn.model_selection import train_test_split +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) + +\epycod + + +To measure the performance of our classification problem we will use the +so-called \emph{accuracy} score. The accuracy is as you would expect just +the number of correctly guessed targets $t_i$ divided by the total +number of targets, that is + +\[ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +\] + +where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$ +otherwise if we have a binary classification problem. Here $t_i$ +represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$. + +Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\lambda$, various activation functions, number of hidden layers and nodes and activation functions. + +Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using \textbf{Scikit-Learn} or \textbf{tensorflow/keras} or \textbf{pytorch}. + +If you have time, you can use the functionality of \textbf{scikit-learn} and compare your neural network results with those from Logistic regression. This is optional. +The weblink here \href{{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}{\nolinkurl{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. + +If you wish to compare with say Logisti Regression from \textbf{scikit-learn}, the following code uses the above data set + + + + + + + + + + + + +\bpycod +from sklearn.linear_model import LogisticRegression +# Initialize the model +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42) +# Train the model +model.fit(X_train, y_train) +from sklearn.metrics import accuracy_score +# Make predictions on the test set +y_pred = model.predict(X_test) +# Calculate accuracy +accuracy = accuracy_score(y_test, y_pred) +print(f"Model Accuracy: {accuracy:.4f}") + +\epycod + + +\paragraph{Part g) Critical evaluation of the various algorithms.} +After all these glorious calculations, you should now summarize the +various algorithms and come with a critical evaluation of their pros +and cons. Which algorithm works best for the regression case and which +is best for the classification case. These codes can also be part of +your final project 3, but now applied to other data sets. + +\subsection{Summary of methods to implement and analyze} + +\textbf{Required Implementation:} +\begin{enumerate} +\item Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task. 
+ +\item Implement a neural network with +\begin{itemize} + + \item A flexible number of layers + + \item A flexible number of nodes in each layer + + \item A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax) + + \item A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification + + \item An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics) + +\end{itemize} + +\noindent +\item Implement the back-propagation algorithm to compute the gradient of your neural network + +\item Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network) +\begin{itemize} + + \item With no optimization algorithm + + \item With RMS Prop + + \item With ADAM + +\end{itemize} + +\noindent +\item Implement scaling and train-test splitting of your data, preferably using sklearn + +\item Implement and compute metrics like the MSE and Accuracy +\end{enumerate} + +\noindent +\paragraph{Required Analysis:} +\begin{enumerate} +\item Briefly show and argue for the advantages and disadvantages of the methods from Project 1. + +\item Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance. + +\item Show and argue for the advantages and disadvantages of using a neural network for regression on your data + +\item Show and argue for the advantages and disadvantages of using a neural network for classification on your data + +\item Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network +\end{enumerate} + +\noindent +\paragraph{Optional (Note that you should include at least two of these in the report):} +\begin{enumerate} +\item Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer) + +\item Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation. + +\item Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) + +\item Use a more complex classification dataset instead, like the fashion MNIST (see \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}) + +\item Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). 
+ +\item Compute and interpret a confusion matrix of your best classification model (see \href{{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}{\nolinkurl{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}) +\end{enumerate} + +\noindent +\subsection{Background literature} + +\begin{enumerate} +\item The text of Michael Nielsen is highly recommended, see Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. It is an excellent read. + +\item Goodfellow, Bengio and Courville, Deep Learning at \href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. Here we recommend chapters 6, 7 and 8 + +\item Raschka et al.~at \href{{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}{\nolinkurl{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}. Here we recommend chapters 11, 12 and 13. +\end{enumerate} + +\noindent +\subsection{Introduction to numerical projects} + +Here follows a brief recipe and recommendation on how to write a report for each +project. + +\begin{itemize} + \item Give a short description of the nature of the problem and the eventual numerical methods you have used. + + \item Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself. + + \item Include the source code of your program. Comment your program properly. + + \item If possible, try to find analytic solutions, or known limits in order to test your program when developing the code. + + \item Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes. + + \item Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc. + + \item Try to give an interpretation of you results in your answers to the problems. + + \item Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it. + + \item Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning. +\end{itemize} + +\noindent +\subsection{Format for electronic delivery of report and programs} + +The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report: + +\begin{itemize} + \item Use Canvas to hand in your projects, log in at \href{{https://www.uio.no/english/services/it/education/canvas/}}{\nolinkurl{https://www.uio.no/english/services/it/education/canvas/}} with your normal UiO username and password. 
+ + \item Upload \textbf{only} the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them. + + \item In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters. +\end{itemize} + +\noindent +Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report. + + +% ------------------- end of main content --------------- + +% #ifdef PREAMBLE +\end{document} +% #endif + diff --git a/doc/Projects/2025/Project2/pdf/Project2.tex b/doc/Projects/2025/Project2/pdf/Project2.tex new file mode 100644 index 000000000..488317149 --- /dev/null +++ b/doc/Projects/2025/Project2/pdf/Project2.tex @@ -0,0 +1,582 @@ +%% +%% Automatically generated file from DocOnce source +%% (https://github.com/doconce/doconce/) +%% doconce format latex Project2.do.txt --print_latex_style=trac --latex_admon=paragraph +%% + + +%-------------------- begin preamble ---------------------- + +\documentclass[% +oneside, % oneside: electronic viewing, twoside: printing +final, % draft: marks overfull hboxes, figures with paths +10pt]{article} + +\listfiles % print all files needed to compile this document + +\usepackage{relsize,makeidx,color,setspace,amsmath,amsfonts,amssymb} +\usepackage[table]{xcolor} +\usepackage{bm,ltablex,microtype} + +\usepackage[pdftex]{graphicx} + +\usepackage{fancyvrb} % packages needed for verbatim environments + +\usepackage[T1]{fontenc} +%\usepackage[latin1]{inputenc} +\usepackage{ucs} +\usepackage[utf8x]{inputenc} + +\usepackage{lmodern} % Latin Modern fonts derived from Computer Modern + +% Hyperlinks in PDF: +\definecolor{linkcolor}{rgb}{0,0,0.4} +\usepackage{hyperref} +\hypersetup{ + breaklinks=true, + colorlinks=true, + linkcolor=linkcolor, + urlcolor=linkcolor, + citecolor=black, + filecolor=black, + %filecolor=blue, + pdfmenubar=true, + pdftoolbar=true, + bookmarksdepth=3 % Uncomment (and tweak) for PDF bookmarks with more levels than the TOC + } +%\hyperbaseurl{} % hyperlinks are relative to this root + +\setcounter{tocdepth}{2} % levels in table of contents + +% --- fancyhdr package for fancy headers --- +\usepackage{fancyhdr} +\fancyhf{} % sets both header and footer to nothing +\renewcommand{\headrulewidth}{0pt} +\fancyfoot[LE,RO]{\thepage} +% Ensure copyright on titlepage (article style) and chapter pages (book style) +\fancypagestyle{plain}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license}} +% \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} +% Ensure copyright on titlepages with \thispagestyle{empty} +\fancypagestyle{empty}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". 
Released under CC Attribution-NonCommercial 4.0 license}} + \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} + +\pagestyle{fancy} + + +% prevent orhpans and widows +\clubpenalty = 10000 +\widowpenalty = 10000 + +% --- end of standard preamble for documents --- + + +% insert custom LaTeX commands... + +\raggedbottom +\makeindex +\usepackage[totoc]{idxlayout} % for index in the toc +\usepackage[nottoc]{tocbibind} % for references/bibliography in the toc + +%-------------------- end preamble ---------------------- + +\begin{document} + +% matching end for #ifdef PREAMBLE + +\newcommand{\exercisesection}[1]{\subsection*{#1}} + + +% ------------------- main content ---------------------- + + + +% ----------------- title ------------------------- + +\thispagestyle{empty} + +\begin{center} +{\LARGE\bf +\begin{spacing}{1.25} +Project 2 on Machine Learning, deadline November 10 (Midnight) +\end{spacing} +} +\end{center} + +% ----------------- author(s) ------------------------- + +\begin{center} +{\bf \href{{http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html}}{Data Analysis and Machine Learning FYS-STK3155/FYS4155}} +\end{center} + + \begin{center} +% List of all institutions: +\centerline{{\small University of Oslo, Norway}} +\end{center} + +% ----------------- end author(s) ------------------------- + +% --- begin date --- +\begin{center} +October 14, 2025 +\end{center} +% --- end date --- + +\vspace{1cm} + + +\subsection*{Deliverables} + +First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the \textbf{People} page. + +In canvas, deliver as a group and include: + +\begin{itemize} +\item A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include: +\begin{itemize} + + \item It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count + + \item It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository. + +\end{itemize} + +\noindent +\item A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include +\end{itemize} + +\noindent +A PDF file of the report +\begin{itemize} + \item A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results. + + \item A README file with the name of the group members + + \item a short description of the project + + \item a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce +\end{itemize} + +\noindent +\paragraph{Preamble: Note on writing reports, using reference material, AI and other tools.} +We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. 
Please do ask us if you are in doubt. + +When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. + +We enocurage you to use tools like ChatGPT or similar in writing the +report. If you use for example ChatGPT, please do cite it properly and +include (if possible) your questions and answers as an addition to the +report. This can be uploaded to for example your website, +GitHub/GitLab or similar as supplemental material. + +If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. + +\subsection*{Classification and Regression, writing our own neural network code} + +The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}) as well as the lecture material from the same weeks (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}) should contain enough information for you to get started with writing your own code. + +We will also reuse our codes on gradient descent methods from project 1. + +The data sets that we propose here are (the default sets) + +\begin{itemize} +\item Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be +\begin{itemize} + + \item The simple one-dimensional function Runge function from project 1, that is $f(x) = \frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. 
Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\end{itemize} + +\noindent +\item Classification. +\begin{itemize} + + \item We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at \href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. +\end{itemize} + +\noindent +\end{itemize} + +\noindent +We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1. + +\paragraph{Part a): Analytical warm-up.} +When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +\begin{enumerate} +\item The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems) + +\item The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms + +\item The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function) +\end{enumerate} + +\noindent +Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy. + +We will test three activation functions for our neural network setup, these are the +\begin{enumerate} +\item The Sigmoid (aka \textbf{logit}) function, + +\item the RELU function and + +\item the Leaky RELU function +\end{enumerate} + +\noindent +Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +\paragraph{Reminder about the gradient machinery from project 1.} +In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. + +We recommend reading chapter 8 on optimization from the textbook of +Goodfellow, Bengio and Courville at +\href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. This chapter contains many +useful insights and discussions on the optimization part of machine +learning. A useful reference on the back progagation algorithm is +Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. + +You will find the Python \href{{https://seaborn.pydata.org/generated/seaborn.heatmap.html}}{Seaborn +package} +useful when plotting the results as function of the learning rate +$\eta$ and the hyper-parameter $\lambda$ . 
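+
+As an illustration of this kind of grid search, the sketch below collects test-set MSE values for a grid of learning rates and $\lambda$ values and visualizes them with a Seaborn heatmap. The helper \texttt{train\_and\_score} is only a placeholder returning dummy numbers; you would replace it with your own training and evaluation routine.
+
+\begin{verbatim}
+import numpy as np
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+def train_and_score(eta, lam):
+    # Placeholder: train your network with learning rate eta and penalty lam,
+    # then return the test-set MSE.  Replaced here by a dummy value.
+    return (eta - 0.01) ** 2 + (np.log10(lam) + 3.0) ** 2
+
+etas = [1e-4, 1e-3, 1e-2, 1e-1]
+lambdas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
+
+scores = np.array([[train_and_score(eta, lam) for lam in lambdas] for eta in etas])
+df = pd.DataFrame(scores, index=etas, columns=lambdas)
+
+sns.heatmap(df, annot=True, fmt=".3f", cbar_kws={"label": "test MSE"})
+plt.xlabel(r"$\lambda$")
+plt.ylabel(r"$\eta$")
+plt.title("Test MSE as function of learning rate and regularization")
+plt.show()
+\end{verbatim}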
+ +\paragraph{Part b): Writing your own Neural Network code.} +Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +We will focus on a regression problem first, using the one-dimensional Runge function +\[ +f(x) = \frac{1}{1+25x^2}, +\] +from project 1. + +Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? + +Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using $50$ and $100$ hidden nodes, respectively. + +Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. + +You should, as you did in project 1, scale your data. + +\paragraph{Part c): Testing against other software libraries.} +You should test your results against a similar code using \textbf{Scikit-Learn} (see the examples in the above lecture notes from weeks 41 and 42) or \textbf{tensorflow/keras} or \textbf{Pytorch} (for Pytorch, see Raschka et al.'s text chapters 12 and 13). + +Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +\textbf{Autograd} library or the \textbf{JAX} library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. + +\paragraph{Part d): Testing different activation functions and depths of the neural network.} +You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. + +\paragraph{Part e): Testing different norms.} +Finally, still using the one-dimensional Runge function, add now the +hyperparameters $\lambda$ with the $L_2$ and $L_1$ norms. 
Find the +optimal results for the hyperparameters $\lambda$ and the learning +rates $\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from +project 1 and the $L_1$ results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. + +\paragraph{Part f): Classification analysis using neural networks.} +With a well-written code it should now be easy to change the +activation function for the output layer. + +Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +\href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. + +Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}. + +To set up the data set, the following python programs may be useful + + + + + + + + + +\begin{verbatim} +from sklearn.datasets import fetch_openml + +# Fetch the MNIST dataset +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto') + +# Extract data (features) and target (labels) +X = mnist.data +y = mnist.target + +\end{verbatim} + +You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling + + +\begin{verbatim} +X = X / 255.0 + +\end{verbatim} + +And then perform the standard train-test splitting + + + +\begin{verbatim} +from sklearn.model_selection import train_test_split +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) + +\end{verbatim} + + +To measure the performance of our classification problem we will use the +so-called \emph{accuracy} score. The accuracy is as you would expect just +the number of correctly guessed targets $t_i$ divided by the total +number of targets, that is + +\[ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +\] + +where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$ +otherwise if we have a binary classification problem. Here $t_i$ +represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$. + +Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\lambda$, various activation functions, number of hidden layers and nodes and activation functions. + +Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using \textbf{Scikit-Learn} or \textbf{tensorflow/keras} or \textbf{pytorch}. + +If you have time, you can use the functionality of \textbf{scikit-learn} and compare your neural network results with those from Logistic regression. This is optional. 
+The weblink here \href{{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}{\nolinkurl{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. + +If you wish to compare with say Logisti Regression from \textbf{scikit-learn}, the following code uses the above data set + + + + + + + + + + + + +\begin{verbatim} +from sklearn.linear_model import LogisticRegression +# Initialize the model +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42) +# Train the model +model.fit(X_train, y_train) +from sklearn.metrics import accuracy_score +# Make predictions on the test set +y_pred = model.predict(X_test) +# Calculate accuracy +accuracy = accuracy_score(y_test, y_pred) +print(f"Model Accuracy: {accuracy:.4f}") + +\end{verbatim} + + +\paragraph{Part g) Critical evaluation of the various algorithms.} +After all these glorious calculations, you should now summarize the +various algorithms and come with a critical evaluation of their pros +and cons. Which algorithm works best for the regression case and which +is best for the classification case. These codes can also be part of +your final project 3, but now applied to other data sets. + +\subsection*{Summary of methods to implement and analyze} + +\textbf{Required Implementation:} +\begin{enumerate} +\item Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task. + +\item Implement a neural network with +\begin{itemize} + + \item A flexible number of layers + + \item A flexible number of nodes in each layer + + \item A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax) + + \item A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification + + \item An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics) + +\end{itemize} + +\noindent +\item Implement the back-propagation algorithm to compute the gradient of your neural network + +\item Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network) +\begin{itemize} + + \item With no optimization algorithm + + \item With RMS Prop + + \item With ADAM + +\end{itemize} + +\noindent +\item Implement scaling and train-test splitting of your data, preferably using sklearn + +\item Implement and compute metrics like the MSE and Accuracy +\end{enumerate} + +\noindent +\paragraph{Required Analysis:} +\begin{enumerate} +\item Briefly show and argue for the advantages and disadvantages of the methods from Project 1. + +\item Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 
2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance. + +\item Show and argue for the advantages and disadvantages of using a neural network for regression on your data + +\item Show and argue for the advantages and disadvantages of using a neural network for classification on your data + +\item Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network +\end{enumerate} + +\noindent +\paragraph{Optional (Note that you should include at least two of these in the report):} +\begin{enumerate} +\item Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer) + +\item Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation. + +\item Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) + +\item Use a more complex classification dataset instead, like the fashion MNIST (see \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}) + +\item Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\item Compute and interpret a confusion matrix of your best classification model (see \href{{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}{\nolinkurl{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}) +\end{enumerate} + +\noindent +\subsection*{Background literature} + +\begin{enumerate} +\item The text of Michael Nielsen is highly recommended, see Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. It is an excellent read. + +\item Goodfellow, Bengio and Courville, Deep Learning at \href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. Here we recommend chapters 6, 7 and 8 + +\item Raschka et al.~at \href{{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}{\nolinkurl{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}. Here we recommend chapters 11, 12 and 13. +\end{enumerate} + +\noindent +\subsection*{Introduction to numerical projects} + +Here follows a brief recipe and recommendation on how to write a report for each +project. + +\begin{itemize} + \item Give a short description of the nature of the problem and the eventual numerical methods you have used. + + \item Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself. + + \item Include the source code of your program. Comment your program properly. + + \item If possible, try to find analytic solutions, or known limits in order to test your program when developing the code. 
+ + \item Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes. + + \item Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc. + + \item Try to give an interpretation of your results in your answers to the problems. + + \item Critique: if possible, include your comments and reflections about the exercise: whether you felt you learnt something, ideas for improvements and other thoughts that occurred to you while solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it. + + \item Try to establish a practice where you log your work at the computer lab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel are worth mentioning. +\end{itemize} + +\noindent +\subsection*{Format for electronic delivery of report and programs} + +The preferred format for the report is a PDF file. You can also use DOC or PostScript formats, or hand in a Jupyter notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report: + +\begin{itemize} + \item Use Canvas to hand in your projects, log in at \href{{https://www.uio.no/english/services/it/education/canvas/}}{\nolinkurl{https://www.uio.no/english/services/it/education/canvas/}} with your normal UiO username and password. + + \item Upload \textbf{only} the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them. + + \item In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters. +\end{itemize} + +\noindent +Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report.
+ + +% ------------------- end of main content --------------- + +\end{document} + diff --git a/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf b/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf new file mode 100644 index 000000000..f0a9e7792 Binary files /dev/null and b/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf differ diff --git a/doc/pub/week37/html/._week37-bs000.html b/doc/pub/week37/html/._week37-bs000.html index 39cfcfc18..5cb7bff40 100644 --- a/doc/pub/week37/html/._week37-bs000.html +++ b/doc/pub/week37/html/._week37-bs000.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods @@ -40,159 +40,222 @@ 2, None, 'plans-for-week-37-lecture-monday'), - ('Plans for week 37, lab sessions', + ('Readings and Videos:', 2, None, 'readings-and-videos'), + ('Material for lecture Monday September 8', 2, None, - 'plans-for-week-37-lab-sessions'), - ('Material for lecture Monday September 9', + 'material-for-lecture-monday-september-8'), + ('Gradient descent and revisiting Ordinary Least Squares from ' + 'last week', 2, None, - 'material-for-lecture-monday-september-9'), - ('Deriving OLS from a probability distribution', + 'gradient-descent-and-revisiting-ordinary-least-squares-from-last-week'), + ('Gradient descent example', 2, None, 'gradient-descent-example'), + ('The derivative of the cost/loss function', 2, None, - 'deriving-ols-from-a-probability-distribution'), - ('Independent and Identically Distrubuted (iid)', + 'the-derivative-of-the-cost-loss-function'), + ('The Hessian matrix', 2, None, 'the-hessian-matrix'), + ('Simple program', 2, None, 'simple-program'), + ('Gradient Descent Example', 2, None, 'gradient-descent-example'), + ('Gradient descent and Ridge', 2, None, - 'independent-and-identically-distrubuted-iid'), - ('Maximum Likelihood Estimation (MLE)', + 'gradient-descent-and-ridge'), + ('The Hessian matrix for Ridge Regression', 2, None, - 'maximum-likelihood-estimation-mle'), - ('A new Cost Function', 2, None, 'a-new-cost-function'), - ("More basic Statistics and Bayes' theorem", + 'the-hessian-matrix-for-ridge-regression'), + ('Program example for gradient descent with Ridge Regression', 2, None, - 'more-basic-statistics-and-bayes-theorem'), - ('Marginal Probability', 2, None, 'marginal-probability'), - ('Conditional Probability', 2, None, 'conditional-probability'), - ("Bayes' Theorem", 2, None, 'bayes-theorem'), - ("Interpretations of Bayes' Theorem", + 'program-example-for-gradient-descent-with-ridge-regression'), + ('Using gradient descent methods, limitations', 2, None, - 'interpretations-of-bayes-theorem'), - ("Example of Usage of Bayes' theorem", + 'using-gradient-descent-methods-limitations'), + ('Momentum based GD', 2, None, 'momentum-based-gd'), + ('Improving gradient descent with momentum', 2, None, - 'example-of-usage-of-bayes-theorem'), - ('Doing it correctly', 2, None, 'doing-it-correctly'), - ("Bayes' Theorem and Ridge and Lasso Regression", + 'improving-gradient-descent-with-momentum'), + ('Same code but now with momentum gradient descent', 2, None, - 'bayes-theorem-and-ridge-and-lasso-regression'), - ('Ridge and Bayes', 2, None, 'ridge-and-bayes'), - ('Lasso and Bayes', 2, None, 'lasso-and-bayes'), - ('Why resampling methods', 2, None, 'why-resampling-methods'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling approaches can be computationally expensive', + 'same-code-but-now-with-momentum-gradient-descent'), + ('Overview video on Stochastic Gradient Descent (SGD)', 2, None, - 
'resampling-approaches-can-be-computationally-expensive'), - ('Why resampling methods ?', 2, None, 'why-resampling-methods'), - ('Statistical analysis', 2, None, 'statistical-analysis'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling methods: Bootstrap', + 'overview-video-on-stochastic-gradient-descent-sgd'), + ('Batches and mini-batches', 2, None, 'batches-and-mini-batches'), + ('Pros and cons', 2, None, 'pros-and-cons'), + ('Convergence rates', 2, None, 'convergence-rates'), + ('Accuracy', 2, None, 'accuracy'), + ('Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-methods-bootstrap'), - ('The Central Limit Theorem', + 'stochastic-gradient-descent-sgd'), + ('Stochastic Gradient Descent', 2, None, - 'the-central-limit-theorem'), - ('Finding the Limit', 2, None, 'finding-the-limit'), - ('Rewriting the $\\delta$-function', + 'stochastic-gradient-descent'), + ('Computation of gradients', 2, None, 'computation-of-gradients'), + ('SGD example', 2, None, 'sgd-example'), + ('The gradient step', 2, None, 'the-gradient-step'), + ('Simple example code', 2, None, 'simple-example-code'), + ('When do we stop?', 2, None, 'when-do-we-stop'), + ('Slightly different approach', 2, None, - 'rewriting-the-delta-function'), - ('Identifying Terms', 2, None, 'identifying-terms'), - ('Wrapping it up', 2, None, 'wrapping-it-up'), - ('Confidence Intervals', 2, None, 'confidence-intervals'), - ('Standard Approach based on the Normal Distribution', + 'slightly-different-approach'), + ('Time decay rate', 2, None, 'time-decay-rate'), + ('Code with a Number of Minibatches which varies', 2, None, - 'standard-approach-based-on-the-normal-distribution'), - ('Resampling methods: Bootstrap background', + 'code-with-a-number-of-minibatches-which-varies'), + ('Replace or not', 2, None, 'replace-or-not'), + ('SGD vs Full-Batch GD: Convergence Speed and Memory Comparison', 2, None, - 'resampling-methods-bootstrap-background'), - ('Resampling methods: More Bootstrap background', + 'sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison'), + ('Theoretical Convergence Speed and convex optimization', + 3, + None, + 'theoretical-convergence-speed-and-convex-optimization'), + ('Strongly Convex Case', 3, None, 'strongly-convex-case'), + ('Non-Convex Problems', 3, None, 'non-convex-problems'), + ('Memory Usage and Scalability', + 2, + None, + 'memory-usage-and-scalability'), + ('Empirical Evidence: Convergence Time and Memory in Practice', + 2, + None, + 'empirical-evidence-convergence-time-and-memory-in-practice'), + ('Deep Neural Networks', 3, None, 'deep-neural-networks'), + ('Memory constraints', 3, None, 'memory-constraints'), + ('Second moment of the gradient', + 2, + None, + 'second-moment-of-the-gradient'), + ('Challenge: Choosing a Fixed Learning Rate', + 2, + None, + 'challenge-choosing-a-fixed-learning-rate'), + ('Motivation for Adaptive Step Sizes', + 2, + None, + 'motivation-for-adaptive-step-sizes'), + ('AdaGrad algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'adagrad-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Derivation of the AdaGrad Algorithm', + 2, + None, + 'derivation-of-the-adagrad-algorithm'), + ('AdaGrad Update Rule Derivation', + 2, + None, + 'adagrad-update-rule-derivation'), + ('AdaGrad Properties', 2, None, 'adagrad-properties'), + ('RMSProp: Adaptive Learning Rates', + 2, + None, + 'rmsprop-adaptive-learning-rates'), + 
('RMSProp algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'rmsprop-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Adam Optimizer', 2, None, 'adam-optimizer'), + ('"ADAM optimizer":"https://arxiv.org/abs/1412.6980"', + 2, + None, + 'adam-optimizer-https-arxiv-org-abs-1412-6980'), + ('Why Combine Momentum and RMSProp?', + 2, + None, + 'why-combine-momentum-and-rmsprop'), + ('Adam: Exponential Moving Averages (Moments)', 2, None, - 'resampling-methods-more-bootstrap-background'), - ('Resampling methods: Bootstrap approach', + 'adam-exponential-moving-averages-moments'), + ('Adam: Bias Correction', 2, None, 'adam-bias-correction'), + ('Adam: Update Rule Derivation', 2, None, - 'resampling-methods-bootstrap-approach'), - ('Resampling methods: Bootstrap steps', + 'adam-update-rule-derivation'), + ('Adam vs. AdaGrad and RMSProp', 2, None, - 'resampling-methods-bootstrap-steps'), - ('Code example for the Bootstrap method', + 'adam-vs-adagrad-and-rmsprop'), + ('Adaptivity Across Dimensions', 2, None, - 'code-example-for-the-bootstrap-method'), - ('Plotting the Histogram', 2, None, 'plotting-the-histogram'), - ('The bias-variance tradeoff', + 'adaptivity-across-dimensions'), + ('ADAM algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', 2, None, - 'the-bias-variance-tradeoff'), - ('A way to Read the Bias-Variance Tradeoff', + 'adam-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Algorithms and codes for Adagrad, RMSprop and Adam', 2, None, - 'a-way-to-read-the-bias-variance-tradeoff'), - ('Example code for Bias-Variance tradeoff', + 'algorithms-and-codes-for-adagrad-rmsprop-and-adam'), + ('Practical tips', 2, None, 'practical-tips'), + ('Sneaking in automatic differentiation using Autograd', 2, None, - 'example-code-for-bias-variance-tradeoff'), - ('Understanding what happens', + 'sneaking-in-automatic-differentiation-using-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'understanding-what-happens'), - ('Summing up', 2, None, 'summing-up'), - ("Another Example from Scikit-Learn's Repository", + 'same-code-but-now-with-momentum-gradient-descent'), + ('Including Stochastic Gradient Descent with Autograd', 2, None, - 'another-example-from-scikit-learn-s-repository'), - ('Various steps in cross-validation', + 'including-stochastic-gradient-descent-with-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'various-steps-in-cross-validation'), - ('Cross-validation in brief', + 'same-code-but-now-with-momentum-gradient-descent'), + ("But none of these can compete with Newton's method", 2, None, - 'cross-validation-in-brief'), - ('Code Example for Cross-validation and $k$-fold ' - 'Cross-validation', + 'but-none-of-these-can-compete-with-newton-s-method'), + ('Similar (second order function now) problem but now with ' + 'AdaGrad', 2, None, - 'code-example-for-cross-validation-and-k-fold-cross-validation'), - ('More examples on bootstrap and cross-validation and errors', + 'similar-second-order-function-now-problem-but-now-with-adagrad'), + ('RMSprop for adaptive learning rate with Stochastic Gradient ' + 'Descent', 2, None, - 'more-examples-on-bootstrap-and-cross-validation-and-errors'), - ('The same example but now with cross-validation', + 
'rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent'), + ('And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"', 2, None, - 'the-same-example-but-now-with-cross-validation'), + 'and-finally-adam-https-arxiv-org-pdf-1412-6980-pdf'), ('Material for the lab sessions', 2, None, 'material-for-the-lab-sessions'), - ('Linking the regression analysis with a statistical ' - 'interpretation', + ('Reminder on different scaling methods', 2, None, - 'linking-the-regression-analysis-with-a-statistical-interpretation'), - ('Assumptions made', 2, None, 'assumptions-made'), - ('Expectation value and variance', + 'reminder-on-different-scaling-methods'), + ('Functionality in Scikit-Learn', 2, None, - 'expectation-value-and-variance'), - ('Expectation value and variance for $\\boldsymbol{\\beta}$', + 'functionality-in-scikit-learn'), + ('More preprocessing', 2, None, 'more-preprocessing'), + ('Frequently used scaling functions', 2, None, - 'expectation-value-and-variance-for-boldsymbol-beta')]} + 'frequently-used-scaling-functions')]} end of tocinfo --> @@ -220,66 +283,86 @@ - Week 37: Statistical interpretations and Resampling Methods + Week 37: Gradient descent methods
    -

    Plans for week 37, lab sessions

    - +

    Readings and Videos:

    -Material for the lab sessions on Tuesday and Wednesday +

    -

      - -

    • Calculations of expectation values
    • - -

    • Discussion of resampling techniques
    • - -

    • Exercise set for week 37
    • - -

    • Work on project 1
    • - -

    • Video of exercise sessions week 37
    • - -

    • For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.
    • -
    +
      +

1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and sections 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
    2. +

3. Raschka et al, pages 37-44 and pages 278-283 with focus on linear regression.
    4. +

    5. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
    6. +

    7. Video on Stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    8. +
    -

    Material for lecture Monday September 9

    +

    Material for lecture Monday September 8

    -

    Deriving OLS from a probability distribution

    +

    Gradient descent and revisiting Ordinary Least Squares from last week

    -

    Our basic assumption when we derived the OLS equations was to assume -that our output is determined by a given continuous function -\( f(\boldsymbol{x}) \) and a random noise \( \boldsymbol{\epsilon} \) given by the normal -distribution with zero mean value and an undetermined variance -\( \sigma^2 \). +

    Last week we started with linear regression as a case study for the gradient descent +methods. Linear regression is a great test case for the gradient +descent methods discussed in the lectures since it has several +desirable properties such as:

    -

    We found above that the outputs \( \boldsymbol{y} \) have a mean value given by -\( \boldsymbol{X}\hat{\boldsymbol{\beta}} \) and variance \( \sigma^2 \). Since the entries to -the design matrix are not stochastic variables, we can assume that the -probability distribution of our targets is also a normal distribution -but now with mean value \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \). This means that a -single output \( y_i \) is given by the Gaussian distribution +

      +

    1. An analytical solution (recall homework sets for week 35).
    2. +

    3. The gradient can be computed analytically.
    4. +

    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. +
    +

    +

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    + + + +
    +
    +
    +
    +
    +
import numpy as np
+m = 100  # number of data points
+x = 2*np.random.rand(m,1)
+y = 4+3*x+np.random.randn(m,1)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

where \( x_i \in [0,2] \) is chosen randomly using a uniform distribution. Additionally we have stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). +The linear regression model is given by

    +

     
    +$$ +h_\theta(x) = \boldsymbol{y} = \theta_0 + \theta_1 x, +$$ +

     
    +

    such that

     
    $$ -y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\beta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. +\boldsymbol{y}_i = \theta_0 + \theta_1 x_i. $$

     

    -

    Independent and Identically Distrubuted (iid)

    +

    Gradient descent example

    -

    We assume now that the various \( y_i \) values are stochastically distributed according to the above Gaussian distribution. -We define this distribution as -

    +

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \theta = (\theta_0, \theta_1)^T \)

    + +

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\theta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

     
    $$ -p(y_i, \boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}, +X \equiv \begin{bmatrix} +1 & x_1 \\ +\vdots & \vdots \\ +1 & x_{100} & \\ +\end{bmatrix}. $$

     
    -

    which reads as finding the likelihood of an event \( y_i \) with the input variables \( \boldsymbol{X} \) given the parameters (to be determined) \( \boldsymbol{\beta} \).

    - -

    Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event \( \boldsymbol{y} \) as the product of the single events, that is we have

    - +

    The cost/loss/risk function is given by

     
    $$ -p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}). +C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right] $$

     
    -

    We will write this in a more compact form reserving \( \boldsymbol{D} \) for the domain of events, including the ouputs (targets) and the inputs. That is -in case we have a simple one-dimensional input and output case -

    +

    and we want to find \( \theta \) such that \( C(\theta) \) is minimized.

    +
    + +
    +

    The derivative of the cost/loss function

    + +

    Computing \( \partial C(\theta) / \partial \theta_0 \) and \( \partial C(\theta) / \partial \theta_1 \) we can show that the gradient can be written as

     
    $$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. +\nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ +\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ +\end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}), $$

     
    -

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \( \boldsymbol{X} \). -We can now rewrite the above probability as -

    +

    where \( X \) is the design matrix defined above.
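As a quick sanity check of this expression, the analytic gradient can be compared with a finite-difference approximation of the cost function. The short sketch below assumes the same data-generating setup as in the example above; the helper names are illustrative only.

import numpy as np

np.random.seed(2025)
n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]   # design matrix with intercept column

def cost(theta):
    # C(theta) = (1/n) ||X theta - y||_2^2
    residual = X @ theta - y
    return (residual.T @ residual).item()/n

def analytic_gradient(theta):
    # the expression derived above: (2/n) X^T (X theta - y)
    return (2.0/n)*X.T @ (X @ theta - y)

theta = np.random.randn(2,1)
eps = 1e-6
numerical_gradient = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    # central finite difference in component j
    numerical_gradient[j] = (cost(theta+step)-cost(theta-step))/(2*eps)

print(analytic_gradient(theta).ravel())
print(numerical_gradient.ravel())   # the two should agree to high precision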

    +
    + +
    +

    The Hessian matrix

    +

    The Hessian matrix of \( C(\theta) \) is given by

     
    $$ -p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. +\boldsymbol{H} \equiv \begin{bmatrix} +\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ +\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} & \\ +\end{bmatrix} = \frac{2}{n}X^T X. $$

     
    -

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \( \boldsymbol{D} \) given a set of parameters \( \boldsymbol{\beta} \).

    +

This result implies that \( C(\theta) \) is a convex function since the matrix \( X^T X \) is always positive semi-definite.

    -

    Maximum Likelihood Estimation (MLE)

    - -

    In statistics, maximum likelihood estimation (MLE) is a method of -estimating the parameters of an assumed probability distribution, -given some observed data. This is achieved by maximizing a likelihood -function so that, under the assumed statistical model, the observed -data is the most probable. -

    +

    Simple program

    -

    We will assume here that our events are given by the above Gaussian -distribution and we will determine the optimal parameters \( \beta \) by -maximizing the above PDF. However, computing the derivatives of a -product function is cumbersome and can easily lead to overflow and/or -underflowproblems, with potentials for loss of numerical precision. -

    +

    We can now write a program that minimizes \( C(\theta) \) using the gradient descent method with a constant learning rate \( \eta \) according to

    +

     
    +$$ +\theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \ k=0,1,\cdots +$$ +

     
    -

    In practice, it is more convenient to maximize the logarithm of the -PDF because it is a monotonically increasing function of the argument. -Alternatively, and this will be our option, we will minimize the -negative of the logarithm since this is a monotonically decreasing -function. +

We can use the expression we computed for the gradient, let the initial +\( \theta_0 \) be chosen randomly and set, for example, \( \eta = 0.001 \). We stop iterating +when \( ||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion, and uses instead \( \eta = 1/\lambda_{\max} \), the inverse of the largest eigenvalue of the Hessian, as learning rate.

    -

    Note also that maximization/minimization of the logarithm of the PDF -is equivalent to the maximization/minimization of the function itself. +

    And finally we can compare our solution for \( \theta \) with the analytic result given by +\( \theta= (X^TX)^{-1} X^T \mathbf{y} \).
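Since the example code below omits the stopping test, here is a minimal sketch of the update loop with the norm-based criterion included (assuming the data and gradient expression of this section; with such a small constant learning rate the loop may need many iterations).

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

eta = 0.001          # constant learning rate
epsilon = 1e-8       # tolerance for the gradient norm
max_iterations = 1000000

theta = np.random.randn(2,1)
for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= epsilon:
        print(f"Stopped after {k} iterations")
        break
    theta -= eta*gradient

# compare with the analytic OLS solution
theta_ols = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta.ravel())
print(theta_ols.ravel())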

    -

    A new Cost Function

    +

    Gradient Descent Example

    + +

    Here our simple example

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
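+# learning rate: the inverse of the largest Hessian eigenvalue; for this convex quadratic cost any eta < 2/lambda_max gives a convergent iteration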
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +
    +

    Gradient descent and Ridge

    +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \theta \),

     
    $$ -C(\boldsymbol{\beta}=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, +C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \ \lambda \geq 0. $$

     
    -

    which becomes

    +

    In order to minimize \( C_{\text{ridge}}(\theta) \) using GD we adjust the gradient as follows

     
    $$ -C(\boldsymbol{\beta}=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}. +\nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ +\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ +\end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2 (\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta). $$

     
    -

    Taking the derivative of the new cost function with respect to the parameters \( \beta \) we recognize our familiar OLS equation, namely

    - +

    We can easily extend our program to minimize \( C_{\text{ridge}}(\theta) \) using gradient descent and compare with the analytical solution given by

     
    $$ -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, +\theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

     
    +

    -

    which leads to the well-known OLS equation for the optimal paramters \( \beta \)

    +
    +

    The Hessian matrix for Ridge Regression

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

     
    $$ -\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}! +\boldsymbol{H} \equiv \begin{bmatrix} +\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ +\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} & \\ +\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$

     
    -

    Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.

    +

    This implies that the Hessian matrix is positive definite, hence the stationary point is a +minimum. +Note that the Ridge cost function is convex being a sum of two convex +functions. Therefore, the stationary point is a global +minimum of this function. +

    -

    More basic Statistics and Bayes' theorem

    +

    Program example for gradient descent with Ridge Regression

    -

    A central theorem in statistics is Bayes' theorem. This theorem plays a similar role as the good old Pythagoras' theorem in geometry. -Bayes' theorem is extremely simple to derive. But to do so we need some basic axioms from statistics. -

    - -

    Assume we have two domains of events \( X=[x_0,x_1,\dots,x_{n-1}] \) and \( Y=[y_0,y_1,\dots,y_{n-1}] \).

    - -

    We define also the likelihood for \( X \) and \( Y \) as \( p(X) \) and \( p(Y) \) respectively. -The likelihood of a specific event \( x_i \) (or \( y_i \)) is then written as \( p(X=x_i) \) or just \( p(x_i)=p_i \). -

    - -
    -Union of events is given by -

    -

     
    -$$ -p(X \cup Y)= p(X)+p(Y)-p(X \cap Y). -$$ -

     
    + +

    +
    +
    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Using gradient descent methods, limitations

    -
    -The product rule (aka joint probability) is given by -

    -

     
    -$$ -p(X \cup Y)= p(X,Y)= p(X\vert Y)p(Y)=p(Y\vert X)p(X), -$$ -

     
    +

      +

    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • +

    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • +

    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • +

• GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • +

    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • +

    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • +
    +
    -

    where we read \( p(X\vert Y) \) as the likelihood of obtaining \( X \) given \( Y \).

    - +
    +

    Momentum based GD

    -

    If we have independent events then \( p(X,Y)=p(X)p(Y) \).

    +

We discuss here some simple examples where we introduce what is called +'memory' about previous steps, or what is normally called momentum +gradient descent. +For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025. +
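As a brief reminder (a standard formulation; the whiteboard notes may use slightly different symbols), momentum gradient descent keeps an exponentially damped memory of earlier update directions,

$$
v_{k+1} = \gamma v_k + \eta \nabla_\theta C(\theta_k), \qquad \theta_{k+1} = \theta_k - v_{k+1},
$$

where \( 0 \leq \gamma < 1 \) is the momentum parameter and \( \gamma = 0 \) recovers plain gradient descent. This is the update implemented further below as new_change = step_size * gradient + momentum * change.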

    -

    Marginal Probability

    +

    Improving gradient descent with momentum

    -

    The marginal probability is defined in terms of only one of the set of variables \( X,Y \). For a discrete probability we have

    -
    - -

    -

     
    -$$ -p(X)=\sum_{i=0}^{n-1}p(X,Y=y_i)=\sum_{i=0}^{n-1}p(X\vert Y=y_i)p(Y=y_i)=\sum_{i=0}^{n-1}p(X\vert y_i)p(y_i). -$$ -

     
    + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Conditional Probability

    +

    Same code but now with momentum gradient descent

    -

    The conditional probability, if \( p(Y) > 0 \), is

    -
    - -

    -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}=\frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}. -$$ -

     
    + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Bayes' Theorem

    - -

    If we combine the conditional probability with the marginal probability and the standard product rule, we have

    -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}, -$$ -

     
    +

    Overview video on Stochastic Gradient Descent (SGD)

    -

    which we can rewrite as

    - -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}=\frac{p(Y\vert X)p(X)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}, -$$ -

     
    +What is Stochastic Gradient Descent +

    There are several reasons for using stochastic gradient descent. Some of these are:

    -

    which is Bayes' theorem. It allows us to evaluate the uncertainty in in \( X \) after we have observed \( Y \). We can easily interchange \( X \) with \( Y \).

    +
      +

    1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.
    2. +

3. Local minima: the noise from random sampling can hopefully help the method escape poor local minima
    4. +

    5. Memory Usage: Requires less memory compared to computing gradients for the entire dataset.
    6. +
    -

    Interpretations of Bayes' Theorem

    +

    Batches and mini-batches

    + +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    -

    The quantity \( p(Y\vert X) \) on the right-hand side of the theorem is -evaluated for the observed data \( Y \) and can be viewed as a function of -the parameter space represented by \( X \). This function is not -necesseraly normalized and is normally called the likelihood function. +

In large-scale applications such as the ILSVRC challenge, the +training data can have on the order of millions of examples. Hence, it +seems wasteful to compute the full cost function over the entire +training set in order to perform only a single parameter update. A +very common approach to addressing this challenge is to compute the +gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from +an entire training set of several millions. This batch is then used to +perform a parameter update.

    +
    -

    The function \( p(X) \) on the right hand side is called the prior while the function on the left hand side is the called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.

    +
    +

    Pros and cons

    -

    Let us try to illustrate Bayes' theorem through an example.

    +
      +

    1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.
    2. +

    3. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.
    4. +

    5. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    6. +
    -

    Example of Usage of Bayes' theorem

    +

    Convergence rates

    -

    Let us suppose that you are undergoing a series of mammography scans in -order to rule out possible breast cancer cases. We define the -sensitivity for a positive event by the variable \( X \). It takes binary -values with \( X=1 \) representing a positive event and \( X=0 \) being a -negative event. We reserve \( Y \) as a classification parameter for -either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this being a positive thing). -

    +
      +

    1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.
    2. +

3. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration.
    4. +
    +
    -

    We let \( Y=1 \) represent the the case of having breast cancer and \( Y=0 \) as not.

    +
    +

    Accuracy

    -

    Let us assume that if you have breast cancer, the test will be positive with a probability of \( 0.8 \), that is we have

    +

In general, Stochastic Gradient Descent is less accurate than gradient +descent, as it calculates the gradient on single examples, which may +not accurately represent the overall dataset. Gradient Descent is +more accurate because it uses the average gradient calculated over the +entire dataset. +

    -

     
    -$$ -p(X=1\vert Y=1) =0.8. -$$ -

     
    +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence. +

    -

    This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of \( 80\% \) for having cancer. -It is however not correct, as the following Bayesian analysis shows. +

It is however the method of choice in deep learning algorithms, where +SGD is often used in combination with other optimization techniques, +such as momentum or adaptive learning rates.

    -

    Doing it correctly

    +

    Stochastic Gradient Descent (SGD)

    + +

In stochastic gradient descent, the extreme case is the one where each +minibatch contains only a single data point. +

    + +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    + +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    +
    + +
    +

    Stochastic Gradient Descent

    -

    If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number. -Let us assume that the prior probability in the population as a whole is +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above.

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

     
    $$ -p(Y=1) =0.004. +C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\theta}). $$

     
    +

    + +
    +

    Computation of gradients

    -

    We need also to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have

    +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

     
    $$ -p(X=1\vert Y=0) =0.1. +\nabla_\theta C(\mathbf{\theta}) = \sum_i^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}). $$

     
    -

    Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute

    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +

    +
    + +
    +

    SGD example

    +

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) +and we choose a minibatch size of \( M=2 \). This gives \( n/M=5 \) minibatches, +each containing two data points. In particular we have +\( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = +(\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you +have only a single batch with all data points and on the other extreme, +you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. +\( B_k = \mathbf{x}_k \). +

    +

The idea is now to approximate the gradient by replacing the sum over +all data points with a sum over the data points in one of the minibatches +picked at random in each gradient descent step +

     
    $$ -p(Y=1\vert X=1)=\frac{p(X=1\vert Y=1)p(Y=1)}{p(X=1\vert Y=1)p(Y=1)+p(X=1\vert Y=0)p(Y=0)}=\frac{0.8\times 0.004}{0.8\times 0.004+0.1\times 0.996}=0.031. +\nabla_{\theta} +C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}) \rightarrow \sum_{i \in B_k}^n \nabla_\theta +c_i(\mathbf{x}_i, \mathbf{\theta}). $$

     
    - -

    That is, in case of a positive test, there is only a \( 3\% \) chance of having breast cancer!

    -

    Bayes' Theorem and Ridge and Lasso Regression

    +

    The gradient step

    -

    Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

    - -

    For ordinary least squares we postulated that the maximum likelihood for the doamin of events \( \boldsymbol{D} \) (one-dimensional case)

    +

    Thus a gradient descent step now looks like

     
    $$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})], +\theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k}^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}) $$

     
    -

    is given by

    -

     
    -$$ -p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. -$$ -

     
    +

where \( k \) is picked at random with equal +probability from \( [1,n/M] \). An iteration over the number of +minibatches (n/M) is commonly referred to as an epoch. Thus it is +typical to choose a number of epochs and for each epoch iterate over +the number of minibatches, as exemplified in the code below. +

    +
    -

    In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set \( \boldsymbol{\beta} \) given a domain of events \( \boldsymbol{D} \)? That is, how can we define the posterior probability

    +
    +

    Simple example code

    -

     
    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D}). -$$ -

     
    -

    Bayes' theorem comes to our rescue here since (omitting the normalization constant)

    -

     
    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}). -$$ -

     
    + +

    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We have a model for \( p(\boldsymbol{D}\vert\boldsymbol{\beta}) \) but need one for the prior \( p(\boldsymbol{\beta}) \)!

    +

Taking the gradient only on a subset of the data has two important +benefits. First, it introduces randomness which decreases the chance +that our optimization scheme gets stuck in a local minimum. Second, if +the size of the minibatches is small relative to the number of +datapoints (\( M < n \)), the computation of the gradient is much +cheaper since we sum over the datapoints in the \( k \)-th minibatch and not +all \( n \) datapoints. +

    -

    Ridge and Bayes

    - -

    With the posterior probability defined by a likelihood which we have -already modeled and an unknown prior, we are now ready to make -additional models for the prior. +

    When do we stop?

    + +

A natural question is: when do we stop the search for a new minimum? +One possibility is to compute the full gradient after a given number +of epochs and check if the norm of the gradient is smaller than some +threshold and stop if true. However, the condition that the gradient +is zero is valid also for local minima, so this would only tell us +that we are close to a local/global minimum. Alternatively, we could +evaluate the cost function at this point, store the result and +continue the search. If the test kicks in at a later stage we can +compare the values of the cost function and keep the \( \theta \) that +gave the lowest value.
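A minimal sketch of this bookkeeping, using the linear regression cost from the earlier examples (the function and variable names are illustrative and not part of the course code), could look like this:

import numpy as np

def sgd_with_checkpoint(X, y, theta0, eta, n_epochs, M, tol=1e-8):
    # mini-batch SGD that remembers the best parameters seen so far
    n = X.shape[0]
    m = int(n/M)                                   # number of minibatches
    theta = theta0.copy()
    cost = lambda th: np.mean((X @ th - y)**2)
    best_theta, best_cost = theta.copy(), cost(theta)
    for epoch in range(n_epochs):
        for i in range(m):
            k = M*np.random.randint(m)             # pick a minibatch at random
            xi, yi = X[k:k+M], y[k:k+M]
            gradient = (2.0/M)*xi.T @ (xi @ theta - yi)
            theta -= eta*gradient
        # end of epoch: evaluate full gradient and cost, keep the best theta so far
        full_gradient = (2.0/n)*X.T @ (X @ theta - y)
        current_cost = cost(theta)
        if current_cost < best_cost:
            best_theta, best_cost = theta.copy(), current_cost
        if np.linalg.norm(full_gradient) <= tol:
            break
    return best_theta, best_cost

# usage with the data from the earlier examples:
# best_theta, best_cost = sgd_with_checkpoint(X, y, np.random.randn(2,1), 0.01, 50, 5)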

    +
    -

    We can, based on our discussions of the variance of \( \boldsymbol{\beta} \) and the mean value, assume that the prior for the values \( \boldsymbol{\beta} \) is given by a Gaussian with mean value zero and variance \( \tau^2 \), that is

    +
    +

    Slightly different approach

    + +

Another approach is to let the step length \( \eta_j \) depend on the +number of epochs in such a way that it becomes very small after a +reasonable time, so that we essentially stop moving. Such approaches are +also called scaling or learning rate schedules. There are many ways to scale the learning +rate; see +https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 +for a discussion of different scaling functions for the learning rate. +

    +
    -

     
    -$$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

     
    +

    +

    Time decay rate

    -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -

     
    -$$ -p(\boldsymbol{\beta\vert\boldsymbol{D})}=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

     
    +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function

     
    +$$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ +

     
    goes to zero as the number of epochs gets large. I.e. we start with a step length \( \eta_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    -

    We can now optimize this quantity with respect to \( \boldsymbol{\beta} \). As we -did for OLS, this is most conveniently done by taking the negative -logarithm of the posterior probability. Doing so and leaving out the -constants terms that do not depend on \( \beta \), we have +

    In this way we can fix the number of epochs, compute \( \theta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \theta \) that gives the lowest value of the cost +function.

    -

     
    -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{2\tau^2}\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ -

     
    -

    and replacing \( 1/2\tau^2 \) with \( \lambda \) we have

    + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +
    +

    Code with a Number of Minibatches which varies

    -

     
    -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ -

     
    +

    In the code here we vary the number of mini-batches.

    -

    which is our Ridge cost function! Nice, isn't it?

    + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Lasso and Bayes

    +

    Replace or not

    + +

In the above code, we have used sampling with replacement when setting up the +mini-batches. The discussion +here may be +useful. +
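For comparison, here is a minimal sketch of setting up the mini-batches without replacement, by shuffling the data indices once per epoch so that every data point is used exactly once per epoch (variable names are illustrative, not from the course code):

import numpy as np

n = 100
M = 5                      # size of each minibatch
m = int(n/M)               # number of minibatches
n_epochs = 50
eta = 0.01

x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]
theta = np.random.randn(2,1)

for epoch in range(n_epochs):
    indices = np.random.permutation(n)            # shuffle once per epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]              # without replacement within an epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)*xi.T @ (xi @ theta - yi)
        theta -= eta*gradients

print(theta)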

    +
    -

    To derive the Lasso cost function, we simply replace the Gaussian prior with an exponential distribution (Laplace in this case) with zero mean value, that is

    +
    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison

    +

    Theoretical Convergence Speed and convex optimization

    +

    Consider minimizing an empirical cost function

     
    $$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. +C(\theta) =\frac{1}{N}\sum_{i=1}^N l_i(\theta), $$

     
    -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    +

    where each \( l_i(\theta) \) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \( \nabla C(\theta) \), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \( \nabla +l_i(\theta) \) selected at random. In equation form, one GD step is: +

    +

     
    $$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. +\theta_{t+1} = \theta_t-\eta \nabla C(\theta_t) =\theta_t -\eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t), $$

     
    -

    Taking the negative -logarithm of the posterior probability and leaving out the -constants terms that do not depend on \( \beta \), we have -

    +

    whereas one SGD step is:

     
    $$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{\tau}\vert\vert\boldsymbol{\beta}\vert\vert_1, +\theta_{t+1} = \theta_t -\eta \nabla l_{i_t}(\theta_t), $$

     
    -

    and replacing \( 1/\tau \) with \( \lambda \) we have

    +

with \( i_t \) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \( O(1/t) \) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation, typically \( O(1/\sqrt{t}) \) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD's trajectory is smoother and more predictable, while SGD's path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +

    Strongly Convex Case

    +

    If \( C(\theta) \) is strongly convex and \( L \)-smooth (so GD enjoys linear +convergence), the gap \( C(\theta_t)-C(\theta^*) \) for GD shrinks as +

     
    $$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_1, +C(\theta_t) - C(\theta^* ) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)], $$

     
    -

    which is our Lasso cost function!

    -
    +

a geometric (linear) convergence per iteration. Achieving an \( \epsilon \)-accurate solution thus takes on the order of \( \log(1/\epsilon) \) iterations for GD. However, each GD iteration costs \( O(N) \) gradient evaluations. SGD cannot exploit strong convexity to obtain a linear rate; instead, with a properly decaying step size (e.g. \( \eta_t = \frac{1}{\mu t} \)) or iterate averaging, SGD attains an \( O(1/t) \) convergence rate in expectation. For example, a result of Moulines and Bach (2011), see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html, shows that with \( \eta_t = \Theta(1/t) \),

    +

     
    +$$ +\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t), +$$ +

     
    -

    -

    Why resampling methods

    +

for a strongly convex, smooth \( C \). This \( 1/t \) rate is slower per iteration than GD's exponential decay, but each SGD iteration is \( N \) times cheaper. In fact, to reach error \( \epsilon \), plain SGD needs on the order of \( T=O(1/\epsilon) \) iterations (sub-linear convergence), while GD needs \( O(\log(1/\epsilon)) \) iterations. When accounting for the cost per iteration, GD requires \( O(N \log(1/\epsilon)) \) total gradient computations versus SGD's \( O(1/\epsilon) \) single-sample computations. In large-scale regimes (huge \( N \)), SGD can be faster in wall-clock time because \( N \log(1/\epsilon) \) may far exceed \( 1/\epsilon \) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \( N \) cheap updates in the time GD makes one, often yielding a good solution faster in practice, even though SGD's asymptotic error decays more slowly. As one lecture succinctly puts it: "SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can't adapt to strong convexity". Thus, the break-even point depends on \( N \) and the desired accuracy: for moderate accuracy on very large \( N \), SGD's cheaper updates win; for extremely high precision (very small \( \epsilon \)) on a modest \( N \), GD's fast convergence per step can be advantageous.

    +

    Non-Convex Problems

    -

    Before we proceed, we need to rethink what we have been doing. In our -eager to fit the data, we have omitted several important elements in -our regression analysis. In what follows we will +

    In non-convex optimization (e.g. deep neural networks), neither GD nor +SGD guarantees global minima, but SGD often displays faster progress +in finding useful minima. Theoretical results here are weaker, usually +showing convergence to a stationary point \( \theta \) (\( |\nabla C| \) is +small) in expectation. For example, GD might require \( O(1/\epsilon^2) \) +iterations to ensure \( |\nabla C(\theta)| < \epsilon \), and SGD typically has +similar polynomial complexity (often worse due to gradient +noise). However, a noteworthy difference is that SGD’s stochasticity +can help escape saddle points or poor local minima. Random gradient +fluctuations act like implicit noise, helping the iterate “jump” out +of flat saddle regions where full-batch GD could stagnate . In fact, +research has shown that adding noise to GD can guarantee escaping +saddle points in polynomial time, and the inherent noise in SGD often +serves this role. Empirically, this means SGD can sometimes find a +lower loss basin faster, whereas full-batch GD might get “stuck” near +saddle points or need a very small learning rate to navigate complex +error surfaces . Overall, in modern high-dimensional machine learning, +SGD (or mini-batch SGD) is the workhorse for large non-convex problems +because it converges to good solutions much faster in practice, +despite the lack of a linear convergence guarantee. Full-batch GD is +rarely used on large neural networks, as it would require tiny steps +to avoid divergence and is extremely slow per iteration .

    -
      -

    1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff
    2. -

    3. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more
    4. -
    -

    -

    and discuss how to select a given model (one of the difficult parts in machine learning).

    -

    Resampling methods

    -
    - -

    -

    Resampling methods are an indispensable tool in modern -statistics. They involve repeatedly drawing samples from a training -set and refitting a model of interest on each sample in order to -obtain additional information about the fitted model. For example, in -order to estimate the variability of a linear regression fit, we can -repeatedly draw different samples from the training data, fit a linear -regression to each new sample, and then examine the extent to which -the resulting fits differ. Such an approach may allow us to obtain -information that would not be available from fitting the model only -once using the original training sample. -

    - -

    Two resampling methods are often used in Machine Learning analyses,

    -
      -

    1. The bootstrap method
    2. -

    3. and Cross-Validation
    4. -
    -

    -

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular -cross-validation and the bootstrap method. +

    Memory Usage and Scalability

    + +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \( \nabla C(\theta) \) . This results +in memory usage that scales linearly with the dataset size \( N \). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \( N \) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible. +

    + +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \( N \)) forward/backward pass +could not even be executed if \( N \) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \( N \), while for SGD it remains \( O(1) \) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods.
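As a small illustration of this point (our own sketch, not from the notes; the data and the batch size are arbitrary), the mini-batch loop below only ever touches \( M \) samples at a time. If X and y were instead np.memmap arrays backed by files on disk, the same loop would run with a memory footprint that is essentially \( O(M) \), independent of the dataset size.

import numpy as np

def minibatch_stream(X, y, M, rng):
    """Yield mini-batches of size M in a shuffled order, one at a time."""
    n = X.shape[0]
    idx = rng.permutation(n)
    for start in range(0, n, M):
        batch = idx[start:start+M]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
n, p, M = 10_000, 3, 32
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1*rng.standard_normal(n)

theta = np.zeros(p)
eta = 0.01
for epoch in range(5):
    for xb, yb in minibatch_stream(X, y, M, rng):
        grad = 2.0/len(yb)*xb.T @ (xb @ theta - yb)   # gradient on the current batch only
        theta -= eta*grad
print(theta)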

    -
    -

    Resampling approaches can be computationally expensive

    -
    - -

    - -

    Resampling approaches can be computationally expensive, because they -involve fitting the same statistical method multiple times using -different subsets of the training data. However, due to recent -advances in computing power, the computational requirements of -resampling methods generally are not prohibitive. In this chapter, we -discuss two of the most commonly used resampling methods, -cross-validation and the bootstrap. Both methods are important tools -in the practical application of many statistical learning -procedures. For example, cross-validation can be used to estimate the -test error associated with a given statistical learning method in -order to evaluate its performance, or to select the appropriate level -of flexibility. The process of evaluating a model’s performance is -known as model assessment, whereas the process of selecting the proper -level of flexibility for a model is known as model selection. The -bootstrap is widely used. +

    Empirical Evidence: Convergence Time and Memory in Practice

    + +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments: +

    +

    Deep Neural Networks

    + +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive. +

    + +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration. +

    +

    Memory constraints

    + +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable . +

    + +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \( N \) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent .

    -
    -

    Why resampling methods ?

    -
    -Statistical analysis -

    - -

      -

    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • -

    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • -

    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • -
    -
    +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    -

    Statistical analysis

    -
    - -

    - -

      -

    • As in other experiments, many numerical experiments have two classes of errors:
    • -
        - -

      • Statistical errors
      • - -

      • Systematical errors
      • -
      +

      Challenge: Choosing a Fixed Learning Rate

      +

      A fixed \( \eta \) is hard to get right:

      +
        +

      1. If \( \eta \) is too large, the updates can overshoot the minimum, causing oscillations or divergence
      2. +

      3. If \( \eta \) is too small, convergence is very slow (many iterations to make progress)
      4. +

      -

    • Statistical errors can be estimated using standard tools from statistics
    • -

    • Systematical errors are method specific and must be treated differently from case to case.
    • -
    -
    +

    In practice, one often uses trial-and-error or schedules (decaying \( \eta \) over time) to find a workable balance. +For a function with steep directions and flat directions, a single global \( \eta \) may be inappropriate: +

    +
      +

    1. Steep coordinates require a smaller step size to avoid oscillation.
    2. +

    3. Flat/shallow coordinates could use a larger step to speed up progress.
    4. +

5. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features**; we need a method to adjust step sizes per feature (see the short demonstration after this list).
    6. +
    -
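The dilemma is easy to demonstrate. Below is a tiny sketch of our own on the ill-conditioned quadratic \( C(\theta)=\frac{1}{2}\theta^T A \theta \) with curvatures 100 and 1 (the specific step sizes are chosen only for illustration): the steep direction forces \( \eta < 2/100 \), which leaves the flat direction crawling.

import numpy as np

A = np.diag([100.0, 1.0])          # one steep and one flat direction

def gd(eta, iters=100):
    theta = np.array([1.0, 1.0])
    for _ in range(iters):
        theta -= eta*(A @ theta)   # gradient of 0.5*theta^T A theta
    return theta

print("eta = 0.001 (safe but slow):  ", gd(0.001))   # flat direction has barely moved
print("eta = 0.019 (near the limit): ", gd(0.019))   # both components shrink
print("eta = 0.021 (too large):      ", gd(0.021))   # steep direction diverges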

    Resampling methods

    - -

    With all these analytical equations for both the OLS and Ridge -regression, we will now outline how to assess a given model. This will -lead to a discussion of the so-called bias-variance tradeoff (see -below) and so-called resampling methods. -

    - -

    One of the quantities we have discussed as a way to measure errors is -the mean-squared error (MSE), mainly used for fitting of continuous -functions. Another choice is the absolute error. -

    +

    Motivation for Adaptive Step Sizes

    -

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, -we discuss the -

      -

    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. -

    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +

    5. Instead of a fixed global \( \eta \), use an adaptive learning rate for each parameter that depends on the history of gradients.
    6. +

    7. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.
    8. +

    9. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected
    10. +

    11. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates
    12. +

    13. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods.

    -

    As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error. -For a certain level of complexity the test error will reach minimum, before starting to increase again. The -training error reaches a saturation. -

    +

    AdaGrad algorithm, taken from Goodfellow et al

    + +

    +
    +

    +
    +

    -

    Resampling methods: Bootstrap

    +

    Derivation of the AdaGrad Algorithm

    +
    - +Accumulating Gradient History

    -

    Bootstrapping is a non-parametric approach to statistical inference -that substitutes computation for more traditional distributional -assumptions and asymptotic results. Bootstrapping offers a number of -advantages: -

      -

    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. - -

    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. - -

    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. -

    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +

    9. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)
    10. +

11. Let \( g_t = \nabla C_{i_t}(\theta_t) \) be the gradient at step \( t \) (or a subgradient for nondifferentiable cases).
    12. +

    13. Initialize \( r_0 = 0 \) (an all-zero vector in \( \mathbb{R}^d \)).
    14. +

    15. At each iteration \( t \), update the accumulation:
    -
    - -

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    - -

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    -
    - -
    -

    The Central Limit Theorem

    - -

    Suppose we have a PDF \( p(x) \) from which we generate a series \( N \) -of averages \( \mathbb{E}[x_i] \). Each mean value \( \mathbb{E}[x_i] \) -is viewed as the average of a specific measurement, e.g., throwing -dice 100 times and then taking the average value, or producing a certain -amount of random numbers. -For notational ease, we set \( \mathbb{E}[x_i]=x_i \) in the discussion -which follows. We do the same for \( \mathbb{E}[z]=z \). -

    - -

    If we compute the mean \( z \) of \( m \) such mean values \( x_i \)

    +

     
    $$ - z=\frac{x_1+x_2+\dots+x_m}{m}, +r_t = r_{t-1} + g_t \circ g_t, $$

     
    -

    the question we pose is which is the PDF of the new variable \( z \).

    +
      +

1. Here \( g_t \circ g_t \) denotes the element-wise square of the gradient vector, that is, \( r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2 \) for each parameter \( j \).
    2. +

    3. We can view \( H_t = \mathrm{diag}(r_t) \) as a diagonal matrix of past squared gradients. Initially \( H_0 = 0 \).
    4. +
    +
    -

    Finding the Limit

    +

    AdaGrad Update Rule Derivation

    -

    The probability of obtaining an average value \( z \) is the product of the -probabilities of obtaining arbitrary individual mean values \( x_i \), -but with the constraint that the average is \( z \). We can express this through -the following expression -

    +

    We scale the gradient by the inverse square root of the accumulated matrix \( H_t \). The AdaGrad update at step \( t \) is:

     
    $$ - \tilde{p}(z)=\int dx_1p(x_1)\int dx_2p(x_2)\dots\int dx_mp(x_m) - \delta(z-\frac{x_1+x_2+\dots+x_m}{m}), +\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t, $$

     
    -

    where the \( \delta \)-function enbodies the constraint that the mean is \( z \). -All measurements that lead to each individual \( x_i \) are expected to -be independent, which in turn means that we can express \( \tilde{p} \) as the -product of individual \( p(x_i) \). The independence assumption is important in the derivation of the central limit theorem. +

where \( H_t^{-1/2} \) is the diagonal matrix with entries \( (r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2} \). In coordinates, this means each parameter \( j \) has an individual step size:

    -
    - -
    -

    Rewriting the \( \delta \)-function

    - -

    If we use the integral expression for the \( \delta \)-function

    -

     
    $$ - \delta(z-\frac{x_1+x_2+\dots+x_m}{m})=\frac{1}{2\pi}\int_{-\infty}^{\infty} - dq\exp{\left(iq(z-\frac{x_1+x_2+\dots+x_m}{m})\right)}, + \theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}. $$

     
    -

    and inserting \( e^{i\mu q-i\mu q} \) where \( \mu \) is the mean value -we arrive at -

    +

    In practice we add a small constant \( \epsilon \) in the denominator for numerical stability to avoid division by zero:

     
    $$ - \tilde{p}(z)=\frac{1}{2\pi}\int_{-\infty}^{\infty} - dq\exp{\left(iq(z-\mu)\right)}\left[\int_{-\infty}^{\infty} - dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m, +\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}. $$

     
    -

    with the integral over \( x \) resulting in

    +

    Equivalently, the effective learning rate for parameter \( j \) at time \( t \) is \( \displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}} \). This decreases over time as \( r_{t,j} \) grows.
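In code, the AdaGrad update amounts to a few lines. The following is a minimal sketch on the same toy OLS data as the earlier examples (the values of \( \eta \) and \( \epsilon \) and the number of iterations are illustrative choices, not prescribed by the notes).

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta, eps = 0.5, 1e-8
r = np.zeros_like(theta)                  # accumulated squared gradients r_t

for t in range(1000):
    g = (2.0/n)*X.T @ (X @ theta - y)     # full gradient of the MSE
    r += g*g                              # r_t = r_{t-1} + g_t o g_t
    theta -= eta*g/np.sqrt(eps + r)       # per-coordinate step eta/sqrt(eps + r_t)
print("theta from AdaGrad")
print(theta)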

    +
    -

     
    -$$ - \int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}= - \int_{-\infty}^{\infty}dxp(x) - \left[1+\frac{iq(\mu-x)}{m}-\frac{q^2(\mu-x)^2}{2m^2}+\dots\right]. -$$ -

     
    +

    +

    AdaGrad Properties

    + +
      +

    1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps
    2. +

    3. No manual schedule needed: The accumulation \( r_t \) keeps increasing (or stays the same if gradient is zero), so step sizes \( \eta/\sqrt{r_t} \) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.
    4. +

    5. Sparse data benefit: For very sparse features, \( r_{t,j} \) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal
    6. +

    7. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem
    8. +
    +

    +

    It effectively reduces the need to tune \( \eta \) by hand.

    +
      +

    1. Limitations: Because \( r_t \) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)
    2. +
    -

    Identifying Terms

    +

    RMSProp: Adaptive Learning Rates

    -

    The second term on the rhs disappears since this is just the mean and -employing the definition of \( \sigma^2 \) we have +

RMSProp addresses AdaGrad's diminishing learning rate issue. It uses a decaying average of squared gradients instead of a cumulative sum (a sketch follows after the list below):

     
    $$ - \int_{-\infty}^{\infty}dxp(x)e^{\left(iq(\mu-x)/m\right)}= - 1-\frac{q^2\sigma^2}{2m^2}+\dots, +v_t = \rho v_{t-1} + (1-\rho)(\nabla C(\theta_t))^2, $$

     
    -

    resulting in

    +

    with \( \rho \) typically \( 0.9 \) (or \( 0.99 \)).

    +
      +

    1. Update: \( \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t) \).
    2. +

    3. Recent gradients have more weight, so \( v_t \) adapts to the current landscape.
    4. +

    5. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.
    6. +
    +

    +

RMSProp was first proposed in lecture notes by Geoff Hinton (2012) and remains unpublished.
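A minimal RMSProp sketch on the same toy OLS data as before (the values of \( \eta \), \( \rho \) and \( \epsilon \) are illustrative choices of ours, not taken from the notes); the only change from the AdaGrad sketch above is the decaying average v.

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta, rho, eps = 0.01, 0.9, 1e-8
v = np.zeros_like(theta)                  # running average of squared gradients

for t in range(1000):
    g = (2.0/n)*X.T @ (X @ theta - y)
    v = rho*v + (1 - rho)*g*g             # decaying average, not a cumulative sum
    theta -= eta*g/np.sqrt(v + eps)       # per-coordinate adaptive step
print("theta from RMSProp")
print(theta)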

    +

    RMSProp algorithm, taken from Goodfellow et al

    -

     
    -$$ - \left[\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m\approx - \left[1-\frac{q^2\sigma^2}{2m^2}+\dots \right]^m, -$$ -

     
    +

    +

    +

    +
    +

    +
    -

    and in the limit \( m\rightarrow \infty \) we obtain

    +
    +

    Adam Optimizer

    -

     
    -$$ - \tilde{p}(z)=\frac{1}{\sqrt{2\pi}(\sigma/\sqrt{m})} - \exp{\left(-\frac{(z-\mu)^2}{2(\sigma/\sqrt{m})^2}\right)}, -$$ -

     
    +

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    -

    which is the normal distribution with variance -\( \sigma^2_m=\sigma^2/m \), where \( \sigma \) is the variance of the PDF \( p(x) \) -and \( \mu \) is also the mean of the PDF \( p(x) \). -

    +
      +

1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +

    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +

    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +

    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.

    -

    Wrapping it up

    +

    ADAM optimizer

    -

    Thus, the central limit theorem states that the PDF \( \tilde{p}(z) \) of -the average of \( m \) random values corresponding to a PDF \( p(x) \) -is a normal distribution whose mean is the -mean value of the PDF \( p(x) \) and whose variance is the variance -of the PDF \( p(x) \) divided by \( m \), the number of values used to compute \( z \). +

In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +
    -

    The central limit theorem leads to the well-known expression for the -standard deviation, given by -

    +
    +

    Why Combine Momentum and RMSProp?

    + +
      +

    1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +

    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +

    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +

    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +
    +
    +

    Adam: Exponential Moving Averages (Moments)

    +

Adam maintains two moving averages at each time step \( t \) for each parameter \( \theta \):

    +
    +First moment (mean) \( m_t \) +

    +

    The Momentum term

     
    $$ - \sigma_m= -\frac{\sigma}{\sqrt{m}}. +m_t = \beta_1m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t), $$

     
    +

    -

    The latter is true only if the average value is known exactly. This is obtained in the limit -\( m\rightarrow \infty \) only. Because the mean and the variance are measured quantities we obtain -the familiar expression in statistics (the so-called Bessel correction) -

    +
    +Second moment (uncentered variance) \( v_t \) +

    +

    The RMS term

     
    $$ - \sigma_m\approx -\frac{\sigma}{\sqrt{m-1}}. +v_t = \beta_2v_{t-1} + (1-\beta_2)(\nabla C(\theta_t))^2, $$

     
    -

    In many cases however the above estimate for the standard deviation, -in particular if correlations are strong, may be too simplistic. Keep -in mind that we have assumed that the variables \( x \) are independent -and identically distributed. This is obviously not always the -case. For example, the random numbers (or better pseudorandom numbers) -we generate in various calculations do always exhibit some -correlations. -

    +

    with typical \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \). Initialize \( m_0 = 0 \), \( v_0 = 0 \).

    +
    -

    The theorem is satisfied by a large class of PDFs. Note however that for a -finite \( m \), it is not always possible to find a closed form /analytic expression for -\( \tilde{p}(x) \). -

    +

    These are biased estimators of the true first and second moment of the gradients, especially at the start (since \( m_0,v_0 \) are zero)

    -

    Confidence Intervals

    +

    Adam: Bias Correction

    +

    To counteract initialization bias in \( m_t, v_t \), Adam computes bias-corrected estimates

    +

     
    +$$ +\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}. +$$ +

     
    -

    Confidence intervals are used in statistics and represent a type of estimate -computed from the observed data. This gives a range of values for an -unknown parameter such as the parameters \( \boldsymbol{\beta} \) from linear regression. -

    - -

    With the OLS expressions for the parameters \( \boldsymbol{\beta} \) we found -\( \mathbb{E}(\boldsymbol{\beta}) = \boldsymbol{\beta} \), which means that the estimator of the regression parameters is unbiased. -

    - -

    In the exercises this week we show that the variance of the estimate of the \( j \)-th regression coefficient is -\( \boldsymbol{\sigma}^2 (\boldsymbol{\beta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). -

    - -

    This quantity can be used to -construct a confidence interval for the estimates. -

    -
    +
      +

• When \( t \) is small, \( 1-\beta_i^t \approx 0 \), so \( \hat{m}_t, \hat{v}_t \) are significantly larger than the raw \( m_t, v_t \), compensating for the initial zero bias.
    • +

    • As \( t \) increases, \( 1-\beta_i^t \to 1 \), and \( \hat{m}_t, \hat{v}_t \) converge to \( m_t, v_t \).
    • +

    • Bias correction is important for Adam’s stability in early iterations
    • +
    +
    -

    Standard Approach based on the Normal Distribution

    - -

    We will assume that the parameters \( \beta \) follow a normal -distribution. We can then define the confidence interval. Here we will be using as -shorthands \( \mu_{\beta} \) for the above mean value and \( \sigma_{\beta} \) -for the standard deviation. We have then a confidence interval -

    - +

    Adam: Update Rule Derivation

    +

    Finally, Adam updates parameters using the bias-corrected moments:

     
    $$ -\left(\mu_{\beta}\pm \frac{z\sigma_{\beta}}{\sqrt{n}}\right), +\theta_{t+1} =\theta_t -\frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\hat{m}_t, $$

     
    -

    where \( z \) defines the level of certainty (or confidence). For a normal -distribution typical parameters are \( z=2.576 \) which corresponds to a -confidence of \( 99\% \) while \( z=1.96 \) corresponds to a confidence of -\( 95\% \). A confidence level of \( 95\% \) is commonly used and it is -normally referred to as a two-sigmas confidence level, that is we -approximate \( z\approx 2 \). -

    - -

    For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by Davison on the Bootstrap Methods and their Applications

    - -

    In this text you will also find an in-depth discussion of the -Bootstrap method, why it works and various theorems related to it. +

    where \( \epsilon \) is a small constant (e.g. \( 10^{-8} \)) to prevent division by zero. +Breaking it down:

    +
      +

    1. Compute gradient \( \nabla C(\theta_t) \).
    2. +

    3. Update first moment \( m_t \) and second moment \( v_t \) (exponential moving averages).
    4. +

    5. Bias-correct: \( \hat{m}_t = m_t/(1-\beta_1^t) \), \( \; \hat{v}_t = v_t/(1-\beta_2^t) \).
    6. +

    7. Compute step: \( \Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \).
    8. +

    9. Update parameters: \( \theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t \).
    10. +
    +

    +

    This is the Adam update rule as given in the original paper.
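A minimal sketch of these five steps on the same toy OLS data as the earlier examples (the hyperparameters are the commonly used defaults; this is our own illustration, not the reference implementation from the paper).

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
m = np.zeros_like(theta)                        # first moment
v = np.zeros_like(theta)                        # second moment

for t in range(1, 2001):
    g = (2.0/n)*X.T @ (X @ theta - y)           # 1. gradient
    m = beta1*m + (1 - beta1)*g                 # 2. update first moment
    v = beta2*v + (1 - beta2)*g*g               #    and second moment
    mhat = m/(1 - beta1**t)                     # 3. bias correction
    vhat = v/(1 - beta2**t)
    theta -= alpha*mhat/(np.sqrt(vhat) + eps)   # 4.-5. compute step and update
print("theta from Adam")
print(theta)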

    -

    Resampling methods: Bootstrap background

    +

    Adam vs. AdaGrad and RMSProp

    -

    Since \( \widehat{\beta} = \widehat{\beta}(\boldsymbol{X}) \) is a function of random variables, -\( \widehat{\beta} \) itself must be a random variable. Thus it has -a pdf, call this function \( p(\boldsymbol{t}) \). The aim of the bootstrap is to -estimate \( p(\boldsymbol{t}) \) by the relative frequency of -\( \widehat{\beta} \). You can think of this as using a histogram -in the place of \( p(\boldsymbol{t}) \). If the relative frequency closely -resembles \( p(\vec{t}) \), then using numerics, it is straight forward to -estimate all the interesting parameters of \( p(\boldsymbol{t}) \) using point -estimators. -

    +
      +

    1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)
    2. +

    3. RMSProp: Uses moving average of squared gradients (like Adam’s \( v_t \)) to maintain adaptive learning rates, but does not include momentum or bias-correction.
    4. +

    5. Adam: Effectively RMSProp + Momentum + Bias-correction
    6. +
        + +

      • Momentum (\( m_t \)) provides acceleration and smoother convergence.
      • + +

      • Adaptive \( v_t \) scaling moderates the step size per dimension.
      • + +

      • Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
      • +
      +

      +

    +

    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone

    -

    Resampling methods: More Bootstrap background

    +

    Adaptivity Across Dimensions

    -

    In the case that \( \widehat{\beta} \) has -more than one component, and the components are independent, we use the -same estimator on each component separately. If the probability -density function of \( X_i \), \( p(x) \), had been known, then it would have -been straightforward to do this by: -

      -

    1. Drawing lots of numbers from \( p(x) \), suppose we call one such set of numbers \( (X_1^*, X_2^*, \cdots, X_n^*) \).
    2. -

    3. Then using these numbers, we could compute a replica of \( \widehat{\beta} \) called \( \widehat{\beta}^* \).
    4. +

5. Adam adapts the step size per coordinate: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.
    6. +

    7. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.
    8. +

    9. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction.

    -

    By repeated use of the above two points, many -estimates of \( \widehat{\beta} \) can be obtained. The -idea is to use the relative frequency of \( \widehat{\beta}^* \) -(think of a histogram) as an estimate of \( p(\boldsymbol{t}) \). -

    +

    ADAM algorithm, taken from Goodfellow et al

    + +

    +
    +

    +
    +

    -

    Resampling methods: Bootstrap approach

    +

    Algorithms and codes for Adagrad, RMSprop and Adam

    -

    But -unless there is enough information available about the process that -generated \( X_1,X_2,\cdots,X_n \), \( p(x) \) is in general -unknown. Therefore, Efron in 1979 asked the -question: What if we replace \( p(x) \) by the relative frequency -of the observation \( X_i \)? -

    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    -

    If we draw observations in accordance with -the relative frequency of the observations, will we obtain the same -result in some asymptotic sense? The answer is yes. -

    +

    The codes which implement these algorithms are discussed below here.

    -

    Resampling methods: Bootstrap steps

    - -

    The independent bootstrap works like this:

    +

    Practical tips

    -
      -

    1. Draw with replacement \( n \) numbers for the observed variables \( \boldsymbol{x} = (x_1,x_2,\cdots,x_n) \).
    2. -

    3. Define a vector \( \boldsymbol{x}^* \) containing the values which were drawn from \( \boldsymbol{x} \).
    4. -

    5. Using the vector \( \boldsymbol{x}^* \) compute \( \widehat{\beta}^* \) by evaluating \( \widehat \beta \) under the observations \( \boldsymbol{x}^* \).
    6. -

    7. Repeat this process \( k \) times.
    8. -
    -

    -

    When you are done, you can draw a histogram of the relative frequency -of \( \widehat \beta^* \). This is your estimate of the probability -distribution \( p(t) \). Using this probability distribution you can -estimate any statistics thereof. In principle you never draw the -histogram of the relative frequency of \( \widehat{\beta}^* \). Instead -you use the estimators corresponding to the statistic of interest. For -example, if you are interested in estimating the variance of \( \widehat -\beta \), apply the etsimator \( \widehat \sigma^2 \) to the values -\( \widehat \beta^* \). -

    +
      +

    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs (a short numerical check of this follows after the list). Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • +

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This early stopping significantly improves performance in many settings.
    • +

• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • +
    -
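The standardization tip above can be checked directly. Here is a quick sketch of our own (made-up inputs of very different scales; the variable names are arbitrary) comparing the condition number of the OLS Hessian \( 2\boldsymbol{X}^T\boldsymbol{X}/n \) before and after standardization.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(5.0, 100.0, n)        # a feature on a very different scale
X = np.c_[x1, x2]

H = 2.0/n*X.T @ X                     # Hessian of the squared-error cost
Xs = (X - X.mean(axis=0))/X.std(axis=0)
Hs = 2.0/n*Xs.T @ Xs                  # Hessian after standardizing the inputs

print("condition number, raw inputs         :", np.linalg.cond(H))
print("condition number, standardized inputs:", np.linalg.cond(Hs))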

    Code example for the Bootstrap method

    +

    Sneaking in automatic differentiation using Autograd

    + +

In the examples here we take the liberty of sneaking in automatic differentiation (without having discussed the mathematics). In project 1 you will write the gradients as discussed above, that is, hard-coding the gradients. By introducing automatic differentiation via the library autograd, which has since been superseded by JAX, we gain more flexibility in setting up alternative cost functions.

    -

    The following code starts with a Gaussian distribution with mean value -\( \mu =100 \) and variance \( \sigma=15 \). We use this to generate the data -used in the bootstrap analysis. The bootstrap analysis returns a data -set after a given number of bootstrap operations (as many as we have -data points). This data set consists of estimated mean values for each -bootstrap operation. The histogram generated by the bootstrap method -shows that the distribution for these mean values is also a Gaussian, -centered around the mean value \( \mu=100 \) but with standard deviation -\( \sigma/\sqrt{n} \), where \( n \) is the number of bootstrap samples (in -this case the same as the number of original data points). The value -of the standard deviation is what we expect from the central limit -theorem. +

The first example shows results with ordinary least squares.

    @@ -1142,32 +1783,55 @@

    Code example for the Bootstrap me
    -
    import numpy as np
    -from time import time
    -from scipy.stats import norm
    +  
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -
    -# Returns mean of bootstrap samples 
    -# Bootstrap algorithm
    -def bootstrap(data, datapoints):
    -    t = np.zeros(datapoints)
    -    n = len(data)
    -    # non-parametric bootstrap         
    -    for i in range(datapoints):
    -        t[i] = np.mean(data[np.random.randint(0,n,n)])
    -    # analysis    
    -    print("Bootstrap Statistics :")
    -    print("original           bias      std. error")
    -    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    -    return t
    -
    -# We set the mean value to 100 and the standard deviation to 15
    -mu, sigma = 100, 15
    -datapoints = 10000
    -# We generate random numbers according to the normal distribution
    -x = mu + sigma*np.random.randn(datapoints)
    -# bootstrap returns the data sample                                    
    -t = bootstrap(x, datapoints)
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
     
    @@ -1182,12 +1846,10 @@

    Code example for the Bootstrap me

    - -

    We see that our new variance and from that the standard deviation, agrees with the central limit theorem.

    -

    Plotting the Histogram

    +

    Same code but now with momentum gradient descent

    @@ -1195,15 +1857,59 @@

    Plotting the Histogram

    -
    # the histogram of the bootstrapped data (normalized data if density = True)
    -n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    -# add a 'best fit' line  
    -y = norm.pdf(binsboot, np.mean(t), np.std(t))
    -lt = plt.plot(binsboot, y, 'b', linewidth=1)
    -plt.xlabel('x')
    -plt.ylabel('Probability')
    -plt.grid(True)
    -plt.show()
    +  
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
     
    @@ -1221,90 +1927,13 @@

    Plotting the Histogram

    -

    The bias-variance tradeoff

    +

    Including Stochastic Gradient Descent with Autograd

    -

    We will discuss the bias-variance tradeoff in the context of -continuous predictions such as regression. However, many of the -intuitions and ideas discussed here also carry over to classification -tasks. Consider a dataset \( \mathcal{D} \) consisting of the data -\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    -

    Let us assume that the true data is generated from a noisy model

    - -

     
    -$$ -\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} -$$ -

     
    - -

    where \( \epsilon \) is normally distributed with mean zero and standard deviation \( \sigma^2 \).

    - -

    In our derivation of the ordinary least squares method we defined then -an approximation to the function \( f \) in terms of the parameters -\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, -that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). -

    - -

    Thereafter we found the parameters \( \boldsymbol{\beta} \) by optimizing the means squared error via the so-called cost function

    -

     
    -$$ -C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. -$$ -

     
    - -

    We can rewrite this as

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. -$$ -

     
    - -

    The three terms represent the square of the bias of the learning -method, which can be thought of as the error caused by the simplifying -assumptions built into the method. The second term represents the -variance of the chosen model and finally the last terms is variance of -the error \( \boldsymbol{\epsilon} \). -

    - -

    To derive this equation, we need to recall that the variance of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastics variable, idem for \( \boldsymbol{\tilde{y}} \). -We use a more compact notation in terms of the expectation value -

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], -$$ -

     
    - -

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], -$$ -

     
    - -

    which, using the abovementioned expectation values can be rewritten as

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, -$$ -

     
    - -

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    -
    - -
    -

    A way to Read the Bias-Variance Tradeoff

    - -

    -
    -

    -
    -

    -
    - -
    -

    Example code for Bias-Variance tradeoff

    @@ -1312,60 +1941,79 @@

    Example code for Bias-Variance
    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 500
    -n_boostraps = 100
    -degree = 18  # A quite high value, just to show.
    -noise = 0.1
    -
    -# Make data set.
    -x = np.linspace(-1, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    -
    -# Hold out some test data that is never used in training.
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -# Combine x transformation and model into one operation.
-# Not necessary, but convenient.
    -model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -
    -# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    -# for each bootstrap iteration.
    -y_pred = np.empty((y_test.shape[0], n_boostraps))
    -for i in range(n_boostraps):
    -    x_, y_ = resample(x_train, y_train)
    -
    -    # Evaluate the new model on the same test data each time.
    -    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -# Note: Expectations and variances taken w.r.t. different training
    -# data sets, hence the axis=1. Subsequent means are taken across the test data
    -# set in order to obtain a total value, but before this we have error/bias/variance
    -# calculated per data point in the test set.
    -# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    -# maintains the column vector form. Dropping this yields very unexpected results.
    -error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -print('Error:', error)
    -print('Bias^2:', bias)
    -print('Var:', variance)
    -print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    -
    -plt.plot(x[::5, :], y[::5, :], label='f(x)')
    -plt.scatter(x_test, y_test, label='Data points')
    -plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    -plt.legend()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
     plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
+print("theta from own sgd")
    +print(theta)
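One possible answer to the question posed in the comment above (a sketch only, not the method prescribed in these notes): shuffle the row indices at the start of each epoch and sweep through disjoint minibatches, so that every data point is used exactly once per epoch. The snippet below is self-contained and uses the analytic OLS gradient instead of Autograd.

import numpy as np

rng = np.random.default_rng(2025)

# Same type of data set as in the example above
n = 100
x = 2*rng.random((n, 1))
y = 4 + 3*x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def learning_schedule(t, t0=5.0, t1=50.0):
    return t0/(t + t1)

n_epochs = 50
M = 5            # size of each minibatch
m = n // M       # number of minibatches per epoch
theta = rng.standard_normal((2, 1))

for epoch in range(n_epochs):
    # Shuffle the row indices so every epoch sees the data in a new order
    indices = rng.permutation(n)
    for i in range(m):
        batch = indices[i*M:(i + 1)*M]          # disjoint minibatch of size M
        xi, yi = X[batch], y[batch]
        # Analytic OLS gradient for the minibatch, (2/M) X^T (X theta - y)
        gradients = (2.0/M)*xi.T @ (xi @ theta - yi)
        eta = learning_schedule(epoch*m + i)
        theta -= eta*gradients

print("theta from SGD with shuffled minibatches")
print(theta)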
     
    @@ -1383,7 +2031,7 @@

    Example code for Bias-Variance

    -

    Understanding what happens

    +

    Same code but now with momentum gradient descent

    @@ -1391,52 +2039,73 @@

    Understanding what happens

    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 40
    -n_boostraps = 100
    -maxdegree = 14
    -
    -
    -# Make data set.
    -x = np.linspace(-3, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    -error = np.zeros(maxdegree)
    -bias = np.zeros(maxdegree)
    -variance = np.zeros(maxdegree)
    -polydegree = np.zeros(maxdegree)
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -for degree in range(maxdegree):
    -    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -    y_pred = np.empty((y_test.shape[0], n_boostraps))
    -    for i in range(n_boostraps):
    -        x_, y_ = resample(x_train, y_train)
    -        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -    polydegree[degree] = degree
    -    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -    print('Polynomial degree:', degree)
    -    print('Error:', error[degree])
    -    print('Bias^2:', bias[degree])
    -    print('Var:', variance[degree])
    -    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    -
    -plt.plot(polydegree, error, label='Error')
    -plt.plot(polydegree, bias, label='bias')
    -plt.plot(polydegree, variance, label='Variance')
    -plt.legend()
    -plt.show()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
+print("theta from own sgd with momentum")
    +print(theta)
     
    @@ -1454,60 +2123,67 @@

    Understanding what happens

    -

    Summing up

    +

    But none of these can compete with Newton's method

    -

The bias-variance tradeoff summarizes the fundamental tension in machine learning, particularly supervised learning, between the complexity of a model and the amount of training data needed to train it. Since data is often limited, in practice it is often useful to use a less-complex model with higher bias, that is, a model whose asymptotic performance is worse than another model because it is easier to train and less sensitive to sampling noise arising from having a finite-sized training dataset (smaller variance).

    +

    Note that we here have introduced automatic differentiation

    -

The above equations tell us that in order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias. Note that variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, we see that the expected test MSE can never lie below \( Var(\epsilon) \), the irreducible error.

    - -

What do we mean by the variance and bias of a statistical learning method? The variance refers to the amount by which our model would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different estimate. But ideally the estimate for our model should not vary too much between training sets. However, if a method has high variance, then small changes in the training data can result in large changes in the model. In general, more flexible statistical methods have higher variance.

    - -

    You may also find this recent article of interest.

    + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Another Example from Scikit-Learn's Repository

    - -

This example demonstrates the problems of underfitting and overfitting, and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation: we calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model is to generalize correctly from the training data.

    - +

    Similar (second order function now) problem but now with AdaGrad

    @@ -1515,55 +2191,54 @@

    Another Example from Sci
    -
    #print(__doc__)
    -
    +  
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.pipeline import Pipeline
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.linear_model import LinearRegression
    -from sklearn.model_selection import cross_val_score
    -
    -
    -def true_fun(X):
    -    return np.cos(1.5 * np.pi * X)
    -
    -np.random.seed(0)
    -
    -n_samples = 30
    -degrees = [1, 4, 15]
    -
    -X = np.sort(np.random.rand(n_samples))
    -y = true_fun(X) + np.random.randn(n_samples) * 0.1
    -
    -plt.figure(figsize=(14, 5))
    -for i in range(len(degrees)):
    -    ax = plt.subplot(1, len(degrees), i + 1)
    -    plt.setp(ax, xticks=(), yticks=())
    -
    -    polynomial_features = PolynomialFeatures(degree=degrees[i],
    -                                             include_bias=False)
    -    linear_regression = LinearRegression()
    -    pipeline = Pipeline([("polynomial_features", polynomial_features),
    -                         ("linear_regression", linear_regression)])
    -    pipeline.fit(X[:, np.newaxis], y)
    -
    -    # Evaluate the models using crossvalidation
    -    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    -                             scoring="neg_mean_squared_error", cv=10)
    -
    -    X_test = np.linspace(0, 1, 100)
    -    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    -    plt.plot(X_test, true_fun(X_test), label="True function")
    -    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    -    plt.xlabel("x")
    -    plt.ylabel("y")
    -    plt.xlim((0, 1))
    -    plt.ylim((-2, 2))
    -    plt.legend(loc="best")
    -    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    -        degrees[i], -scores.mean(), scores.std()))
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
     
    @@ -1578,52 +2253,12 @@

    Another Example from Sci

    -
    - -
    -

    Various steps in cross-validation

    - -

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    -
    - -
    -

Cross-validation in brief

For the various values of \( k \):

1. Shuffle the dataset randomly.
2. Split the dataset into \( k \) groups.
3. For each unique group:
  a. Decide which group to use as the test data set.
  b. Take the remaining groups as the training data set.
  c. Fit a model on the training set and evaluate it on the test set.
  d. Retain the evaluation score and discard the model.
4. Summarize the model using the sample of model evaluation scores.

A minimal code sketch of these steps is given right below.
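The following is a minimal sketch of the steps listed above, written with plain NumPy instead of Scikit-Learn's KFold; the data set and the polynomial degree are arbitrary choices made only for illustration.

import numpy as np

rng = np.random.default_rng(3155)

# Simple data set: a noisy quadratic, similar in spirit to the examples above
n = 100
x = rng.standard_normal(n)
y = 3*x**2 + rng.standard_normal(n)

# Design matrix for a polynomial of degree 2 (illustrative choice)
X = np.c_[np.ones(n), x, x**2]

k = 5
indices = rng.permutation(n)          # step 1: shuffle
folds = np.array_split(indices, k)    # step 2: split into k groups

scores = np.zeros(k)
for j in range(k):                    # step 3: loop over the unique groups
    test_inds = folds[j]
    train_inds = np.concatenate([folds[i] for i in range(k) if i != j])
    X_train, y_train = X[train_inds], y[train_inds]
    X_test, y_test = X[test_inds], y[test_inds]
    # Fit OLS on the training folds
    theta = np.linalg.pinv(X_train.T @ X_train) @ X_train.T @ y_train
    # Evaluate on the held-out fold and retain the score
    scores[j] = np.mean((y_test - X_test @ theta)**2)

# step 4: summarize
print("MSE per fold:", scores)
print("Mean CV MSE:", np.mean(scores))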
    +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    - -

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    @@ -1631,95 +2266,60 @@

    Code Exam
    -
    import numpy as np
    +  
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.model_selection import KFold
    -from sklearn.linear_model import Ridge
    -from sklearn.model_selection import cross_val_score
    -from sklearn.preprocessing import PolynomialFeatures
    -
    -# A seed just to ensure that the random numbers are the same for every run.
    -# Useful for eventual debugging.
    -np.random.seed(3155)
    -
    -# Generate the data.
    -nsamples = 100
    -x = np.random.randn(nsamples)
    -y = 3*x**2 + np.random.randn(nsamples)
    -
    -## Cross-validation on Ridge regression using KFold only
    -
    -# Decide degree on polynomial to fit
    -poly = PolynomialFeatures(degree = 6)
    -
    -# Decide which values of lambda to use
    -nlambdas = 500
    -lambdas = np.logspace(-3, 5, nlambdas)
    -
    -# Initialize a KFold instance
    -k = 5
    -kfold = KFold(n_splits = k)
    -
    -# Perform the cross-validation to estimate MSE
    -scores_KFold = np.zeros((nlambdas, k))
    -
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -    j = 0
    -    for train_inds, test_inds in kfold.split(x):
    -        xtrain = x[train_inds]
    -        ytrain = y[train_inds]
    -
    -        xtest = x[test_inds]
    -        ytest = y[test_inds]
    -
    -        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    -        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    -
    -        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    -        ypred = ridge.predict(Xtest)
    -
    -        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    -
    -        j += 1
    -    i += 1
    -
    -
    -estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    -
    -## Cross-validation using cross_val_score from sklearn along with KFold
    -
    -# kfold is an instance initialized above as:
    -# kfold = KFold(n_splits = k)
    -
    -estimated_mse_sklearn = np.zeros(nlambdas)
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -
    -    X = poly.fit_transform(x[:, np.newaxis])
    -    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    -
-    # cross_val_score returns an array containing the estimated negative mse for every fold.
-    # We have to take the mean of every array in order to get an estimate of the mse of the model.
    -    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    -
    -    i += 1
    -
    -## Plot and compare the slightly different ways to perform cross-validation
    -
    -plt.figure()
    -
    -plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    -plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    -
    -plt.xlabel('log10(lambda)')
    -plt.ylabel('mse')
    -
    -plt.legend()
    -
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +	# Accumulated gradient
    +	# Scaling with rho the new and the previous results
    +        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    +	# Taking the diagonal only and inverting
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +	# Hadamard product
    +        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
     
    @@ -1737,7 +2337,7 @@

    Code Exam

    -

    More examples on bootstrap and cross-validation and errors

    +

    And finally ADAM

    @@ -1746,84 +2346,65 @@

    More example
    -
    # Common imports
    -import os
    +  
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient Descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    -import pandas as pd
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.model_selection import train_test_split
    -from sklearn.utils import resample
    -from sklearn.metrics import mean_squared_error
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -testerror = np.zeros(Maxpolydegree)
    -trainingerror = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -
    -trials = 100
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -
    -# loop over trials in order to estimate the expectation value of the MSE
    -    testerror[polydegree] = 0.0
    -    trainingerror[polydegree] = 0.0
    -    for samples in range(trials):
    -        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    -        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    -        ypred = model.predict(x_train)
    -        ytilde = model.predict(x_test)
    -        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    -        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    -
    -    testerror[polydegree] /= trials
    -    trainingerror[polydegree] /= trials
    -    print("Degree of polynomial: %3d"% polynomial[polydegree])
    -    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    -    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    -
    -plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    -plt.plot(polynomial, np.log10(testerror), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    -plt.legend()
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980
    +theta1 = 0.9
    +theta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = theta1*first_moment + (1-theta1)*gradients
    +        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients
    +        first_term = first_moment/(1.0-theta1**iter)
    +        second_term = second_moment/(1.0-theta2**iter)
    +	# Scaling with rho the new and the previous results
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
     
    @@ -1838,14 +2419,47 @@

    More example

    +

    + +
    +

    Material for the lab sessions

    -

Note that we kept the intercept column in the fitting here. This means that we need to set fit_intercept to False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    +
    + +

    +

      +

1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)
2. Work on project 1
    +

    +

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +
    -

    The same example but now with cross-validation

    +

    Reminder on different scaling methods

    + +

Before fitting a regression model, it is good practice to normalize or standardize the features. This ensures all features are on a comparable scale, which is especially important when using regularization. In the exercises this week we will perform standardization, scaling each feature to have mean 0 and standard deviation 1.

    + +

Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \( \boldsymbol{X} \). Then we subtract the mean and divide by the standard deviation for each feature.

    + +

In the example here we will also center the target \( \boldsymbol{y} \) to mean \( 0 \). Centering \( \boldsymbol{y} \) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \( 0 \). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \( n=100 \) data points and set up \( \boldsymbol{x} \), \( \boldsymbol{y} \) and the design matrix \( \boldsymbol{X} \). One possible setup is sketched right below.
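The following is a possible setup; the target function and polynomial degree are arbitrary illustrative choices, since the exercise text does not fix them.

import numpy as np

rng = np.random.default_rng(2025)

n = 100
x = np.linspace(-3, 3, n)
# Illustrative target: a noisy second-order polynomial (the exercise does not prescribe this choice)
y = 2.0 + 3.0*x + 4.0*x**2 + rng.standard_normal(n)

# Design matrix without an intercept column, since we will center the data instead
p = 5   # polynomial degree, an arbitrary choice
X = np.column_stack([x**j for j in range(1, p + 1)])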

    -

In this example we keep the intercept column again, but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -1853,73 +2467,15 @@

    The same example but now
    -
    # Common imports
    -import os
    -import numpy as np
    -import pandas as pd
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.metrics import mean_squared_error
    -from sklearn.model_selection import KFold
    -from sklearn.model_selection import cross_val_score
    -
    -
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -estimated_mse_sklearn = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -k =5
    -kfold = KFold(n_splits = k)
    -
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -        OLS = LinearRegression(fit_intercept=False)
    -# loop over trials in order to estimate the expectation value of the MSE
    -    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    -#[:, np.newaxis]
    -    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    -
    -plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    -plt.legend()
    -plt.show()
    +  
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = ?
    +y_centered = ?
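One possible way to fill in the two blanks above, assuming the intent is simply to shift the target to zero mean (and that numpy is imported as np and y is your target array from the setup):

y_mean = np.mean(y)            # mean of the target over the data set
y_centered = y - y_mean        # target with zero mean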
     
    @@ -1934,211 +2490,606 @@

    The same example but now

    + +

    Do we need to center the values of \( y \)?

    + +

After this preprocessing, each column of \( \boldsymbol{X}_{\mathrm{norm}} \) has mean zero and standard deviation \( 1 \), and \( \boldsymbol{y}_{\mathrm{centered}} \) has mean \( 0 \). This can make the optimization landscape nicer and ensures that the regularization penalty \( \lambda \sum_j \theta_j^2 \) in Ridge regression treats each coefficient fairly (since the features are on the same scale).
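A quick sanity check of these claims, assuming the variables X_norm and y_centered from the snippets above have been defined:

print(X_norm.mean(axis=0))   # should be (numerically) zero for every feature
print(X_norm.std(axis=0))    # should be one for every feature
print(y_centered.mean())     # should be (numerically) zero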

    -

    Material for the lab sessions

    +

    Functionality in Scikit-Learn

    + +

Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features are exactly between \( 0 \) and \( 1 \). A short usage sketch is given below.
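A brief usage sketch (the data here is synthetic and only meant to illustrate the fit-on-training, transform-both pattern):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = rng.standard_normal(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()                        # or MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics on the test data

print(X_train_scaled.mean(axis=0))               # close to zero for StandardScaler
print(X_train_scaled.std(axis=0))                # close to one for StandardScaler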

    -

    Linking the regression analysis with a statistical interpretation

    +

    More preprocessing

    -

We will now couple the discussions of ordinary least squares, Ridge and Lasso regression with a statistical interpretation, that is, we move from a linear algebra analysis to a statistical analysis. In particular, we will focus on what the regularization terms can result in. We will, amongst other things, show that the regularization parameter can considerably reduce the variance of the parameters \( \beta \).

    + +

    +

The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point onto the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    + +

The RobustScaler works similarly to the StandardScaler in that it ensures statistical properties for each feature that guarantee they are on the same scale. However, the RobustScaler uses the median and quartiles instead of the mean and variance. This makes the RobustScaler ignore data points that are very different from the rest (like measurement errors). These odd data points are also called outliers, and they can lead to trouble for other scaling techniques. A brief illustration follows below.
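A small illustration of these two transformers on a synthetic data set with an artificial outlier:

import numpy as np
from sklearn.preprocessing import Normalizer, RobustScaler

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [100.0, 3.0]])    # the last row acts as an artificial outlier

# Normalizer: rescales each data point (row) to unit Euclidean length
print(Normalizer().fit_transform(X))

# RobustScaler: centers with the median and scales with the interquartile range,
# so the outlier barely affects how the other points are scaled
print(RobustScaler().fit_transform(X))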

    +
    +
    -

The advantage of doing linear regression is that we actually end up with analytical expressions for several statistical quantities. Standard least squares and Ridge regression allow us to derive quantities like the variance and other expectation values in a rather straightforward way.

    +
    +

    Frequently used scaling functions

    -

It is assumed that \( \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \) and that the \( \varepsilon_{i} \) are independent, i.e.,

$$
\begin{align*}
\mbox{Cov}(\varepsilon_{i_1}, \varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} & i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right.
\end{align*}
$$

Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function as discussed above. It is, however, easy to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    +

     
$$
x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)},
$$

     
    -

The randomness of \( \varepsilon_i \) implies that \( \mathbf{y}_i \) is also a random variable. In particular, \( \mathbf{y}_i \) is normally distributed, because \( \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \) and \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) is a non-random scalar. To specify the parameters of the distribution of \( \mathbf{y}_i \) we need to calculate its first two moments.

where \( \overline{x}_j \) and \( \sigma(x_j) \) are the mean and standard deviation, respectively, of the feature \( x_j \). This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation, or don't wish to calculate it, it is common to simply set it to one.

    -

Recall that \( \boldsymbol{X} \) is a matrix of dimensionality \( n\times p \). The notation \( \mathbf{X}_{i,\ast} \) above means that we are looking at row number \( i \) and perform a sum over all \( p \) values.

Keep in mind that when you transform your data set before training a model, the same transformation needs to be applied to any new data set before making a prediction. Translated into Python code, this could be implemented as

    -
    -
    -

    Assumptions made

    -

The assumption we have made here can be summarized as follows (and this is going to be useful when we discuss the bias-variance tradeoff): there exists a function \( f(\boldsymbol{x}) \) and a normally distributed error \( \boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2) \) which describe our data,

$$
\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon}.
$$

    +
    +
    +
    +
    +
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
+# Then we fit our model with the training data
    +trained_model = some_model.fit(X_train,y_train)
    +
    +
    +#Model prediction, we need also to transform our data set used for the prediction.
    +X_test = X_test - X_train_mean #Use mean from training data
+y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Let us try to understand what this may imply mathematically when we subtract the mean values, also known as zero centering. For simplicity, we will focus on ordinary regression, as done in the above example.

    + +

    The cost/loss function for regression is

     
$$
C(\theta_0, \theta_1, \dots , \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2.
$$

     
    -

We approximate this function with our model from the solution of the linear regression equations, that is, our function \( f \) is approximated by \( \boldsymbol{\tilde{y}} \), where we want to minimize \( (\boldsymbol{y}-\boldsymbol{\tilde{y}})^2 \), our MSE, with

$$
\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}.
$$

Recall also that we use the squared value. This expression can lead to an increased penalty for higher differences between predicted and output/target values.

    + +

What we have done is to single out the \( \theta_0 \) term in the definition of the mean squared error (MSE). The design matrix \( X \) does in this case not contain any intercept column. When we take the derivative with respect to \( \theta_0 \), we want the derivative to obey

    +

     
$$
\frac{\partial C}{\partial \theta_j} = 0,
$$

     
    -

    -
    -

    Expectation value and variance

    +

    for all \( j \). For \( \theta_0 \) we have

    -

We can calculate the expectation value of \( \boldsymbol{y} \) for a given element \( i \),

$$
\begin{align*}
\mathbb{E}(y_i) & = \mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}) + \mathbb{E}(\varepsilon_i) = \mathbf{X}_{i, \ast} \, \boldsymbol{\beta},
\end{align*}
$$

     
$$
\frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right).
$$

     
    -

while its variance is

$$
\begin{align*}
\mbox{Var}(y_i) & = \mathbb{E} \{ [y_i - \mathbb{E}(y_i)]^2 \} = \mathbb{E} ( y_i^2 ) - [\mathbb{E}(y_i)]^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \varepsilon_i \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \mathbb{E}(\varepsilon_i) \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E}(\varepsilon_i^2 ) = \mbox{Var}(\varepsilon_i) = \sigma^2.
\end{align*}
$$

    +

    Multiplying away the constant \( 2/n \), we obtain

     
$$
\sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j.
$$

     
    -

Hence, \( y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2) \), that is, \( \boldsymbol{y} \) follows a normal distribution with mean value \( \boldsymbol{X}\boldsymbol{\beta} \) and variance \( \sigma^2 \) (not to be confused with the singular values of the SVD).

Let us specialize first to the case where we have only two parameters, \( \theta_0 \) and \( \theta_1 \). Our result for \( \theta_0 \) then simplifies to

    -
    +

     
$$
n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1.
$$

     
    -

    -

    Expectation value and variance for \( \boldsymbol{\beta} \)

    +

    We obtain then

    +

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}.
$$

     
    -

With the OLS expressions for the optimal parameters \( \boldsymbol{\hat{\beta}} \) we can evaluate the expectation value,

$$
\mathbb{E}(\boldsymbol{\hat{\beta}}) = \mathbb{E}[ (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\beta}.
$$

    +

    If we define

     
$$
\mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1},
$$

     
    -

    This means that the estimator of the regression parameters is unbiased.

    +

    and the mean value of the outputs as

    +

     
$$
\mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i,
$$

     
    -

    We can also calculate the variance

    +

    we have

    +

     
$$
\theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}.
$$

     
    -

The variance of the optimal value \( \boldsymbol{\hat{\beta}} \) is

$$
\begin{eqnarray*}
\mbox{Var}(\boldsymbol{\hat{\beta}}) & = & \mathbb{E} \{ [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})] [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})]^{T} \} \\
& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}]^{T} \} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{y} \, \mathbf{y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \mathbf{I}_{nn} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1},
\end{eqnarray*}
$$

    +

    In the general case with more parameters than \( \theta_0 \) and \( \theta_1 \), we have

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j.
$$

     
    -

where we have used that \( \mathbb{E} (\mathbf{y} \mathbf{y}^{T}) = \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \). From \( \mbox{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} \), one obtains an estimate of the variance of the estimate of the \( j \)-th regression coefficient: \( \sigma^2 (\hat{\beta}_j ) = \sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to construct a confidence interval for the estimates.

    +

    We can rewrite the latter equation as

    +

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j,
$$

     
    + +

    where we have defined

    +

     
$$
\mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij},
$$

     
    + +

    the mean value for all elements of the column vector \( \boldsymbol{x}_j \).
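As a quick numerical sanity check of this relation (a sketch on a synthetic data set, not part of the original notes): fit OLS with an explicit intercept column and compare the fitted \( \theta_0 \) with \( \mu_y - \sum_{j} \mu_{\boldsymbol{x}_j}\theta_j \) computed from the remaining coefficients.

import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.standard_normal((n, p))                  # features, no intercept column
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.1*rng.standard_normal(n)

# OLS with an explicit intercept column
Xb = np.c_[np.ones(n), X]
theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
theta0, theta_rest = theta[0], theta[1:]

# The fitted intercept should match mu_y minus the weighted feature means
print(theta0)
print(np.mean(y) - np.mean(X, axis=0) @ theta_rest)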

    + +

Replacing \( y_i \) with \( y_i - \overline{\boldsymbol{y}} \) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    +

     
$$
C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}).
$$

     
    -

In a similar way, we can obtain analytical expressions for, say, the expectation values of the parameters \( \boldsymbol{\beta} \) and their variance when we employ Ridge regression, allowing us again to define a confidence interval.

    If we minimize with respect to \( \boldsymbol{\theta} \) we have then

    + +

     
$$
\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},
$$

     
    + +

where \( \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} \) and \( \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj} \).

    -

It is rather straightforward to show that

$$
\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\beta}.
$$

    +

    For Ridge regression we need to add \( \lambda \boldsymbol{\theta}^T\boldsymbol{\theta} \) to the cost function and get then

     
$$
\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
$$

     
    -

We see clearly that \( \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}} \big] = \boldsymbol{\beta} \) for any \( \lambda > 0 \).

    What does this mean? And why do we insist on all this? Let us look at some examples.

    + +

This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test.

    -

We can also compute the variance as

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T},
$$

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +
    +from sklearn.linear_model import LinearRegression
    +
    +
    +np.random.seed(2021)
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
    +
    +
    +true_theta = [2, 0.5, 3.7]
    +
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
    +
    +degree = 3
    +X = np.zeros((len(x), degree))
    +
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
    +
    +theta = fit_theta(X, y)
    +
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
    +
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
    +
    +
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
    +
    +
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
    +
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
    +
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
    +
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
    +
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
    +
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
    +
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

The intercept is the value of our output/target variable when all our features are zero and our function crosses the \( y \)-axis (for a one-dimensional case).

    + +

Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to, for example, Ridge regression, the way we treat the intercept may give a larger or smaller MSE, meaning that the MSE can be penalized by the value of the intercept. Not including the intercept in the fit means that the regularization term does not include \( \theta_0 \). For different values of \( \lambda \), this may lead to different MSE values.

    + +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2,
$$

     
    -

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    - -

With this, we can compute the difference

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}]-\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}.
$$

    +

    but when we take out the intercept, this equation becomes

    +

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2.
$$

     
    +

    For Lasso regression we have

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert.
$$

     
    -

The difference is non-negative definite since each component of the matrix product is non-negative definite. This means that, for \( \lambda > 0 \), the variance we obtain with standard OLS will always be larger than the variance of \( \boldsymbol{\beta} \) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance tradeoff below.

It means that, when scaling the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which is not penalized by the intercept. The MSE value can then be smaller, since it focuses only on the remaining quantities. If we, however, bring back the intercept, we will get an MSE which then contains the intercept.

    -

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +

    Armed with this wisdom, we attempt first to simply set the intercept equal to False in our implementation of Ridge regression for our well-known vanilla data set.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
    +#We include explicitely the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

The results agree very well when we force Scikit-Learn's Ridge function to include the first column of our design matrix, that is when we explicitly include the intercept column and set fit_intercept=False. What happens if we do not include the intercept column in our fit? Let us see how we can change this code by zero centering the data.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

We see here, when compared to the code which explicitly includes the intercept column, that our MSE value is actually smaller. This is because the regularization term does not include the intercept value \( \theta_0 \) in the fitting. The same applies to Lasso regularization. It means that our optimization is now done only with the centered matrix and/or vector that enter the fitting procedure.
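The same centering trick carries over to Lasso. Here is a minimal sketch (an added illustration, not part of the original notes; the polynomial degree and the penalty value are arbitrary choices) which fits Scikit-Learn's Lasso without an intercept on centered data, recovers the intercept from the means afterwards, and compares with the standard fit on uncentered data.

import numpy as np
from sklearn.linear_model import Lasso

np.random.seed(3155)
n = 100
x = np.random.rand(n)
y = np.exp(-x**2) + 1.5*np.exp(-(x-2)**2)
# Polynomial features without an intercept column (degrees 1 to 5)
X = np.column_stack([x**degree for degree in range(1, 6)])

# Center features and targets using their means
X_mean = X.mean(axis=0)
y_mean = y.mean()
X_centered = X - X_mean
y_centered = y - y_mean

lmb = 1e-3
# Lasso on centered data, no intercept in the penalized fit
lasso_centered = Lasso(alpha=lmb, fit_intercept=False, max_iter=100000).fit(X_centered, y_centered)
intercept_own = y_mean - X_mean @ lasso_centered.coef_

# Standard Lasso which handles the intercept internally
lasso_standard = Lasso(alpha=lmb, max_iter=100000).fit(X, y)

print("Own intercept:", intercept_own, " Scikit-Learn intercept:", lasso_standard.intercept_)
print("Coefficients (centered fit):", lasso_centered.coef_)
print("Coefficients (standard fit):", lasso_standard.coef_)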

    diff --git a/doc/pub/week37/html/week37-solarized.html b/doc/pub/week37/html/week37-solarized.html index 9f9f47f47..093e60695 100644 --- a/doc/pub/week37/html/week37-solarized.html +++ b/doc/pub/week37/html/week37-solarized.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods @@ -67,159 +67,222 @@ 2, None, 'plans-for-week-37-lecture-monday'), - ('Plans for week 37, lab sessions', + ('Readings and Videos:', 2, None, 'readings-and-videos'), + ('Material for lecture Monday September 8', 2, None, - 'plans-for-week-37-lab-sessions'), - ('Material for lecture Monday September 9', + 'material-for-lecture-monday-september-8'), + ('Gradient descent and revisiting Ordinary Least Squares from ' + 'last week', 2, None, - 'material-for-lecture-monday-september-9'), - ('Deriving OLS from a probability distribution', + 'gradient-descent-and-revisiting-ordinary-least-squares-from-last-week'), + ('Gradient descent example', 2, None, 'gradient-descent-example'), + ('The derivative of the cost/loss function', 2, None, - 'deriving-ols-from-a-probability-distribution'), - ('Independent and Identically Distrubuted (iid)', + 'the-derivative-of-the-cost-loss-function'), + ('The Hessian matrix', 2, None, 'the-hessian-matrix'), + ('Simple program', 2, None, 'simple-program'), + ('Gradient Descent Example', 2, None, 'gradient-descent-example'), + ('Gradient descent and Ridge', 2, None, - 'independent-and-identically-distrubuted-iid'), - ('Maximum Likelihood Estimation (MLE)', + 'gradient-descent-and-ridge'), + ('The Hessian matrix for Ridge Regression', 2, None, - 'maximum-likelihood-estimation-mle'), - ('A new Cost Function', 2, None, 'a-new-cost-function'), - ("More basic Statistics and Bayes' theorem", + 'the-hessian-matrix-for-ridge-regression'), + ('Program example for gradient descent with Ridge Regression', 2, None, - 'more-basic-statistics-and-bayes-theorem'), - ('Marginal Probability', 2, None, 'marginal-probability'), - ('Conditional Probability', 2, None, 'conditional-probability'), - ("Bayes' Theorem", 2, None, 'bayes-theorem'), - ("Interpretations of Bayes' Theorem", + 'program-example-for-gradient-descent-with-ridge-regression'), + ('Using gradient descent methods, limitations', 2, None, - 'interpretations-of-bayes-theorem'), - ("Example of Usage of Bayes' theorem", + 'using-gradient-descent-methods-limitations'), + ('Momentum based GD', 2, None, 'momentum-based-gd'), + ('Improving gradient descent with momentum', 2, None, - 'example-of-usage-of-bayes-theorem'), - ('Doing it correctly', 2, None, 'doing-it-correctly'), - ("Bayes' Theorem and Ridge and Lasso Regression", + 'improving-gradient-descent-with-momentum'), + ('Same code but now with momentum gradient descent', 2, None, - 'bayes-theorem-and-ridge-and-lasso-regression'), - ('Ridge and Bayes', 2, None, 'ridge-and-bayes'), - ('Lasso and Bayes', 2, None, 'lasso-and-bayes'), - ('Why resampling methods', 2, None, 'why-resampling-methods'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling approaches can be computationally expensive', + 'same-code-but-now-with-momentum-gradient-descent'), + ('Overview video on Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-approaches-can-be-computationally-expensive'), - ('Why resampling methods ?', 2, None, 'why-resampling-methods'), - ('Statistical analysis', 2, None, 'statistical-analysis'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling methods: Bootstrap', + 
'overview-video-on-stochastic-gradient-descent-sgd'), + ('Batches and mini-batches', 2, None, 'batches-and-mini-batches'), + ('Pros and cons', 2, None, 'pros-and-cons'), + ('Convergence rates', 2, None, 'convergence-rates'), + ('Accuracy', 2, None, 'accuracy'), + ('Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-methods-bootstrap'), - ('The Central Limit Theorem', + 'stochastic-gradient-descent-sgd'), + ('Stochastic Gradient Descent', 2, None, - 'the-central-limit-theorem'), - ('Finding the Limit', 2, None, 'finding-the-limit'), - ('Rewriting the $\\delta$-function', + 'stochastic-gradient-descent'), + ('Computation of gradients', 2, None, 'computation-of-gradients'), + ('SGD example', 2, None, 'sgd-example'), + ('The gradient step', 2, None, 'the-gradient-step'), + ('Simple example code', 2, None, 'simple-example-code'), + ('When do we stop?', 2, None, 'when-do-we-stop'), + ('Slightly different approach', 2, None, - 'rewriting-the-delta-function'), - ('Identifying Terms', 2, None, 'identifying-terms'), - ('Wrapping it up', 2, None, 'wrapping-it-up'), - ('Confidence Intervals', 2, None, 'confidence-intervals'), - ('Standard Approach based on the Normal Distribution', + 'slightly-different-approach'), + ('Time decay rate', 2, None, 'time-decay-rate'), + ('Code with a Number of Minibatches which varies', 2, None, - 'standard-approach-based-on-the-normal-distribution'), - ('Resampling methods: Bootstrap background', + 'code-with-a-number-of-minibatches-which-varies'), + ('Replace or not', 2, None, 'replace-or-not'), + ('SGD vs Full-Batch GD: Convergence Speed and Memory Comparison', 2, None, - 'resampling-methods-bootstrap-background'), - ('Resampling methods: More Bootstrap background', + 'sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison'), + ('Theoretical Convergence Speed and convex optimization', + 3, + None, + 'theoretical-convergence-speed-and-convex-optimization'), + ('Strongly Convex Case', 3, None, 'strongly-convex-case'), + ('Non-Convex Problems', 3, None, 'non-convex-problems'), + ('Memory Usage and Scalability', + 2, + None, + 'memory-usage-and-scalability'), + ('Empirical Evidence: Convergence Time and Memory in Practice', + 2, + None, + 'empirical-evidence-convergence-time-and-memory-in-practice'), + ('Deep Neural Networks', 3, None, 'deep-neural-networks'), + ('Memory constraints', 3, None, 'memory-constraints'), + ('Second moment of the gradient', + 2, + None, + 'second-moment-of-the-gradient'), + ('Challenge: Choosing a Fixed Learning Rate', + 2, + None, + 'challenge-choosing-a-fixed-learning-rate'), + ('Motivation for Adaptive Step Sizes', + 2, + None, + 'motivation-for-adaptive-step-sizes'), + ('AdaGrad algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'adagrad-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Derivation of the AdaGrad Algorithm', + 2, + None, + 'derivation-of-the-adagrad-algorithm'), + ('AdaGrad Update Rule Derivation', + 2, + None, + 'adagrad-update-rule-derivation'), + ('AdaGrad Properties', 2, None, 'adagrad-properties'), + ('RMSProp: Adaptive Learning Rates', + 2, + None, + 'rmsprop-adaptive-learning-rates'), + ('RMSProp algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'rmsprop-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Adam Optimizer', 2, None, 
'adam-optimizer'), + ('"ADAM optimizer":"https://arxiv.org/abs/1412.6980"', + 2, + None, + 'adam-optimizer-https-arxiv-org-abs-1412-6980'), + ('Why Combine Momentum and RMSProp?', + 2, + None, + 'why-combine-momentum-and-rmsprop'), + ('Adam: Exponential Moving Averages (Moments)', 2, None, - 'resampling-methods-more-bootstrap-background'), - ('Resampling methods: Bootstrap approach', + 'adam-exponential-moving-averages-moments'), + ('Adam: Bias Correction', 2, None, 'adam-bias-correction'), + ('Adam: Update Rule Derivation', 2, None, - 'resampling-methods-bootstrap-approach'), - ('Resampling methods: Bootstrap steps', + 'adam-update-rule-derivation'), + ('Adam vs. AdaGrad and RMSProp', 2, None, - 'resampling-methods-bootstrap-steps'), - ('Code example for the Bootstrap method', + 'adam-vs-adagrad-and-rmsprop'), + ('Adaptivity Across Dimensions', 2, None, - 'code-example-for-the-bootstrap-method'), - ('Plotting the Histogram', 2, None, 'plotting-the-histogram'), - ('The bias-variance tradeoff', + 'adaptivity-across-dimensions'), + ('ADAM algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', 2, None, - 'the-bias-variance-tradeoff'), - ('A way to Read the Bias-Variance Tradeoff', + 'adam-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Algorithms and codes for Adagrad, RMSprop and Adam', 2, None, - 'a-way-to-read-the-bias-variance-tradeoff'), - ('Example code for Bias-Variance tradeoff', + 'algorithms-and-codes-for-adagrad-rmsprop-and-adam'), + ('Practical tips', 2, None, 'practical-tips'), + ('Sneaking in automatic differentiation using Autograd', 2, None, - 'example-code-for-bias-variance-tradeoff'), - ('Understanding what happens', + 'sneaking-in-automatic-differentiation-using-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'understanding-what-happens'), - ('Summing up', 2, None, 'summing-up'), - ("Another Example from Scikit-Learn's Repository", + 'same-code-but-now-with-momentum-gradient-descent'), + ('Including Stochastic Gradient Descent with Autograd', 2, None, - 'another-example-from-scikit-learn-s-repository'), - ('Various steps in cross-validation', + 'including-stochastic-gradient-descent-with-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'various-steps-in-cross-validation'), - ('Cross-validation in brief', + 'same-code-but-now-with-momentum-gradient-descent'), + ("But none of these can compete with Newton's method", 2, None, - 'cross-validation-in-brief'), - ('Code Example for Cross-validation and $k$-fold ' - 'Cross-validation', + 'but-none-of-these-can-compete-with-newton-s-method'), + ('Similar (second order function now) problem but now with ' + 'AdaGrad', 2, None, - 'code-example-for-cross-validation-and-k-fold-cross-validation'), - ('More examples on bootstrap and cross-validation and errors', + 'similar-second-order-function-now-problem-but-now-with-adagrad'), + ('RMSprop for adaptive learning rate with Stochastic Gradient ' + 'Descent', 2, None, - 'more-examples-on-bootstrap-and-cross-validation-and-errors'), - ('The same example but now with cross-validation', + 'rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent'), + ('And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"', 2, None, - 'the-same-example-but-now-with-cross-validation'), + 'and-finally-adam-https-arxiv-org-pdf-1412-6980-pdf'), ('Material for the lab sessions', 2, None, 
'material-for-the-lab-sessions'), - ('Linking the regression analysis with a statistical ' - 'interpretation', + ('Reminder on different scaling methods', 2, None, - 'linking-the-regression-analysis-with-a-statistical-interpretation'), - ('Assumptions made', 2, None, 'assumptions-made'), - ('Expectation value and variance', + 'reminder-on-different-scaling-methods'), + ('Functionality in Scikit-Learn', 2, None, - 'expectation-value-and-variance'), - ('Expectation value and variance for $\\boldsymbol{\\beta}$', + 'functionality-in-scikit-learn'), + ('More preprocessing', 2, None, 'more-preprocessing'), + ('Frequently used scaling functions', 2, None, - 'expectation-value-and-variance-for-boldsymbol-beta')]} + 'frequently-used-scaling-functions')]} end of tocinfo --> @@ -241,7 +304,7 @@
    -

    Week 37: Statistical interpretations and Resampling Methods

    +

    Week 37: Gradient descent methods

    @@ -254,7 +317,7 @@

    Week 37: Statistical interpretations and Resampling Methods


    -

    September 9, 2024

    +

    September 8-12, 2025


    @@ -264,808 +327,1619 @@

    September 9, 2024

    Plans for week 37, lecture Monday

    -Material for the lecture on Monday September 9 +Plans and material for the lecture on Monday September 8

    -

    -
  • Statistical interpretation of Ridge and Lasso regression, see also slides from last week
  • -
  • Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff (this may partly be discussed during the exercise sessions as well.
  • -
  • Readings and Videos:
  • - +

    The family of gradient descent methods

    +
      +
    1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge
    2. +
    3. Improving gradient descent with momentum
    4. +
    5. Introducing stochastic gradient descent
    6. +
    7. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM
    8. +
    9. Video of Lecture
    10. +
    11. Whiteboard notes
    12. +










    -

    Plans for week 37, lab sessions

    - +

    Readings and Videos:

    -Material for the lab sessions on Tuesday and Wednesday +

    -

      -
    • Calculations of expectation values
    • -
    • Discussion of resampling techniques
    • -
    • Exercise set for week 37
    • -
    • Work on project 1
    • -
    • Video of exercise sessions week 37
    • -
    • For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.
    • -
    +
      +
    1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and chapter 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
    2. +
3. Raschka et al, pages 37-44 and pages 278-283, with focus on linear regression.
    4. +
    5. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
    6. +
    7. Video on Stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    8. +
    - -









    -

    Material for lecture Monday September 9











    -

    Deriving OLS from a probability distribution

    +

    Material for lecture Monday September 8

    -

    Our basic assumption when we derived the OLS equations was to assume -that our output is determined by a given continuous function -\( f(\boldsymbol{x}) \) and a random noise \( \boldsymbol{\epsilon} \) given by the normal -distribution with zero mean value and an undetermined variance -\( \sigma^2 \). -

    + +

    Gradient descent and revisiting Ordinary Least Squares from last week

    -

    We found above that the outputs \( \boldsymbol{y} \) have a mean value given by -\( \boldsymbol{X}\hat{\boldsymbol{\beta}} \) and variance \( \sigma^2 \). Since the entries to -the design matrix are not stochastic variables, we can assume that the -probability distribution of our targets is also a normal distribution -but now with mean value \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \). This means that a -single output \( y_i \) is given by the Gaussian distribution +

    Last week we started with linear regression as a case study for the gradient descent +methods. Linear regression is a great test case for the gradient +descent methods discussed in the lectures since it has several +desirable properties such as:

    -$$ -y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\beta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. -$$ +
      +
    1. An analytical solution (recall homework sets for week 35).
    2. +
    3. The gradient can be computed analytically.
    4. +
    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. +
    +

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    -









    -

    Independent and Identically Distrubuted (iid)

    + +
    +
    +
    +
    +
    +
import numpy as np
+# the number of datapoints
+n = 100
+x = 2*np.random.rand(n,1)
+y = 4+3*x+np.random.randn(n,1)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We assume now that the various \( y_i \) values are stochastically distributed according to the above Gaussian distribution. -We define this distribution as +

where \( x_i \in [0,2] \) is chosen randomly using a uniform distribution (note the factor of two in the code above). Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). The linear regression model is given by

-$$ p(y_i, \boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}, $$
+$$ h_\theta(x) = \boldsymbol{y} = \theta_0 + \theta_1 x, $$

    such that

+$$ \boldsymbol{y}_i = \theta_0 + \theta_1 x_i. $$

    which reads as finding the likelihood of an event \( y_i \) with the input variables \( \boldsymbol{X} \) given the parameters (to be determined) \( \boldsymbol{\beta} \).

    -

    Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event \( \boldsymbol{y} \) as the product of the single events, that is we have

    + +

    Gradient descent example

    + +

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \theta = (\theta_0, \theta_1)^T \)

    +

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\theta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

-$$ p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}). $$
+$$ X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{100} \\ \end{bmatrix}. $$

    We will write this in a more compact form reserving \( \boldsymbol{D} \) for the domain of events, including the ouputs (targets) and the inputs. That is -in case we have a simple one-dimensional input and output case -

    +

    The cost/loss/risk function is given by

-$$ \boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. $$
+$$ C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right] $$

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \( \boldsymbol{X} \). -We can now rewrite the above probability as -

    +

    and we want to find \( \theta \) such that \( C(\theta) \) is minimized.

    + +









    +

    The derivative of the cost/loss function

    + +

    Computing \( \partial C(\theta) / \partial \theta_0 \) and \( \partial C(\theta) / \partial \theta_1 \) we can show that the gradient can be written as

-$$ p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$
+$$ \nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ \end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}), $$

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \( \boldsymbol{D} \) given a set of parameters \( \boldsymbol{\beta} \).

    +

    where \( X \) is the design matrix defined above.
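As a quick sanity check (an added sketch, not part of the original notes), we can compare the analytical gradient \( \frac{2}{n}X^T(X\theta - \mathbf{y}) \) with a central finite-difference approximation of the partial derivatives of \( C(\theta) \).

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

def cost(theta):
    return np.sum((X @ theta - y)**2)/n

def analytic_gradient(theta):
    return (2.0/n)*X.T @ (X @ theta - y)

theta = np.random.randn(2,1)
eps = 1e-6
numeric_gradient = np.zeros_like(theta)
for j in range(2):
    e = np.zeros_like(theta)
    e[j] = eps
    # central difference approximation of dC/dtheta_j
    numeric_gradient[j] = (cost(theta + e) - cost(theta - e))/(2*eps)

print(analytic_gradient(theta).ravel())
print(numeric_gradient.ravel())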











    -

    Maximum Likelihood Estimation (MLE)

    +

    The Hessian matrix

    +

    The Hessian matrix of \( C(\theta) \) is given by

+$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X. $$

    In statistics, maximum likelihood estimation (MLE) is a method of -estimating the parameters of an assumed probability distribution, -given some observed data. This is achieved by maximizing a likelihood -function so that, under the assumed statistical model, the observed -data is the most probable. -

    +

    This result implies that \( C(\theta) \) is a convex function since the matrix \( X^T X \) always is positive semi-definite.
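A small numerical illustration of this (an added sketch; the bound \( \eta < 2/\lambda_{\mathrm{max}} \) for plain gradient descent on a quadratic cost is quoted as a standard result):

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

H = (2.0/n)*X.T @ X
eigenvalues = np.linalg.eigvalsh(H)   # symmetric matrix, real eigenvalues
print("Eigenvalues of the Hessian:", eigenvalues)  # all non-negative for a positive semi-definite matrix
print("Plain gradient descent converges for learning rates eta <", 2.0/np.max(eigenvalues))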

    -

    We will assume here that our events are given by the above Gaussian -distribution and we will determine the optimal parameters \( \beta \) by -maximizing the above PDF. However, computing the derivatives of a -product function is cumbersome and can easily lead to overflow and/or -underflowproblems, with potentials for loss of numerical precision. -

    +









    +

    Simple program

    -

    In practice, it is more convenient to maximize the logarithm of the -PDF because it is a monotonically increasing function of the argument. -Alternatively, and this will be our option, we will minimize the -negative of the logarithm since this is a monotonically decreasing -function. +

    We can now write a program that minimizes \( C(\theta) \) using the gradient descent method with a constant learning rate \( \eta \) according to

+$$ \theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \quad k=0,1,\cdots $$

We can use the expression we computed for the gradient, let the initial guess \( \theta_0 \) be chosen randomly, and set the learning rate to \( \eta = 0.001 \). We stop iterating when \( ||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion.

    -

    Note also that maximization/minimization of the logarithm of the PDF -is equivalent to the maximization/minimization of the function itself. +

And finally we can compare our solution for \( \theta \) with the analytic result given by \( \theta= (X^TX)^{-1} X^T \mathbf{y} \).
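Since the code below omits the stopping criterion, here is a minimal sketch (an added illustration, not the course code) of the same iteration with the test \( ||\nabla_\theta C(\theta_k)|| \leq \epsilon \) included and with a comparison against the analytical solution.

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

theta = np.random.randn(2,1)
eta = 0.001
eps = 1e-8
max_iterations = 1000000

for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

theta_analytic = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta.ravel())
print(theta_analytic.ravel())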











    -

    A new Cost Function

    +

    Gradient Descent Example

    -

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +

Here is our simple example.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -$$ -C(\boldsymbol{\beta}=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, -$$ -

    which becomes

    + +

    Gradient descent and Ridge

    + +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \theta \),

-$$ C(\boldsymbol{\beta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}. $$
+$$ C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \quad \lambda \geq 0. $$

    Taking the derivative of the new cost function with respect to the parameters \( \beta \) we recognize our familiar OLS equation, namely

    - +

    In order to minimize \( C_{\text{ridge}}(\theta) \) using GD we adjust the gradient as follows

-$$ \boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, $$
+$$ \nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ \end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2 \left(\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta\right). $$

    which leads to the well-known OLS equation for the optimal paramters \( \beta \)

    +

    We can easily extend our program to minimize \( C_{\text{ridge}}(\theta) \) using gradient descent and compare with the analytical solution given by

-$$ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
+$$ \theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

    Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.











    -

    More basic Statistics and Bayes' theorem

    - -

    A central theorem in statistics is Bayes' theorem. This theorem plays a similar role as the good old Pythagoras' theorem in geometry. -Bayes' theorem is extremely simple to derive. But to do so we need some basic axioms from statistics. -

    - -

    Assume we have two domains of events \( X=[x_0,x_1,\dots,x_{n-1}] \) and \( Y=[y_0,y_1,\dots,y_{n-1}] \).

    - -

    We define also the likelihood for \( X \) and \( Y \) as \( p(X) \) and \( p(Y) \) respectively. -The likelihood of a specific event \( x_i \) (or \( y_i \)) is then written as \( p(X=x_i) \) or just \( p(x_i)=p_i \). -

    - -
    -Union of events is given by -

    +

    The Hessian matrix for Ridge Regression

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

-$$ p(X \cup Y)= p(X)+p(Y)-p(X \cap Y). $$
+$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$
    +

This implies that the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

    -
    -The product rule (aka joint probability) is given by -

    -$$ -p(X \cup Y)= p(X,Y)= p(X\vert Y)p(Y)=p(Y\vert X)p(X), -$$ +









    +

    Program example for gradient descent with Ridge Regression

    -

    where we read \( p(X\vert Y) \) as the likelihood of obtaining \( X \) given \( Y \).

    + +
    +
    +
    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    If we have independent events then \( p(X,Y)=p(X)p(Y) \).

    -









    -

    Marginal Probability

    +

    Using gradient descent methods, limitations

    -

    The marginal probability is defined in terms of only one of the set of variables \( X,Y \). For a discrete probability we have

    -
    - -

    -$$ -p(X)=\sum_{i=0}^{n-1}p(X,Y=y_i)=\sum_{i=0}^{n-1}p(X\vert Y=y_i)p(Y=y_i)=\sum_{i=0}^{n-1}p(X\vert y_i)p(y_i). -$$ -

    +
      +
    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • +
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minimum. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as for more complicated variants of GD.
    • +
    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • +
• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • +
    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • +
    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • +
    +









    +

    Momentum based GD

    +

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025.
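A common way of writing the momentum update (one possible convention, added here for reference; the whiteboard notes may use slightly different symbols) is

$$ v_{t} = \gamma v_{t-1} + \eta \nabla_\theta C(\theta_{t}), \qquad \theta_{t+1} = \theta_{t} - v_{t}, $$

with momentum parameter \( 0 \leq \gamma < 1 \); setting \( \gamma = 0 \) recovers plain gradient descent. This corresponds directly to the update new_change = step_size*gradient + momentum*change used in the momentum code further below.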











    -

    Conditional Probability

    +

    Improving gradient descent with momentum

    -

    The conditional probability, if \( p(Y) > 0 \), is

    -
    - -

    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}=\frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}. -$$ + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Bayes' Theorem

    +

    Same code but now with momentum gradient descent

    -

    If we combine the conditional probability with the marginal probability and the standard product rule, we have

    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}, -$$ -

    which we can rewrite as

    + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + -$$ -p(X\vert Y)= \frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}=\frac{p(Y\vert X)p(X)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}, -$$ +









    +

    Overview video on Stochastic Gradient Descent (SGD)

    -

    which is Bayes' theorem. It allows us to evaluate the uncertainty in in \( X \) after we have observed \( Y \). We can easily interchange \( X \) with \( Y \).

    +What is Stochastic Gradient Descent +

    There are several reasons for using stochastic gradient descent. Some of these are:

    +
      +
    1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.
    2. +
3. A better chance of escaping local minima, due to the noise introduced by the random sampling of the updates.
    4. +
    5. Memory Usage: Requires less memory compared to computing gradients for the entire dataset.
    6. +










    -

    Interpretations of Bayes' Theorem

    +

    Batches and mini-batches

    -

    The quantity \( p(Y\vert X) \) on the right-hand side of the theorem is -evaluated for the observed data \( Y \) and can be viewed as a function of -the parameter space represented by \( X \). This function is not -necesseraly normalized and is normally called the likelihood function. +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    + +

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    -

    The function \( p(X) \) on the right hand side is called the prior while the function on the left hand side is the called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.

    +









    +

    Pros and cons

    -

    Let us try to illustrate Bayes' theorem through an example.

    +
      +
    1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.
    2. +
    3. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.
    4. +
    5. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    6. +
    +









    +

    Convergence rates

    +
      +
    1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.
    2. +
3. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration.
    4. +










    -

    Example of Usage of Bayes' theorem

    +

    Accuracy

    -

    Let us suppose that you are undergoing a series of mammography scans in -order to rule out possible breast cancer cases. We define the -sensitivity for a positive event by the variable \( X \). It takes binary -values with \( X=1 \) representing a positive event and \( X=0 \) being a -negative event. We reserve \( Y \) as a classification parameter for -either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this being a positive thing). +

In general, Stochastic Gradient Descent is less accurate than gradient descent, as it calculates the gradient on single examples, which may not accurately represent the overall dataset. Gradient Descent is more accurate because it uses the average gradient calculated over the entire dataset.

    -

    We let \( Y=1 \) represent the the case of having breast cancer and \( Y=0 \) as not.

    - -

    Let us assume that if you have breast cancer, the test will be positive with a probability of \( 0.8 \), that is we have

    - -$$ -p(X=1\vert Y=1) =0.8. -$$ +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence. +

    -

    This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of \( 80\% \) for having cancer. -It is however not correct, as the following Bayesian analysis shows. +

It is however the method of choice in deep learning algorithms, where SGD is often used in combination with other optimization techniques, such as momentum or adaptive learning rates.











    -

    Doing it correctly

    +

    Stochastic Gradient Descent (SGD)

    -

    If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number. -Let us assume that the prior probability in the population as a whole is +

In stochastic gradient descent, the extreme case is when each mini-batch contains only a single data point.

    -$$ -p(Y=1) =0.004. -$$ - -

    We need also to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have

    -$$ -p(X=1\vert Y=0) =0.1. -$$ - -

    Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute

    - -$$ -p(Y=1\vert X=1)=\frac{p(X=1\vert Y=1)p(Y=1)}{p(X=1\vert Y=1)p(Y=1)+p(X=1\vert Y=0)p(Y=0)}=\frac{0.8\times 0.004}{0.8\times 0.004+0.1\times 0.996}=0.031. -$$ +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    -

    That is, in case of a positive test, there is only a \( 3\% \) chance of having breast cancer!

    +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.











    -

    Bayes' Theorem and Ridge and Lasso Regression

    +

    Stochastic Gradient Descent

    -

    Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

    - -

    For ordinary least squares we postulated that the maximum likelihood for the doamin of events \( \boldsymbol{D} \) (one-dimensional case)

    -$$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})], -$$ +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above. +

    -

    is given by

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

-$$ p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$
+$$ C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set \( \boldsymbol{\beta} \) given a domain of events \( \boldsymbol{D} \)? That is, how can we define the posterior probability

    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D}). -$$ +









    +

    Computation of gradients

    -

    Bayes' theorem comes to our rescue here since (omitting the normalization constant)

    +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

-$$ p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}). $$
+$$ \nabla_\theta C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    We have a model for \( p(\boldsymbol{D}\vert\boldsymbol{\beta}) \) but need one for the prior \( p(\boldsymbol{\beta}) \)!

    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +
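A minimal sketch (added here, using numpy only) of one way to construct such minibatches \( B_k \), by shuffling the indices of the data points and splitting them into \( n/M \) groups:

import numpy as np

n = 10   # number of data points
M = 2    # size of each minibatch
indices = np.random.permutation(n)          # shuffle the data point indices
minibatches = np.split(indices, n//M)       # n/M minibatches of size M
for k, batch in enumerate(minibatches, start=1):
    print(f"B_{k}: data points with indices {batch}")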











    -

    Ridge and Bayes

    - -

    With the posterior probability defined by a likelihood which we have -already modeled and an unknown prior, we are now ready to make -additional models for the prior. +

    SGD example

    +

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \). We then have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    -

    We can, based on our discussions of the variance of \( \boldsymbol{\beta} \) and the mean value, assume that the prior for the values \( \boldsymbol{\beta} \) is given by a Gaussian with mean value zero and variance \( \tau^2 \), that is

    - +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step:

-$$ p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. $$
+$$ \nabla_{\theta} C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \rightarrow \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -$$ -p(\boldsymbol{\beta\vert\boldsymbol{D})}=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

    We can now optimize this quantity with respect to \( \boldsymbol{\beta} \). As we -did for OLS, this is most conveniently done by taking the negative -logarithm of the posterior probability. Doing so and leaving out the -constants terms that do not depend on \( \beta \), we have -

    +









    +

    The gradient step

    +

    Thus a gradient descent step now looks like

-$$ C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{2\tau^2}\vert\vert\boldsymbol{\beta}\vert\vert_2^2, $$
+$$ \theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}), $$

    and replacing \( 1/2\tau^2 \) with \( \lambda \) we have

    - -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ - -

    which is our Ridge cost function! Nice, isn't it?

    +

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.











    -

    Lasso and Bayes

    - -

    To derive the Lasso cost function, we simply replace the Gaussian prior with an exponential distribution (Laplace in this case) with zero mean value, that is

    +

    Simple example code

    -$$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. -$$ -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. -$$ + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute a new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
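To make the skeleton above concrete, here is a minimal self-contained sketch (an added illustration with assumed values for the learning rate and the number of epochs, not the course solution) which fills in the gradient computation for the linear regression data used earlier.

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5            # size of each minibatch
m = int(n/M)     # number of minibatches
n_epochs = 100
eta = 0.05       # fixed learning rate (an arbitrary but safe choice here)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    for i in range(m):
        k = M*np.random.randint(m)          # pick the k-th minibatch at random
        Xk = X[k:k+M]
        yk = y[k:k+M]
        gradient = (2.0/M)*Xk.T @ (Xk @ theta - yk)
        theta -= eta*gradient

print("SGD estimate:       ", theta.ravel())
print("Analytical solution:", (np.linalg.pinv(X.T @ X) @ X.T @ y).ravel())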
    -

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M \ll n \)), the computation of the gradient is much cheaper, since we sum over the datapoints in the \( k \)-th minibatch only and not over all \( n \) datapoints.










    +

    When do we stop?

    -


    +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and stop if its norm is smaller than some threshold. However, a vanishing gradient is also consistent with a local minimum, so this only tells us that we are close to a local/global minimum. Alternatively, we can evaluate the cost function at this point, store the result and continue the search; if the test kicks in at a later stage, we compare the stored values of the cost function and keep the \( \theta \) that gave the lowest value.
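To make this stopping strategy concrete, here is a minimal sketch. The helper functions full_gradient, cost and sgd_epoch are hypothetical placeholders (not defined in the course material) for the full-data gradient, the cost function and one epoch of minibatch updates.

import numpy as np

def sgd_with_stopping_check(theta, sgd_epoch, full_gradient, cost,
                            n_epochs=100, check_every=10, tol=1e-6):
    """Run SGD epochs; every check_every epochs, test the full gradient norm
    and remember the parameters with the lowest cost seen so far."""
    best_theta, best_cost = theta.copy(), cost(theta)
    for epoch in range(1, n_epochs + 1):
        theta = sgd_epoch(theta)                      # one epoch of minibatch updates
        if epoch % check_every == 0:
            if cost(theta) < best_cost:               # store the best candidate so far
                best_theta, best_cost = theta.copy(), cost(theta)
            if np.linalg.norm(full_gradient(theta)) < tol:
                break                                 # close to a stationary point
    return best_theta, best_cost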











    -


    +

    Slightly different approach

    -

Another approach is to let the step length \( \eta_j \) depend on the number of epochs, in such a way that it becomes very small after a reasonable time, so that the updates eventually become negligible. Such approaches are often called learning rate schedules (or scaling). There are many ways to schedule the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    -
      -
    1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff
    2. -
    3. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more
    4. -
    -

    and discuss how to select a given model (one of the difficult parts in machine learning).











    -

    Resampling methods

    -
    - -

    -

    Resampling methods are an indispensable tool in modern -statistics. They involve repeatedly drawing samples from a training -set and refitting a model of interest on each sample in order to -obtain additional information about the fitted model. For example, in -order to estimate the variability of a linear regression fit, we can -repeatedly draw different samples from the training data, fit a linear -regression to each new sample, and then examine the extent to which -the resulting fits differ. Such an approach may allow us to obtain -information that would not be available from fitting the model only -once using the original training sample. -

    - -

    Two resampling methods are often used in Machine Learning analyses,

    -
      -
    1. The bootstrap method
    2. -
    3. and Cross-Validation
    4. -
    -

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular -cross-validation and the bootstrap method. +

    Time decay rate

    + +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \eta_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    + +

    In this way we can fix the number of epochs, compute \( \theta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \theta \) that gives the lowest value of the cost +function.

    -
    -









    -

    Resampling approaches can be computationally expensive

    -
    - -

    + +

    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
     
    -

    Resampling approaches can be computationally expensive, because they -involve fitting the same statistical method multiple times using -different subsets of the training data. However, due to recent -advances in computing power, the computational requirements of -resampling methods generally are not prohibitive. In this chapter, we -discuss two of the most commonly used resampling methods, -cross-validation and the bootstrap. Both methods are important tools -in the practical application of many statistical learning -procedures. For example, cross-validation can be used to estimate the -test error associated with a given statistical learning method in -order to evaluate its performance, or to select the appropriate level -of flexibility. The process of evaluating a model’s performance is -known as model assessment, whereas the process of selecting the proper -level of flexibility for a model is known as model selection. The -bootstrap is widely used. -

    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j)) +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Why resampling methods ?

    -
    -Statistical analysis -

    +

    Code with a Number of Minibatches which varies

    -
      -
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • -
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • -
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • -
    +

    In the code here we vary the number of mini-batches.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    - +









    -

    Statistical analysis

    -
    - -

    +

    Replace or not

    -
      -
    • As in other experiments, many numerical experiments have two classes of errors:
    • -
        -
      • Statistical errors
      • -
      • Systematical errors
      • -
      -
    • Statistical errors can be estimated using standard tools from statistics
    • -
    • Systematical errors are method specific and must be treated differently from case to case.
    • -
    -
    - +

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.
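As a contrast to the code above, a minimal sketch of the common alternative (using the same toy regression setup and learning schedule as before) is to reshuffle the data at the start of each epoch and sweep through the minibatches without replacement, so that every data point is used exactly once per epoch.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5            # size of each minibatch
m = int(n/M)     # number of minibatches
n_epochs = 50
t0, t1 = 5, 50

def learning_schedule(t):
    return t0/(t+t1)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    indices = np.random.permutation(n)      # reshuffle once per epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]        # each point appears exactly once per epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        eta = learning_schedule(epoch*m+i)
        theta = theta - eta*gradients
print("theta from own sgd without replacement")
print(theta)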











    -


    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison

    +

    Theoretical Convergence Speed and convex optimization

    -


    Consider minimizing an empirical cost function

$$
C(\theta) = \frac{1}{N}\sum_{i=1}^N l_i(\theta),
$$

    where each \( l_i(\theta) \) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \( \nabla C(\theta) \), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \( \nabla +l_i(\theta) \) selected at random. In equation form, one GD step is:

    -

$$
\theta_{t+1} = \theta_t - \eta \nabla C(\theta_t) = \theta_t - \eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t),
$$

    whereas one SGD step is:

$$
\theta_{t+1} = \theta_t - \eta \nabla l_{i_t}(\theta_t),
$$

with \( i_t \) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \( O(1/t) \) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation, typically \( O(1/\sqrt{t}) \) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD's trajectory is smoother and more predictable, while SGD's path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +

    Strongly Convex Case

    -

If \( C(\theta) \) is strongly convex and \( L \)-smooth (so GD enjoys linear convergence), the gap \( C(\theta_t)-C(\theta^*) \) for GD shrinks as

$$
C(\theta_t) - C(\theta^*) \le \Big(1 - \frac{\mu}{L}\Big)^t \left[C(\theta_0)-C(\theta^*)\right],
$$

    a geometric (linear) convergence per iteration . Achieving an +\( \epsilon \)-accurate solution thus takes on the order of +\( \log(1/\epsilon) \) iterations for GD. However, each GD iteration costs +\( O(N) \) gradient evaluations. SGD cannot exploit strong convexity to +obtain a linear rate – instead, with a properly decaying step size +(e.g. \( \eta_t = \frac{1}{\mu t} \)) or iterate averaging, SGD attains an +\( O(1/t) \) convergence rate in expectation . For example, one result +of Moulines and Bach 2011, see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html shows that with \( \eta_t = \Theta(1/t) \),

$$
\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t),
$$









    -

    Resampling methods: Bootstrap

    -
    - -

    -

    Bootstrapping is a non-parametric approach to statistical inference -that substitutes computation for more traditional distributional -assumptions and asymptotic results. Bootstrapping offers a number of -advantages: +

for strongly convex, smooth \( C \). This \( 1/t \) rate is slower per iteration than GD's exponential decay, but each SGD iteration is \( N \) times cheaper. In fact, to reach error \( \epsilon \), plain SGD needs on the order of \( T=O(1/\epsilon) \) iterations (sub-linear convergence), while GD needs \( O(\log(1/\epsilon)) \) iterations. When accounting for cost-per-iteration, GD requires \( O(N \log(1/\epsilon)) \) total gradient computations versus SGD's \( O(1/\epsilon) \) single-sample computations. In large-scale regimes (huge \( N \)), SGD can be faster in wall-clock time because \( N \log(1/\epsilon) \) may far exceed \( 1/\epsilon \) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \( N \) cheap updates in the time GD makes one, often yielding a good solution faster in practice, even though SGD's asymptotic error decays more slowly. As one lecture succinctly puts it: “SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can't adapt to strong convexity”. Thus, the break-even point depends on \( N \) and the desired accuracy: for moderate accuracy on very large \( N \), SGD's cheaper updates win; for extremely high precision (very small \( \epsilon \)) on a modest \( N \), GD's fast convergence per step can be advantageous.
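As a rough back-of-the-envelope illustration of this break-even argument (ignoring all constants, and with \( N \) and \( \epsilon \) chosen purely for illustration), we can compare the total number of single-sample gradient evaluations suggested by the two bounds:

import numpy as np

N = 10**6     # number of data points (illustrative choice)
eps = 1e-3    # target accuracy (illustrative choice)

# GD: O(log(1/eps)) iterations, each costing N single-sample gradients
gd_evals = N*np.log(1.0/eps)
# SGD: O(1/eps) iterations, each costing one single-sample gradient
sgd_evals = 1.0/eps

print(f"GD : about {gd_evals:.1e} single-sample gradient evaluations")
print(f"SGD: about {sgd_evals:.1e} single-sample gradient evaluations")

With these illustrative numbers the SGD estimate is several orders of magnitude cheaper, which is the regime described in the text; for very small \( \epsilon \) or small \( N \) the comparison can flip.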

    -
      -
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. -
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. -
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. -
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. -
    -
    +

    Non-Convex Problems

    +

    In non-convex optimization (e.g. deep neural networks), neither GD nor +SGD guarantees global minima, but SGD often displays faster progress +in finding useful minima. Theoretical results here are weaker, usually +showing convergence to a stationary point \( \theta \) (\( |\nabla C| \) is +small) in expectation. For example, GD might require \( O(1/\epsilon^2) \) +iterations to ensure \( |\nabla C(\theta)| < \epsilon \), and SGD typically has +similar polynomial complexity (often worse due to gradient +noise). However, a noteworthy difference is that SGD’s stochasticity +can help escape saddle points or poor local minima. Random gradient +fluctuations act like implicit noise, helping the iterate “jump” out +of flat saddle regions where full-batch GD could stagnate . In fact, +research has shown that adding noise to GD can guarantee escaping +saddle points in polynomial time, and the inherent noise in SGD often +serves this role. Empirically, this means SGD can sometimes find a +lower loss basin faster, whereas full-batch GD might get “stuck” near +saddle points or need a very small learning rate to navigate complex +error surfaces . Overall, in modern high-dimensional machine learning, +SGD (or mini-batch SGD) is the workhorse for large non-convex problems +because it converges to good solutions much faster in practice, +despite the lack of a linear convergence guarantee. Full-batch GD is +rarely used on large neural networks, as it would require tiny steps +to avoid divergence and is extremely slow per iteration . +

    -

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +









    +

    Memory Usage and Scalability

    + +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \( \nabla C(\theta) \) . This results +in memory usage that scales linearly with the dataset size \( N \). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \( N \) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible. +

    + +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \( N \)) forward/backward pass +could not even be executed if \( N \) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \( N \), while for SGD it remains \( O(1) \) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods. +

    -

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    +









    +

    Empirical Evidence: Convergence Time and Memory in Practice

    + +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments: +

    +

    Deep Neural Networks

    + +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive. +

    + +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration. +

    +

    Memory constraints

    + +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable . +

    + +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \( N \) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent . +











    -

    The Central Limit Theorem

    +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    -

    Suppose we have a PDF \( p(x) \) from which we generate a series \( N \) -of averages \( \mathbb{E}[x_i] \). Each mean value \( \mathbb{E}[x_i] \) -is viewed as the average of a specific measurement, e.g., throwing -dice 100 times and then taking the average value, or producing a certain -amount of random numbers. -For notational ease, we set \( \mathbb{E}[x_i]=x_i \) in the discussion -which follows. We do the same for \( \mathbb{E}[z]=z \). +









    +

    Challenge: Choosing a Fixed Learning Rate

    +

    A fixed \( \eta \) is hard to get right:

    +
      +
    1. If \( \eta \) is too large, the updates can overshoot the minimum, causing oscillations or divergence
    2. +
    3. If \( \eta \) is too small, convergence is very slow (many iterations to make progress)
    4. +
    +

    In practice, one often uses trial-and-error or schedules (decaying \( \eta \) over time) to find a workable balance. +For a function with steep directions and flat directions, a single global \( \eta \) may be inappropriate:

    +
      +
    1. Steep coordinates require a smaller step size to avoid oscillation.
    2. +
    3. Flat/shallow coordinates could use a larger step to speed up progress.
    4. +
5. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features**; we need a method to adjust step sizes per feature.
    6. +
    +









    +

    Motivation for Adaptive Step Sizes

    -

    If we compute the mean \( z \) of \( m \) such mean values \( x_i \)

    -$$ - z=\frac{x_1+x_2+\dots+x_m}{m}, -$$ +
      +
    1. Instead of a fixed global \( \eta \), use an adaptive learning rate for each parameter that depends on the history of gradients.
    2. +
    3. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.
    4. +
    5. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected
    6. +
    7. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates
    8. +
    9. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods.
    10. +
    +

    AdaGrad algorithm, taken from Goodfellow et al

    -

    the question we pose is which is the PDF of the new variable \( z \).

    +

    +
    +

    +
    +











    -

    Finding the Limit

    +

    Derivation of the AdaGrad Algorithm

    -

    The probability of obtaining an average value \( z \) is the product of the -probabilities of obtaining arbitrary individual mean values \( x_i \), -but with the constraint that the average is \( z \). We can express this through -the following expression -

    +
    +Accumulating Gradient History +

    +

      +
    1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)
    2. +
    3. Let \( g_t = \nabla C_{i_t}(x_t) \) be the gradient at step \( t \) (or a subgradient for nondifferentiable cases).
    4. +
    5. Initialize \( r_0 = 0 \) (an all-zero vector in \( \mathbb{R}^d \)).
    6. +
    7. At each iteration \( t \), update the accumulation:
    8. +
$$
r_t = r_{t-1} + g_t \circ g_t,
$$

    +
      +
1. Here \( g_t \circ g_t \) denotes the element-wise square of the gradient vector, that is \( r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2 \) for each parameter \( j \).
    2. +
    3. We can view \( H_t = \mathrm{diag}(r_t) \) as a diagonal matrix of past squared gradients. Initially \( H_0 = 0 \).
    4. +
    +
    -









    -

    Rewriting the \( \delta \)-function

    -

    If we use the integral expression for the \( \delta \)-function

    +









    +

    AdaGrad Update Rule Derivation

    +

    We scale the gradient by the inverse square root of the accumulated matrix \( H_t \). The AdaGrad update at step \( t \) is:

$$
\theta_{t+1} = \theta_t - \eta H_t^{-1/2} g_t,
$$

where \( H_t^{-1/2} \) is the diagonal matrix with entries \( (r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2} \). In coordinates, this means each parameter \( j \) has an individual step size:

$$
\theta_{t+1,j} = \theta_{t,j} - \frac{\eta}{\sqrt{r_{t,j}}}\, g_{t,j}.
$$

In practice we add a small constant \( \epsilon \) in the denominator for numerical stability, to avoid division by zero:

$$
\theta_{t+1,j} = \theta_{t,j} - \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}\, g_{t,j}.
$$

    Equivalently, the effective learning rate for parameter \( j \) at time \( t \) is \( \displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}} \). This decreases over time as \( r_{t,j} \) grows.
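A minimal sketch of these AdaGrad updates, applied to the same toy linear regression problem used in the earlier SGD examples (the global step size \( \eta \) and the minibatch setup are illustrative choices, not prescribed values):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
m = int(n/M)
n_epochs = 50
eta = 0.1      # global step size (illustrative)
eps = 1e-8     # numerical-stability constant

theta = np.random.randn(2,1)
r = np.zeros_like(theta)                       # accumulated squared gradients r_t
for epoch in range(n_epochs):
    for i in range(m):
        random_index = M*np.random.randint(m)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        r += g*g                               # r_t = r_{t-1} + g_t o g_t (element-wise)
        theta -= eta*g/np.sqrt(eps + r)        # per-parameter step eta/sqrt(eps + r_t)
print("theta from own AdaGrad")
print(theta)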











    -


    +

    AdaGrad Properties

    + +
      +
    1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps
    2. +
    3. No manual schedule needed: The accumulation \( r_t \) keeps increasing (or stays the same if gradient is zero), so step sizes \( \eta/\sqrt{r_t} \) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.
    4. +
    5. Sparse data benefit: For very sparse features, \( r_{t,j} \) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal
    6. +
    7. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem
    8. +
    +

    It effectively reduces the need to tune \( \eta \) by hand.

    +
      +
    1. Limitations: Because \( r_t \) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)
    2. +
    +









    +

    RMSProp: Adaptive Learning Rates

    -

RMSProp addresses AdaGrad's diminishing learning rate issue by using a decaying average of squared gradients (instead of a cumulative sum):

$$
v_t = \rho v_{t-1} + (1-\rho)\,(\nabla C(\theta_t))^2,
$$

    +

    with \( \rho \) typically \( 0.9 \) (or \( 0.99 \)).

    +
      +
    1. Update: \( \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t) \).
    2. +
    3. Recent gradients have more weight, so \( v_t \) adapts to the current landscape.
    4. +
    5. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.
    6. +
    +

(RMSProp was first proposed in unpublished lecture notes by Geoff Hinton, 2012.)
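A minimal sketch of RMSProp for the same toy regression problem (the values of \( \eta \), \( \rho \) and \( \epsilon \) are common illustrative choices, not prescribed values):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
m = int(n/M)
n_epochs = 50
eta, rho, eps = 0.01, 0.9, 1e-8

theta = np.random.randn(2,1)
v = np.zeros_like(theta)                       # decaying average of squared gradients
for epoch in range(n_epochs):
    for i in range(m):
        random_index = M*np.random.randint(m)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        v = rho*v + (1-rho)*g*g                # v_t = rho*v_{t-1} + (1-rho)*g_t^2
        theta -= eta*g/np.sqrt(v + eps)        # adaptive per-parameter step
print("theta from own RMSProp")
print(theta)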

    +

    RMSProp algorithm, taken from Goodfellow et al


    +
    +

    +
    +

    -


    +









    +

    Adam Optimizer

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    +
      +
    1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +
    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +
    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +
    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.











    -

ADAM optimizer

In ADAM, we keep a running average of both the first and second moments of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +









    +

    Why Combine Momentum and RMSProp?

    + +
      +
    1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +
    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +
    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +
    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +









    +

    Adam: Exponential Moving Averages (Moments)

    +

    Adam maintains two moving averages at each time step \( t \) for each parameter \( w \):

    +
    +First moment (mean) \( m_t \) +

    +

    The Momentum term

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t),
$$

    +
    +Second moment (uncentered variance) \( v_t \) +

    +

    The RMS term

$$
v_t = \beta_2 v_{t-1} + (1-\beta_2)\,(\nabla C(\theta_t))^2,
$$

    +

    with typical \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \). Initialize \( m_0 = 0 \), \( v_0 = 0 \).

    +
    -


    +

These are biased estimators of the true first and second moments of the gradient, especially at the start (since \( m_0, v_0 \) are zero).











    -


    +

    Adam: Bias Correction

    +

    To counteract initialization bias in \( m_t, v_t \), Adam computes bias-corrected estimates

$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.
$$

    +
      +
• When \( t \) is small, \( 1-\beta_i^t \) is close to zero, so \( \hat{m}_t, \hat{v}_t \) are significantly larger than the raw \( m_t, v_t \), compensating for the initial zero bias (a one-step check is given after this list).
    • +
    • As \( t \) increases, \( 1-\beta_i^t \to 1 \), and \( \hat{m}_t, \hat{v}_t \) converge to \( m_t, v_t \).
    • +
    • Bias correction is important for Adam’s stability in early iterations
    • +
    +
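A one-step check makes the effect of the correction explicit. At the very first update (\( t=1 \)), since \( m_0 = 0 \),

$$
m_1 = (1-\beta_1)\,\nabla C(\theta_1), \qquad
\hat{m}_1 = \frac{m_1}{1-\beta_1} = \nabla C(\theta_1),
$$

so the bias-corrected first moment equals the current gradient instead of the heavily damped raw average \( m_1 \); the same argument applies to \( \hat{v}_1 \).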









    +

    Adam: Update Rule Derivation

    +

    Finally, Adam updates parameters using the bias-corrected moments:

$$
\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t,
$$

    where \( \epsilon \) is a small constant (e.g. \( 10^{-8} \)) to prevent division by zero. +Breaking it down:

    +
      +
    1. Compute gradient \( \nabla C(\theta_t) \).
    2. +
    3. Update first moment \( m_t \) and second moment \( v_t \) (exponential moving averages).
    4. +
    5. Bias-correct: \( \hat{m}_t = m_t/(1-\beta_1^t) \), \( \; \hat{v}_t = v_t/(1-\beta_2^t) \).
    6. +
    7. Compute step: \( \Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \).
    8. +
    9. Update parameters: \( \theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t \).
    10. +
    +

    This is the Adam update rule as given in the original paper.
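A minimal sketch of these five steps for the same toy linear regression problem used earlier (the hyperparameters \( \alpha, \beta_1, \beta_2, \epsilon \) are set to commonly quoted default values, here purely for illustration):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
n_batches = int(n/M)
n_epochs = 100
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

theta = np.random.randn(2,1)
m = np.zeros_like(theta)       # first moment
v = np.zeros_like(theta)       # second moment
t = 0
for epoch in range(n_epochs):
    for i in range(n_batches):
        random_index = M*np.random.randint(n_batches)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)          # 1. gradient on the minibatch
        t += 1
        m = beta1*m + (1-beta1)*g                      # 2. update first moment
        v = beta2*v + (1-beta2)*g*g                    #    and second moment
        mhat = m/(1-beta1**t)                          # 3. bias correction
        vhat = v/(1-beta2**t)
        theta -= alpha*mhat/(np.sqrt(vhat) + eps)      # 4.-5. parameter update
print("theta from own Adam")
print(theta)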

    -


    +









    +

    Adam vs. AdaGrad and RMSProp

    -


    +
      +
    1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)
    2. +
    3. RMSProp: Uses moving average of squared gradients (like Adam’s \( v_t \)) to maintain adaptive learning rates, but does not include momentum or bias-correction.
    4. +
    5. Adam: Effectively RMSProp + Momentum + Bias-correction
    6. +
        +
      • Momentum (\( m_t \)) provides acceleration and smoother convergence.
      • +
      • Adaptive \( v_t \) scaling moderates the step size per dimension.
      • +
      • Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
      • +
      +
    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone











    -

    Standard Approach based on the Normal Distribution

    +

    Adaptivity Across Dimensions

    -

    We will assume that the parameters \( \beta \) follow a normal -distribution. We can then define the confidence interval. Here we will be using as -shorthands \( \mu_{\beta} \) for the above mean value and \( \sigma_{\beta} \) -for the standard deviation. We have then a confidence interval -

    +
      +
    1. Adam adapts the step size \emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.
    2. +
    3. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.
    4. +
    5. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction.
    6. +
    +

    ADAM algorithm, taken from Goodfellow et al

    -$$ -\left(\mu_{\beta}\pm \frac{z\sigma_{\beta}}{\sqrt{n}}\right), -$$ +

    +
    +

    +
    +

    -

    where \( z \) defines the level of certainty (or confidence). For a normal -distribution typical parameters are \( z=2.576 \) which corresponds to a -confidence of \( 99\% \) while \( z=1.96 \) corresponds to a confidence of -\( 95\% \). A confidence level of \( 95\% \) is commonly used and it is -normally referred to as a two-sigmas confidence level, that is we -approximate \( z\approx 2 \). -

    +









    +

    Algorithms and codes for Adagrad, RMSprop and Adam

    -


    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    -


    +

The codes which implement these algorithms are discussed below.











    -

    Resampling methods: Bootstrap background

    - -

    Since \( \widehat{\beta} = \widehat{\beta}(\boldsymbol{X}) \) is a function of random variables, -\( \widehat{\beta} \) itself must be a random variable. Thus it has -a pdf, call this function \( p(\boldsymbol{t}) \). The aim of the bootstrap is to -estimate \( p(\boldsymbol{t}) \) by the relative frequency of -\( \widehat{\beta} \). You can think of this as using a histogram -in the place of \( p(\boldsymbol{t}) \). If the relative frequency closely -resembles \( p(\vec{t}) \), then using numerics, it is straight forward to -estimate all the interesting parameters of \( p(\boldsymbol{t}) \) using point -estimators. -

    +

    Practical tips

    +
      +
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +
    • Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • +
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings (a minimal sketch is given after this list).
    • +
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why, simpler procedures like properly tuned SGD may work as well or better in these applications.
    • +
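As mentioned in the early-stopping tip above, here is a minimal sketch of that idea. The helper functions train_one_epoch and mse are hypothetical placeholders (not part of the course code): the former performs one epoch of training and returns the updated parameters, the latter computes the mean-squared error.

import numpy as np

def fit_with_early_stopping(theta, train_one_epoch, mse,
                            X_train, y_train, X_val, y_val,
                            n_epochs=200, patience=10):
    """Stop training once the validation error has not improved for
    `patience` consecutive epochs, and return the best parameters."""
    best_theta, best_val = theta.copy(), np.inf
    epochs_since_best = 0
    for epoch in range(n_epochs):
        theta = train_one_epoch(theta, X_train, y_train)
        val_error = mse(theta, X_val, y_val)
        if val_error < best_val:
            best_theta, best_val = theta.copy(), val_error
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:   # validation error no longer improving
            break
    return best_theta, best_val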










    -

    Resampling methods: More Bootstrap background

    +

    Sneaking in automatic differentiation using Autograd

    -

In the examples here we take the liberty of sneaking in automatic differentiation (without having discussed the mathematics). In project 1 you will write the gradients as discussed above, that is, you will hard-code them. By introducing automatic differentiation via the library autograd (which has since been superseded by JAX), we gain more flexibility in setting up alternative cost functions.

    -
      -
    1. Drawing lots of numbers from \( p(x) \), suppose we call one such set of numbers \( (X_1^*, X_2^*, \cdots, X_n^*) \).
    2. -
    3. Then using these numbers, we could compute a replica of \( \widehat{\beta} \) called \( \widehat{\beta}^* \).
    4. -
    -

    By repeated use of the above two points, many -estimates of \( \widehat{\beta} \) can be obtained. The -idea is to use the relative frequency of \( \widehat{\beta}^* \) -(think of a histogram) as an estimate of \( p(\boldsymbol{t}) \). + +

The first example shows results with ordinary least squares.

    -









    -

    Resampling methods: Bootstrap approach

    -

    But -unless there is enough information available about the process that -generated \( X_1,X_2,\cdots,X_n \), \( p(x) \) is in general -unknown. Therefore, Efron in 1979 asked the -question: What if we replace \( p(x) \) by the relative frequency -of the observation \( X_i \)? -

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    If we draw observations in accordance with -the relative frequency of the observations, will we obtain the same -result in some asymptotic sense? The answer is yes. -











    -

    Resampling methods: Bootstrap steps

    +

    Same code but now with momentum gradient descent

    -

    The independent bootstrap works like this:

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -
      -
    1. Draw with replacement \( n \) numbers for the observed variables \( \boldsymbol{x} = (x_1,x_2,\cdots,x_n) \).
    2. -
    3. Define a vector \( \boldsymbol{x}^* \) containing the values which were drawn from \( \boldsymbol{x} \).
    4. -
    5. Using the vector \( \boldsymbol{x}^* \) compute \( \widehat{\beta}^* \) by evaluating \( \widehat \beta \) under the observations \( \boldsymbol{x}^* \).
    6. -
    7. Repeat this process \( k \) times.
    8. -
    -

    When you are done, you can draw a histogram of the relative frequency -of \( \widehat \beta^* \). This is your estimate of the probability -distribution \( p(t) \). Using this probability distribution you can -estimate any statistics thereof. In principle you never draw the -histogram of the relative frequency of \( \widehat{\beta}^* \). Instead -you use the estimators corresponding to the statistic of interest. For -example, if you are interested in estimating the variance of \( \widehat -\beta \), apply the etsimator \( \widehat \sigma^2 \) to the values -\( \widehat \beta^* \). -











    -

    Code example for the Bootstrap method

    +

    Including Stochastic Gradient Descent with Autograd

    -

    The following code starts with a Gaussian distribution with mean value -\( \mu =100 \) and variance \( \sigma=15 \). We use this to generate the data -used in the bootstrap analysis. The bootstrap analysis returns a data -set after a given number of bootstrap operations (as many as we have -data points). This data set consists of estimated mean values for each -bootstrap operation. The histogram generated by the bootstrap method -shows that the distribution for these mean values is also a Gaussian, -centered around the mean value \( \mu=100 \) but with standard deviation -\( \sigma/\sqrt{n} \), where \( n \) is the number of bootstrap samples (in -this case the same as the number of original data points). The value -of the standard deviation is what we expect from the central limit -theorem. +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    @@ -1075,32 +1949,79 @@

    Code example for the Bootstrap me
    -
    import numpy as np
    -from time import time
    -from scipy.stats import norm
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
     
    -# Returns mean of bootstrap samples 
    -# Bootstrap algorithm
    -def bootstrap(data, datapoints):
    -    t = np.zeros(datapoints)
    -    n = len(data)
    -    # non-parametric bootstrap         
    -    for i in range(datapoints):
    -        t[i] = np.mean(data[np.random.randint(0,n,n)])
    -    # analysis    
    -    print("Bootstrap Statistics :")
    -    print("original           bias      std. error")
    -    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    -    return t
    -
    -# We set the mean value to 100 and the standard deviation to 15
    -mu, sigma = 100, 15
    -datapoints = 10000
    -# We generate random numbers according to the normal distribution
    -x = mu + sigma*np.random.randn(datapoints)
    -# bootstrap returns the data sample                                    
    -t = bootstrap(x, datapoints)
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
     
    @@ -1116,10 +2037,9 @@

    Code example for the Bootstrap me

    -

    We see that our new variance, and from that the standard deviation, agrees with the central limit theorem.











    -

    Plotting the Histogram

    +

    Same code but now with momentum gradient descent

    @@ -1127,15 +2047,73 @@

    Plotting the Histogram

    -
    # the histogram of the bootstrapped data (normalized data if density = True)
    -n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    -# add a 'best fit' line  
    -y = norm.pdf(binsboot, np.mean(t), np.std(t))
    -lt = plt.plot(binsboot, y, 'b', linewidth=1)
    -plt.xlabel('x')
    -plt.ylabel('Probability')
    -plt.grid(True)
    -plt.show()
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
     
    @@ -1153,76 +2131,221 @@

    Plotting the Histogram











    -

    The bias-variance tradeoff

    +

    But none of these can compete with Newton's method

    -

    We will discuss the bias-variance tradeoff in the context of continuous predictions such as regression. However, many of the intuitions and ideas discussed here also carry over to classification tasks. Consider a dataset \( \mathcal{D} \) consisting of the data \( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \).

    +

    Note that we here have introduced automatic differentiation

    -

    Let us assume that the true data is generated from a noisy model

    $$
    \boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}
    $$

    where \( \boldsymbol{\epsilon} \) is normally distributed with mean zero and variance \( \sigma^2 \).

    + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    In our derivation of the ordinary least squares method we then defined an approximation to the function \( f \) in terms of the parameters \( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \).

    -

    Thereafter we found the parameters \( \boldsymbol{\beta} \) by optimizing the mean squared error via the so-called cost function

    $$
    C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right].
    $$

    We can rewrite this as

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2.
    $$









    +

    A similar problem (now with a second-order polynomial) but with AdaGrad

    -

    The three terms represent, in order, the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method; the variance of the chosen model; and finally the variance of the error \( \boldsymbol{\epsilon} \).

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    To derive this equation, we need to recall that the variances of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable, idem for \( \boldsymbol{\tilde{y}} \). We use a more compact notation in terms of the expectation value

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right],
    $$

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right],
    $$
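    Spelling out this step (an added intermediate step, under the same assumptions as above, namely that \( \mathbb{E}[\boldsymbol{\epsilon}]=0 \), that \( \boldsymbol{f} \) and \( \mathbb{E}[\boldsymbol{\tilde{y}}] \) are non-stochastic, and that \( \boldsymbol{\epsilon} \) is independent of \( \boldsymbol{\tilde{y}} \)): expanding the square, all cross terms vanish in expectation, leaving

    $$
    \mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]+\boldsymbol{\epsilon}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}})^2\right]
    =\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]
    +\mathbb{E}\left[\boldsymbol{\epsilon}^2\right]
    +\mathbb{E}\left[(\boldsymbol{\tilde{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right].
    $$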









    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    -

    which, using the abovementioned expectation values, can be rewritten as

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2,
    $$
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Accumulated squared gradients, scaling with rho the new and the previous results
    +        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    +        # Element-wise update: scale the gradient by the adaptive learning rate
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).











    -

    A way to Read the Bias-Variance Tradeoff

    - -

    -
    -

    -
    -

    +

    And finally ADAM

    -









    -

    Example code for Bias-Variance tradeoff

    @@ -1230,60 +2353,65 @@

    Example code for Bias-Variance
    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 500
    -n_boostraps = 100
    -degree = 18  # A quite high value, just to show.
    -noise = 0.1
    -
    -# Make data set.
    -x = np.linspace(-1, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    -
    -# Hold out some test data that is never used in training.
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -# Combine x transformation and model into one operation.
    -# Not neccesary, but convenient.
    -model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -
    -# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    -# for each bootstrap iteration.
    -y_pred = np.empty((y_test.shape[0], n_boostraps))
    -for i in range(n_boostraps):
    -    x_, y_ = resample(x_train, y_train)
    -
    -    # Evaluate the new model on the same test data each time.
    -    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -# Note: Expectations and variances taken w.r.t. different training
    -# data sets, hence the axis=1. Subsequent means are taken across the test data
    -# set in order to obtain a total value, but before this we have error/bias/variance
    -# calculated per data point in the test set.
    -# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    -# maintains the column vector form. Dropping this yields very unexpected results.
    -error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -print('Error:', error)
    -print('Bias^2:', bias)
    -print('Var:', variance)
    -print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    -
    -plt.plot(x[::5, :], y[::5, :], label='f(x)')
    -plt.scatter(x_test, y_test, label='Data points')
    -plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    -plt.legend()
    -plt.show()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980
    +theta1 = 0.9
    +theta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = theta1*first_moment + (1-theta1)*gradients
    +        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients
    +        first_term = first_moment/(1.0-theta1**iter)
    +        second_term = second_moment/(1.0-theta2**iter)
    +        # Update with the bias-corrected first and second moments
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
     
    @@ -1301,7 +2429,43 @@

    Example code for Bias-Variance









    -

    Understanding what happens

    +

    Material for the lab sessions

    + +
    + +

    +

    1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)
    2. Work on project 1

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +
    + + +









    +

    Reminder on different scaling methods

    + +

    Before fitting a regression model, it is good practice to normalize or standardize the features. This ensures all features are on a comparable scale, which is especially important when using regularization. In the exercises this week we will perform standardization, scaling each feature to have mean 0 and standard deviation 1.

    + +

    Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \( \boldsymbol{X} \). Then we subtract the mean and divide by the standard deviation for each feature.

    + +

    In the example here we will also center the target \( \boldsymbol{y} \) to mean \( 0 \). Centering \( \boldsymbol{y} \) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \( 0 \). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \( n=100 \) data points and set up \( \boldsymbol{x} \), \( \boldsymbol{y} \) and the design matrix \( \boldsymbol{X} \).

    +
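    A minimal sketch of such a setup (the names x, y and X below are assumptions chosen to match the snippet that follows, not a prescribed solution):

    import numpy as np

    n = 100
    x = np.random.rand(n, 1)
    y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n, 1)

    # polynomial features without an intercept column, since we will center the data
    X = np.c_[x, x**2]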
    @@ -1309,52 +2473,15 @@

    Understanding what happens

    -
    import matplotlib.pyplot as plt
    -import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 40
    -n_boostraps = 100
    -maxdegree = 14
    -
    -
    -# Make data set.
    -x = np.linspace(-3, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    -error = np.zeros(maxdegree)
    -bias = np.zeros(maxdegree)
    -variance = np.zeros(maxdegree)
    -polydegree = np.zeros(maxdegree)
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -for degree in range(maxdegree):
    -    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -    y_pred = np.empty((y_test.shape[0], n_boostraps))
    -    for i in range(n_boostraps):
    -        x_, y_ = resample(x_train, y_train)
    -        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -    polydegree[degree] = degree
    -    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -    print('Polynomial degree:', degree)
    -    print('Error:', error[degree])
    -    print('Bias^2:', bias[degree])
    -    print('Var:', variance[degree])
    -    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    -
    -plt.plot(polydegree, error, label='Error')
    -plt.plot(polydegree, bias, label='bias')
    -plt.plot(polydegree, variance, label='Variance')
    -plt.legend()
    -plt.show()
    +  
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = np.mean(y)
    +y_centered = y - y_mean
     
    @@ -1370,59 +2497,73 @@

    Understanding what happens

    +

    Do we need to center the values of \( y \)?

    - -

    Summing up

    - -

    The bias-variance tradeoff summarizes the fundamental tension in machine learning, particularly supervised learning, between the complexity of a model and the amount of training data needed to train it. Since data is often limited, in practice it is often useful to use a less-complex model with higher bias, that is a model whose asymptotic performance is worse than another model because it is easier to train and less sensitive to sampling noise arising from having a finite-sized training dataset (smaller variance).

    After this preprocessing, each column of \( \boldsymbol{X}_{\mathrm{norm}} \) has mean zero and standard deviation \( 1 \), and \( \boldsymbol{y}_{\mathrm{centered}} \) has mean 0. This can make the optimization landscape nicer and ensures the regularization penalty \( \lambda \sum_j \theta_j^2 \) in Ridge regression treats each coefficient fairly (since features are on the same scale).
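    A quick sanity check of this claim (a sketch with stand-in data, mirroring the snippet above):

    import numpy as np

    X = np.random.rand(100, 3)
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
    y = np.random.rand(100)
    y_centered = y - y.mean()

    print(X_norm.mean(axis=0), X_norm.std(axis=0))  # approximately 0 and 1 per column
    print(y_centered.mean())                        # approximately 0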

    -

    The above equations tell us that in order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias. Note that variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, we see that the expected test MSE can never lie below \( Var(\epsilon) \), the irreducible error.









    +

    Functionality in Scikit-Learn

    + +

    Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features are exactly between \( 0 \) and \( 1 \).
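    As a small usage sketch (with a made-up feature matrix, not data from these notes):

    import numpy as np
    from sklearn.preprocessing import StandardScaler, MinMaxScaler

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

    scaler = StandardScaler().fit(X)          # learns mean and std of each column
    print(scaler.transform(X))                # columns now have mean 0 and variance 1

    minmax = MinMaxScaler().fit(X)            # learns min and max of each column
    print(minmax.transform(X))                # columns now lie between 0 and 1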

    -

    What do we mean by the variance and bias of a statistical learning method? The variance refers to the amount by which our model would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different estimate. But ideally the estimate for our model should not vary too much between training sets. However, if a method has high variance then small changes in the training data can result in large changes in the model. In general, more flexible statistical methods have higher variance.









    +

    More preprocessing

    + +
    + +

    +

    The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point on the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    + +

    The RobustScaler works similarly to the StandardScaler in that it ensures statistical properties for each feature that guarantee that they are on the same scale. However, the RobustScaler uses the median and quartiles, instead of mean and variance. This makes the RobustScaler ignore data points that are very different from the rest (like measurement errors). These odd data points are also called outliers, and might often lead to trouble for other scaling techniques.
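    A corresponding sketch for these two transformers (again with a made-up matrix, where the last row plays the role of an outlier):

    import numpy as np
    from sklearn.preprocessing import Normalizer, RobustScaler

    X = np.array([[3.0, 4.0], [1.0, 0.0], [100.0, 100.0]])

    print(Normalizer().fit_transform(X))      # each row rescaled to unit Euclidean length
    print(RobustScaler().fit_transform(X))    # per column: subtract median, divide by IQR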

    +
    -

    You may also find this recent article of interest.











    -

    Another Example from Scikit-Learn's Repository

    - -

    This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    Frequently used scaling functions

    + +

    Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function, as discussed above. It is easy, however, to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    $$
    x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)},
    $$

    where \( \overline{x}_j \) and \( \sigma(x_j) \) are the mean and standard deviation, respectively, of the feature \( x_j \). This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.
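    A hand-written version of this transformation could look like the following sketch (the array names are assumptions; note that the mean and standard deviation are computed on the training data only and then reused on new data):

    import numpy as np

    X_train = np.random.rand(100, 3)          # some training features
    X_test = np.random.rand(20, 3)            # new data to be predicted on later

    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                       # avoid division by zero for constant features

    X_train_scaled = (X_train - mean) / std
    X_test_scaled = (X_test - mean) / std     # reuse training mean and std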

    + +

    Keep in mind that when you transform your data set before training a model, the same transformation needs to be done on your eventual new data set before making a prediction. If we translate this into Python code, it could be implemented as

    @@ -1432,55 +2573,22 @@

    Another Example from Sci
    -
    #print(__doc__)
    -
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.pipeline import Pipeline
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.linear_model import LinearRegression
    -from sklearn.model_selection import cross_val_score
    -
    -
    -def true_fun(X):
    -    return np.cos(1.5 * np.pi * X)
    -
    -np.random.seed(0)
    -
    -n_samples = 30
    -degrees = [1, 4, 15]
    -
    -X = np.sort(np.random.rand(n_samples))
    -y = true_fun(X) + np.random.randn(n_samples) * 0.1
    -
    -plt.figure(figsize=(14, 5))
    -for i in range(len(degrees)):
    -    ax = plt.subplot(1, len(degrees), i + 1)
    -    plt.setp(ax, xticks=(), yticks=())
    -
    -    polynomial_features = PolynomialFeatures(degree=degrees[i],
    -                                             include_bias=False)
    -    linear_regression = LinearRegression()
    -    pipeline = Pipeline([("polynomial_features", polynomial_features),
    -                         ("linear_regression", linear_regression)])
    -    pipeline.fit(X[:, np.newaxis], y)
    -
    -    # Evaluate the models using crossvalidation
    -    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    -                             scoring="neg_mean_squared_error", cv=10)
    -
    -    X_test = np.linspace(0, 1, 100)
    -    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    -    plt.plot(X_test, true_fun(X_test), label="True function")
    -    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    -    plt.xlabel("x")
    -    plt.ylabel("y")
    -    plt.xlim((0, 1))
    -    plt.ylim((-2, 2))
    -    plt.legend(loc="best")
    -    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    -        degrees[i], -scores.mean(), scores.std()))
    -plt.show()
    +  
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
    +# Then we fit our model with the training data
    +trained_model = some_model.fit(X_train,y_train)
    +
    +
    +#Model prediction, we need also to transform our data set used for the prediction.
    +X_test = X_test - X_train_mean #Use mean from training data
    +y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
     
    @@ -1496,47 +2604,112 @@

    Another Example from Sci

    +

    Let us try to understand what this may imply mathematically when we subtract the mean values, also known as zero centering. For simplicity, we will focus on ordinary regression, as done in the above example.

    + +

    The cost/loss function for regression is

    $$
    C(\theta_0, \theta_1, \ldots, \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2.
    $$

    Various steps in cross-validation

    - -

    When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    Recall also that we use the squared value of the residuals. Squaring penalizes larger differences between the predicted and the output/target values more strongly.

    -









    -

    Cross-validation in brief

    +

    What we have done is to single out the \( \theta_0 \) term in the definition of the mean squared error (MSE). The design matrix \( X \) does in this case not contain any intercept column. When we take the derivative with respect to \( \theta_0 \), we want the derivative to obey

    -

    For the various values of \( k \)

    $$
    \frac{\partial C}{\partial \theta_j} = 0,
    $$
    1. Shuffle the dataset randomly.
    2. Split the dataset into \( k \) groups.
    3. For each unique group:
      a. Decide which group to use as set for test data
      b. Take the remaining groups as a training data set
      c. Fit a model on the training set and evaluate it on the test set
      d. Retain the evaluation score and discard the model
    4. Summarize the model using the sample of model evaluation scores









    -

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    +

    for all \( j \). For \( \theta_0 \) we have

    $$
    \frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right).
    $$

    Multiplying away the constant \( 2/n \), we obtain

    $$
    \sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j.
    $$

    Let us specialize first to the case where we have only two parameters \( \theta_0 \) and \( \theta_1 \). Our result for \( \theta_0 \) then simplifies to

    $$
    n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1.
    $$

    We obtain then

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}.
    $$

    If we define

    $$
    \mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1},
    $$

    and the mean value of the outputs as

    $$
    \mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i,
    $$

    we have

    $$
    \theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}.
    $$

    In the general case with more parameters than \( \theta_0 \) and \( \theta_1 \), we have

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j.
    $$

    We can rewrite the latter equation as

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j,
    $$

    where we have defined

    $$
    \mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij},
    $$

    the mean value for all elements of the column vector \( \boldsymbol{x}_j \).

    + +

    Replacing \( y_i \) with \( y_i - \overline{\boldsymbol{y}} \) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    $$
    C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}).
    $$

    If we minimize with respect to \( \boldsymbol{\theta} \) we have then

    $$
    \hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},
    $$

    where \( \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} \) and \( \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj} \).

    + +

    For Ridge regression we need to add \( \lambda \boldsymbol{\theta}^T\boldsymbol{\theta} \) to the cost function and get then

    $$
    \hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
    $$

    What does this mean? And why do we insist on all this? Let us look at some examples.

    + +

    This code shows a simple polynomial fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test.

    -

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    @@ -1546,90 +2719,87 @@

    Code Exam
    import numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.model_selection import KFold
    -from sklearn.linear_model import Ridge
    -from sklearn.model_selection import cross_val_score
    -from sklearn.preprocessing import PolynomialFeatures
    -
    -# A seed just to ensure that the random numbers are the same for every run.
    -# Useful for eventual debugging.
    -np.random.seed(3155)
    -
    -# Generate the data.
    -nsamples = 100
    -x = np.random.randn(nsamples)
    -y = 3*x**2 + np.random.randn(nsamples)
     
    -## Cross-validation on Ridge regression using KFold only
    +from sklearn.linear_model import LinearRegression
     
    -# Decide degree on polynomial to fit
    -poly = PolynomialFeatures(degree = 6)
     
    -# Decide which values of lambda to use
    -nlambdas = 500
    -lambdas = np.logspace(-3, 5, nlambdas)
    +np.random.seed(2021)
     
    -# Initialize a KFold instance
    -k = 5
    -kfold = KFold(n_splits = k)
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
     
    -# Perform the cross-validation to estimate MSE
    -scores_KFold = np.zeros((nlambdas, k))
     
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -    j = 0
    -    for train_inds, test_inds in kfold.split(x):
    -        xtrain = x[train_inds]
    -        ytrain = y[train_inds]
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
     
    -        xtest = x[test_inds]
    -        ytest = y[test_inds]
     
    -        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    -        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +true_theta = [2, 0.5, 3.7]
     
    -        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    -        ypred = ridge.predict(Xtest)
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
     
    -        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +degree = 3
    +X = np.zeros((len(x), degree))
     
    -        j += 1
    -    i += 1
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
     
    +theta = fit_theta(X, y)
     
    -estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
     
    -## Cross-validation using cross_val_score from sklearn along with KFold
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
     
    -# kfold is an instance initialized above as:
    -# kfold = KFold(n_splits = k)
     
    -estimated_mse_sklearn = np.zeros(nlambdas)
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
     
    -    X = poly.fit_transform(x[:, np.newaxis])
    -    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
     
    -    # cross_val_score return an array containing the estimated negative mse for every fold.
    -    # we have to the the mean of every array in order to get an estimate of the mse of the model
    -    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
     
    -    i += 1
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
     
    -## Plot and compare the slightly different ways to perform cross-validation
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
     
    -plt.figure()
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
     
    -plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    -plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
     
    -plt.xlabel('log10(lambda)')
    -plt.ylabel('mse')
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
     
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
     plt.legend()
     
     plt.show()
    @@ -1648,9 +2818,43 @@ 

    Code Exam

    +

    The intercept is the value of our output/target variable when all our features are zero, that is, where our function crosses the \( y \)-axis (for a one-dimensional case).

    -









    -

    More examples on bootstrap and cross-validation and errors

    +

    Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to for example Ridge regression, the way we treat the intercept may give a larger or smaller MSE, meaning that the MSE can be penalized by the value of the intercept. Not including the intercept in the fit means that the regularization term does not include \( \theta_0 \). For different values of \( \lambda \), this may lead to different MSE values.

    + +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2,
    $$

    but when we take out the intercept, this equation becomes

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2.
    $$

    For Lasso regression we have

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert.
    $$

    It means that, when scaling the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which is not penalized by the intercept. The MSE value can then be smaller since it focuses only on the remaining quantities. If we however bring back the intercept, we will get an MSE which then contains the intercept.

    + +

    Armed with this wisdom, we attempt first to simply set the intercept flag (fit_intercept) equal to False in our implementation of Ridge regression for our well-known vanilla data set.

    @@ -1659,82 +2863,69 @@

    More example
    -
    # Common imports
    -import os
    -import numpy as np
    +  
    import numpy as np
     import pandas as pd
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
     from sklearn.model_selection import train_test_split
    -from sklearn.utils import resample
    -from sklearn.metrics import mean_squared_error
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -testerror = np.zeros(Maxpolydegree)
    -trainingerror = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -
    -trials = 100
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -
    -# loop over trials in order to estimate the expectation value of the MSE
    -    testerror[polydegree] = 0.0
    -    trainingerror[polydegree] = 0.0
    -    for samples in range(trials):
    -        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    -        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    -        ypred = model.predict(x_train)
    -        ytilde = model.predict(x_test)
    -        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    -        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    -
    -    testerror[polydegree] /= trials
    -    trainingerror[polydegree] /= trials
    -    print("Degree of polynomial: %3d"% polynomial[polydegree])
    -    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    -    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    -
    -plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    -plt.plot(polynomial, np.log10(testerror), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
    +#We include explicitely the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
     plt.legend()
     plt.show()
     
    @@ -1752,12 +2943,12 @@

    More example

    -

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    - - -

    The same example but now with cross-validation

    +

    The results here agree very well when we force Scikit-Learn's Ridge function to include the first column in our design matrix (by setting fit_intercept=False). Here we have thus explicitly included the intercept column in the design matrix. What happens if we do not include the intercept in our fit? Let us see how we can change this code by zero centering.

    -

    In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -1765,71 +2956,82 @@

    The same example but now
    -
    # Common imports
    -import os
    -import numpy as np
    +  
    import numpy as np
     import pandas as pd
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.metrics import mean_squared_error
    -from sklearn.model_selection import KFold
    -from sklearn.model_selection import cross_val_score
    -
    -
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -estimated_mse_sklearn = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -k =5
    -kfold = KFold(n_splits = k)
    -
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -        OLS = LinearRegression(fit_intercept=False)
    -# loop over trials in order to estimate the expectation value of the MSE
    -    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    -#[:, np.newaxis]
    -    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    -
    -plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
     plt.legend()
     plt.show()
     
    @@ -1847,191 +3049,17 @@

    The same example but now

    - -









    -

    Material for the lab sessions

    - - -

    Linking the regression analysis with a statistical interpretation

    - -

    We will now couple the discussions of ordinary least squares, Ridge and Lasso regression with a statistical interpretation, that is we move from a linear algebra analysis to a statistical analysis. In particular, we will focus on what the regularization terms can result in. We will amongst other things show that the regularization parameter can reduce considerably the variance of the parameters \( \beta \).

    - -

    The -advantage of doing linear regression is that we actually end up with -analytical expressions for several statistical quantities. -Standard least squares and Ridge regression allow us to -derive quantities like the variance and other expectation values in a -rather straightforward way. -

    - -

    It is assumed that \( \varepsilon_i -\sim \mathcal{N}(0, \sigma^2) \) and the \( \varepsilon_{i} \) are -independent, i.e.: -

    -$$ -\begin{align*} -\mbox{Cov}(\varepsilon_{i_1}, -\varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} -& i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right. -\end{align*} -$$ - -

    The randomness of \( \varepsilon_i \) implies that -\( \mathbf{y}_i \) is also a random variable. In particular, -\( \mathbf{y}_i \) is normally distributed, because \( \varepsilon_i \sim -\mathcal{N}(0, \sigma^2) \) and \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) is a -non-random scalar. To specify the parameters of the distribution of -\( \mathbf{y}_i \) we need to calculate its first two moments. -

    - -

    Recall that \( \boldsymbol{X} \) is a matrix of dimensionality \( n\times p \). The -notation above \( \mathbf{X}_{i,\ast} \) means that we are looking at the -row number \( i \) and perform a sum over all values \( p \). -

    - -









    -

    Assumptions made

The assumption we have made here can be summarized as follows (and this is going to be useful when we discuss the bias-variance trade-off): there exists a function \( f(\boldsymbol{x}) \) and a normally distributed error \( \boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2) \) which describe our data

$$
\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon}.
$$

We approximate this function with our model from the solution of the linear regression equations, that is, our function \( f \) is approximated by \( \boldsymbol{\tilde{y}} \), where we want to minimize \( (\boldsymbol{y}-\boldsymbol{\tilde{y}})^2 \), our MSE, with

$$
\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}.
$$

    Expectation value and variance


We can calculate the expectation value of \( \boldsymbol{y} \) for a given element \( i \),

$$
\begin{align*}
\mathbb{E}(y_i) & = \mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}) + \mathbb{E}(\varepsilon_i) = \mathbf{X}_{i, \ast} \, \boldsymbol{\beta},
\end{align*}
$$

while its variance is

$$
\begin{align*}
\mbox{Var}(y_i) & = \mathbb{E} \{ [y_i - \mathbb{E}(y_i)]^2 \} = \mathbb{E}( y_i^2 ) - [\mathbb{E}(y_i)]^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \varepsilon_i \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \mathbb{E}(\varepsilon_i) \, \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E}(\varepsilon_i^2 ) = \mbox{Var}(\varepsilon_i) = \sigma^2.
\end{align*}
$$

Hence, \( y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2) \), that is, \( \boldsymbol{y} \) follows a normal distribution with mean value \( \boldsymbol{X}\boldsymbol{\beta} \) and variance \( \sigma^2 \) (not to be confused with the singular values of the SVD).
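As a quick numerical illustration of these two moments, the short Python sketch below (not part of the original lecture code; the design row, the parameters and the noise level are assumptions made purely for illustration) draws many noise realizations for a fixed row \( \mathbf{X}_{i,\ast} \) and checks that the sample mean and the sample variance of \( y_i \) approach \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) and \( \sigma^2 \).

import numpy as np

np.random.seed(315)

# Illustrative (assumed) values: a fixed, non-random design row and true parameters
X_i = np.array([1.0, 0.5, -0.3])
beta = np.array([2.0, -1.0, 0.5])
sigma = 0.1

# Draw many realizations of y_i = X_{i,*} beta + eps_i with eps_i ~ N(0, sigma^2)
n_samples = 100000
y_i = X_i @ beta + sigma*np.random.randn(n_samples)

print("Theoretical mean :", X_i @ beta)
print("Sample mean      :", np.mean(y_i))
print("Theoretical var  :", sigma**2)
print("Sample var       :", np.var(y_i))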

We see here, when compared with the code which includes the intercept column explicitly, that our MSE value is actually smaller. This is because the regularization term does not include the intercept value \( \theta_0 \) in the fitting. This applies to Lasso regularization as well. It means that our optimization is now done only with the centered matrix and/or vector that enter the fitting procedure.
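The following minimal sketch illustrates this point. It is not the lecture code itself: it reuses the same synthetic data as above, but the polynomial degree, the value of \( \lambda \) and the helper variables are assumptions made only for this example, and the exact MSE values will depend on the train-test split. We fit Ridge twice on the same data, once with an explicit intercept column that is hit by the penalty, and once on centered data with an unpenalized intercept.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(315)
n = 100
x = np.random.rand(n)
y = np.exp(-x**2) + 1.5*np.exp(-(x-2)**2)

degree = 5      # illustrative choice
lmb = 0.1       # illustrative choice

# Design matrix WITH an explicit intercept column (column of ones)
X = np.column_stack([x**d for d in range(degree+1)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# (a) Naive Ridge: the penalty lambda*I also acts on the intercept theta_0
p_a = X_train.shape[1]
theta_a = np.linalg.pinv(X_train.T @ X_train + lmb*np.eye(p_a)) @ X_train.T @ y_train
mse_a = mean_squared_error(y_test, X_test @ theta_a)

# (b) Centered Ridge: drop the column of ones, center features and target,
#     and leave the intercept out of the penalty
Xc_train, Xc_test = X_train[:, 1:], X_test[:, 1:]
x_mean, y_mean = np.mean(Xc_train, axis=0), np.mean(y_train)
p_b = Xc_train.shape[1]
A = Xc_train - x_mean
theta_b = np.linalg.pinv(A.T @ A + lmb*np.eye(p_b)) @ A.T @ (y_train - y_mean)
mse_b = mean_squared_error(y_test, (Xc_test - x_mean) @ theta_b + y_mean)

print("Test MSE, penalized intercept  :", mse_a)
print("Test MSE, unpenalized intercept:", mse_b)

For most splits and moderate values of \( \lambda \) the second variant gives the smaller test MSE, in line with the argument above, but this should be read as an illustration rather than a proof.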


    Expectation value and variance for \( \boldsymbol{\beta} \)

With the OLS expressions for the optimal parameters \( \boldsymbol{\hat{\beta}} \) we can evaluate the expectation value

$$
\mathbb{E}(\boldsymbol{\hat{\beta}}) = \mathbb{E}[ (\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\beta}.
$$

This means that the OLS estimator of the regression parameters is unbiased.

We can also calculate the variance.

The variance of the optimal value \( \boldsymbol{\hat{\beta}} \) is

$$
\begin{eqnarray*}
\mbox{Var}(\boldsymbol{\hat{\beta}}) & = & \mathbb{E} \{ [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})] [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})]^{T} \} \\
& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}]^{T} \} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{y} \, \mathbf{y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, = \, \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1},
\end{eqnarray*}
$$

where we have used that \( \mathbb{E} (\mathbf{y} \mathbf{y}^{T}) = \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \). From \( \mbox{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} \), one obtains an estimate of the variance of the estimate of the \( j \)-th regression coefficient, \( \sigma^2 (\hat{\beta}_j ) = \sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to construct a confidence interval for the estimates.
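As a sanity check of these two results, one can repeat the regression experiment with fresh noise many times and compare the empirical mean and covariance of the OLS estimates with \( \boldsymbol{\beta} \) and \( \sigma^2(\mathbf{X}^{T}\mathbf{X})^{-1} \). The sketch below is an illustration only; the dimensions, the parameter values and the noise level are assumptions for this example.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1

XtX_pinv = np.linalg.pinv(X.T @ X)

# Repeat the experiment with fresh noise and collect the OLS estimates
n_experiments = 5000
betas = np.zeros((n_experiments, p))
for k in range(n_experiments):
    y = X @ beta_true + sigma*np.random.randn(n)
    betas[k] = XtX_pinv @ X.T @ y

print("Mean of the estimates:", np.mean(betas, axis=0))   # close to beta_true (unbiased)
print("Empirical covariance:\n", np.cov(betas.T))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2*np.linalg.inv(X.T @ X))

The empirical covariance and \( \sigma^2(\mathbf{X}^{T}\mathbf{X})^{-1} \) agree increasingly well as the number of repetitions grows.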

In a similar way, we can obtain analytical expressions for, say, the expectation values of the parameters \( \boldsymbol{\beta} \) and their variance when we employ Ridge regression, allowing us again to define a confidence interval.

    It is rather straightforward to show that

$$
\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{T} \mathbf{X})\boldsymbol{\beta}.
$$

We see clearly that \( \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}} \big] = \boldsymbol{\beta} \) for any \( \lambda > 0 \); the Ridge estimator is thus biased.

We can also compute the variance as

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T},
$$

and it is easy to see that if the parameter \( \lambda \) goes to infinity, then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.
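A small numerical sketch of this limit, with illustrative (assumed) dimensions and noise level only: we evaluate the closed-form expression for \( \mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}] \) for increasing \( \lambda \) and watch its trace shrink towards zero.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
sigma = 0.1
XtX = X.T @ X

# Var[beta_Ridge] = sigma^2 (X^T X + lambda I)^{-1} X^T X {(X^T X + lambda I)^{-1}}^T
def ridge_variance(lmb):
    W = np.linalg.inv(XtX + lmb*np.eye(p))
    return sigma**2*W @ XtX @ W.T

for lmb in [0.0, 1.0, 10.0, 1e3, 1e6]:
    print(f"lambda = {lmb:10.1f}, trace of Var[beta_Ridge] = {np.trace(ridge_variance(lmb)):.3e}")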

With this, we can compute the difference

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}]-\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}.
$$

The difference is non-negative definite since each factor in the matrix product is non-negative definite. This means that the variance we obtain with standard OLS will, for any \( \lambda > 0 \), always be larger than the variance of \( \boldsymbol{\beta} \) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below.
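To check this numerically (a rough illustration, not a proof; the design and the values of \( \lambda \) are assumed for this example), one can compute the difference of the two covariance matrices for a random design and verify that its eigenvalues are non-negative.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
sigma = 0.1
XtX = X.T @ X

var_ols = sigma**2*np.linalg.inv(XtX)

def var_ridge(lmb):
    W = np.linalg.inv(XtX + lmb*np.eye(p))
    return sigma**2*W @ XtX @ W.T

for lmb in [0.1, 1.0, 10.0]:
    diff = var_ols - var_ridge(lmb)
    # The difference is symmetric, so eigvalsh applies; all eigenvalues should be >= 0
    print(f"lambda = {lmb}: smallest eigenvalue of the difference = {np.linalg.eigvalsh(diff).min():.3e}")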


    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    - © 1999-2024, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license + © 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    diff --git a/doc/pub/week37/html/week37.html b/doc/pub/week37/html/week37.html index 433634e84..4d7dbab99 100644 --- a/doc/pub/week37/html/week37.html +++ b/doc/pub/week37/html/week37.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods \n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
    " - ], - "text/plain": [ - " ID Age Agegroup CHD\n", - "0 1 21 1 0\n", - "1 2 23 1 0\n", - "2 3 25 1 1\n", - "3 4 29 1 0\n", - "4 5 21 1 0\n", - ".. ... ... ... ...\n", - "95 96 61 8 1\n", - "96 97 69 8 1\n", - "97 98 65 8 1\n", - "98 99 64 8 1\n", - "99 100 63 8 0\n", - "\n", - "[100 rows x 4 columns]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfgAAAFnCAYAAABKGFvpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deWBU5dn38d9kmRCSQMIQIBEFBEHZt1ojLkCtgsW61VoX3kdc4gYuCCgiS6mKwIMLKgrS2mqxVYuAiLJoG0AWI5AAyg6yhiVMyDLZJpOc94808xBIMkPmTBIP389fTM6Z+1znOstv5p4hsRmGYQgAAFhKSH0XAAAAzEfAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPGq0aNEi9e3bV263O+jbSklJ0aBBgzR06FDvzyZMmKCFCxeato0NGzbo/vvvlyR98803Z23PzPEbqj/96U/q27evPvvss2rXKS0t1ezZs/WHP/xBQ4cO1e9//3uNGzdOW7ZsqcNK68aWLVt08803a+DAgfVdSpWOHz+u3//+9+rUqZP3Z4sXL9a4cePqsSr8LBhADZ544gmjR48exjfffFMn25s/f75x7733eh+7XC6jpKTE5/PuvfdeY/78+T7XKysrM3Jzc6vd3rnq2LGjcejQoWrHb6h89WvUqFHGM888YxQXFxuGYRjFxcXGc889Z/z2t7+tqxLr1Pr1640BAwbU2fb8PV8rHDp0yOjYsaP3scfjMVwuVzBKg4XwDh7VcrlcCg0N1YABA/TVV1/VSw1RUVEKCwszbTybzaaYmBjTxqvr8evCd999p+XLl2vixImy2+2SJLvdrnHjxnkfo36FhoYqKiqqvstAA2fenROW8/XXX+v6669XWFiYxowZo+LiYkVEREiS3n33XS1evFgtW7bUFVdcoRkzZujyyy/Xn//8Z9lsNr366qtKS0uTzWZTv3799Pjjj8tms521jYKCAo0fP1579+5Vq1atKk1DLly4UDNnztTll1+uV155RYWFhXruuefkdDpVWlqq7t27a+zYsZoxY4a2b9+uzMxMLViwQA888IBSUlL0xRdf6N5779WePXu0ceNGDRo0SD/++KM2b96snTt3erdjGIZefPFF/fjjjyouLtaf/vQndenSRdOmTdMnn3yi559/XrfddpsmTJigBQsWaO7cufrlL3+pBx98UJI0cuRIRURE6JVXXtHTTz9daXyXy6WXX35ZP/30k8rKynTdddfpwQcfVElJiR544AGlpqZqwoQJSklJ0U8//aRnn31Wv/71r6s8Hh9//LEWLFggu90um82m8ePHq0OHDtqyZYvGjx+vvLw83X333UpJSVFOTo5mzpypdu3aSZK2bt2qiRMnKiIiQt26dZNRwy+wXLZsmbp3737WC5Xo6Gi9//773serVq3SrFmzFBoaqkaNGmnChAlq06aNPv74Y82ePVs9evRQVFSUNm7cqObNm+vDDz/06zkxMTHaunWrmjdvrrfeeksRERHKz8/XpEmTdPjwYdlsNl1yySUaP368wsLC9NZbb+kf//iHBg0apOzsbKWlpSknJ0fh4eFq3LixxowZo0GDBunuu+/WwYMHNXXqVPXr16/KfZ87d26V/Tt48KAmTZokt9utsrIyjRo1Sr179z6n4/L1118rLS1NycnJZ52v/fv3P6uWimusVatWuvbaa70/37lzp8aMGaO8vDz9+9//liS99dZbWr16tex2uxwOh55//nm1aNFC+fn5evHFF7V//34ZhqGbb75Zd911l6Tyj5NmzpwpwzBUUlKiBx98UNddd50kadOmTZo+fbrCw8NlGIbuv/9+DRgwQFL5dfnRRx/JbrerZcuW+uMf/6jo6OhqzyfUo3qdP0CDNnLkSKOoqMgoLi42+vbta6xYscIwDMNISUkx+vXrZ5w6dcowDMOYMmVKpenDWbNmGUOHDjU8Ho/hdruNO++801i4cGGV25g6darxwAMPGKWlpUZxcbFx1113VZoynzlzpvHss88ahmEYf//7340JEyYYhlE+RXnbbbd516tqyvPee+81hg0bZng8HmPPnj3GJ598ctZU5/z5842uXbsaO3bsMAzDMD7//HNjwIABhtvtrnLcAQMGGOvXr/c+PnOK/szxx44d662/sLDQGDJkiLFgwYJKz58zZ45hGIaxZMkS4/rrr6+yT4ZhGP/4xz+8U+br16837rrrLu+y9evXG126dDG+//57wzAMY+LEicb48eMNwyifXr/mmmuMxYsXG4ZhGNu2bTO6du1a7RTxAw88YIwcObLaOgzDMA4ePGj07NnT2Ldvn2EYhrFw4ULjhhtu8H6cMnPmTOPKK680nE6n4fF4jGnTpvn1nKuuusrIzs42SktLjd/85jfemk+dOlXpHHr22WeNTz75pNLjIUOGGAUFBUZubq7x9ttvG3/5y1+MYcOGeddZunSp8a9//avK/ampfx6Pxxg0aJDx6aefGoZhGNu3bzcuv/xyIy8vzzAM/47LmjVrDMMwjFdeecUwDN9T9GdeY9OmTat0Xp3+kcLu3buNwYMHG2VlZYZhGMZLL73kPUfHjRtnjB492jAMw8jLyzMGDhzo3ceUlBRj//793mVXXXWV9+Ol22+/3UhPT/fub8U5vGHDBuPyyy83nE6nd3+ef/75avcD9YspelQpNzdXUVFRioiIkN1u1/XXX68vv/xSkrR06VJdc801io2NlSTddNNNlZ67YMEC3XrrrQoNDVV4eLgGDRqkzz//vMrtLF26VEOGDFFISIjsdnu1714lKTY2Vhs3blR6erpCQ0P197//3ed+XHvttQoNDVX79u11xx13VLlOu3btvDMHN954o06cOKH09HSfY/tSVlamxYsX6/bbb5ckNWrUSDfeeONZX267+uqrJUmdOnXSkSNHqh2vQ4cOeuSRR3T33XdrxowZ+vHHHystb9y4sfr27esd6/Dhw5Kk9PR0OZ1ODR48WJJ02WWXqW3btgHt2xdffKFu3bp53+EOGTJEGRkZSktL867Ts2dPNWvWTKGhoRo9erRfz+nRo4eaNm2qkJAQXXLJJd59aNq0qTIyMnTXXXdp6NChSk1NPWv/k5KSFBkZqZiYGD322GO66a
ablJqaquPHj0sqP9duuOGGavfp9P5deumllfp36NAh3Xzzzd5lLVu2VEpKiiTfxyUyMlJXXnmlJOnZZ5/1q79nXmM33nhjtetGRUXp5MmTWr58uUpKSjRq1Cj16dNHZWVlWrRokX73u99JKp+BGTBggPdavOSSS/TGG2/oD3/4gx599FFlZ2frp59+klTe70WLFunkyZO69NJLNXHiREnl1/bAgQPVrFkzSeXX/uLFi2ucEUL9YYoeVaqYTqz4hnlOTo4OHTqkoqIinThxQpdeeql33aZNm1Z67rFjx/T+++97gyw/P19NmjSpcjuZmZmKi4urdqzT/eY3v5HH49HLL7+s7Oxs3Xfffbr77rtr3A9/Pg8/fZuhoaGKiYlRZmamz+f5kpWVJbfb7b0ZSlKzZs28gVOhYnozIiJCJSUlVY6Vl5enhx9+WC+99JIGDRqkw4cP61e/+lWV45w5VmZmppo0aaLQ0FDv8orgqEqbNm20a9euGvft2LFjlfYrNDRUTZo00bFjx7w/O7P3/jynun1YsGCBPv74Yy1cuFCxsbF68803z3oxdOb2mjdvrqSkJH3++ee6/fbbFR4eXuNU8unL7Ha7d9sVx+v0/x3hdruVl5fn13GpzXcyfF1jp0tISNDs2bP13nvvafLkybrpppv05JNPKj8/X263W9OnT1ejRo0klb9wv+yyyySVv9jo2LGjXn31VUnSwIEDVVhYKEmaMWOG5syZo1tvvVUdO3bUqFGjdNlll+nYsWPau3ev977g8XjUvHlznTp1qtKxRcNAwKNK3377rT777DOFh4dLkkpKSpSUlKSUlBS1aNFCWVlZ3nWzs7MrPTchIUGPPvqo9x1jWVmZcnNzq9xOfHy8Tp06Ve1Yp8vKytKNN96om2++Wdu2bdOwYcN08cUX64orrqj1fp65TY/Ho7y8PMXHx0uSwsPDK/0Xwer2oyrNmjWT3W5XVlaW2rdv792Hli1bnnONP/30k1wul/fdvsfj8fu58fHxys3Nlcfj8X5hsaY+Dx48WP/617+Ul5dXKZwOHTqk6dOna+bMmUpISPC+25PK/1tdbm6uWrVqVe24tXlOhS1btqh79+7eFyb+7v8tt9yid955xzt7UhutWrVSeHi4PvzwQ+/PCgoKFBISol27dtX6uNTE1zV2usLCQnXo0EGzZs1SZmamRowYoffee0/Dhw+X3W7X+PHj1b17d0nl13FRUZGk8p6e/qLl9BeXbrdbY8aM0ciRIzV37lw99thj+s9//qOEhARdeOGF3nf0Uvk5Tbg3TEzR4yw5OTne6fUK4eHhuuaaa/TVV19p0KBBWrVqlTeYz/yG/a233qovvvhCpaWlksrffb377rtVbmvw4MFavHixysrK5Ha7tWzZsmrrmjdvnlauXClJ6tixo5o2baqysjJJ5dOUhYWF2r9/v6ZOnXpO+7t3717vl+KWLFmiFi1aqGfPnpKk1q1ba/fu3ZKk1NRU782xQuPGjVVUVKRFixZp6dKllZaFhITolltu8c5kFBUV6auvvtJtt912TvVJUmJiosLCwrz/D3316tV+P7dnz55yOBzej1i2b9+uvXv3Vrt+37599dvf/laTJ0/23vQLCgo0efJk/eIXv5BUPpvyww8/6MCBA5KkL7/8UomJierVq1e149bmORXatGmjHTt2yO12y+PxaN26dX7t+3XXXadjx47p008/1VVXXeXXc87Uo0cPJSQkaPny5ZLKQ/zxxx/X/v37a31cfJ2vZ15jX3zxRbVjbdmyRTNnzpRU/mKuXbt2Ki0t9Z5/p3889s4773h/r8RFF12kzZs3S5J27NhRadbqiSeeUGFhocLCwtS7d2/vtXzrrbdq5cqVysnJkSTt27dPjz76qF/7jLoXOmnSpEn1XQQajry8PA0dOlQHDhxQmzZtvJ/VpqSk6NNPP9UPP/ygyMhI9evXT1OmTNG///1vdenSRatXr9aIESMkld8Qd+7cqZkzZ+rzzz9Xdna2xo4dW+kFQ4VevXpp3bp1euedd5SSkqLOnTtr1apVysjIUHZ2tv72t79p3759crvd6tevn2bPnq0FCxZo3rx5GjBggO68805J5dO577zzjlauXKm7775bH330kVauXKnt27erpKREvXr1UlZWlkaMGKHjx48rNTVVTZo00Ztvvqm2bdvq8OHDmjNnjjZt2qRp06YpISFBUvlN8M9//rO+/PJLNW7cWPv27VNqaqq6deumFi1ayOVyafbs2dq1a5f+53/+R08//bR3/JtvvllXXHGFvv32W82dO1efffaZ99vcNptN999/vw4dOqTNmzdr0KBBevzxx3X8+HGlpaV5P++t0LhxYzVr1kzTp0/X2rVrZbPZtHnzZqWlpalLly4aP368jhw5omPHjsnhcOiVV17RwYMHlZ2drWuuuUZ9+vTR66+/rgULFigjI0N2u11r1qxRQkKCLr744rOOS//+/bV//35Nnz5dixcv1oIFCzR48GDdc889ksqnjLt06aKXX35ZCxYs0J49ezR9+nQ1a9ZMixcv1vvvv699+/Zp586d3u9V+PucyMhIbdu2TZ9++ql2796tZs2a6aabbtLGjRv1zjvvaMOGDYqMjFRqaqpCQkKUnp6uhQsXavfu3Tp+/Hilb8iHhYXpwIEDatu2bZXfVJekPXv21Ni/q6++WldffbXefvttffrpp/rss890880365prrvH7uKxbt04DBw70TpWfeb62bt26Uk1t27ZVSUmJpkyZoq+//lqXXHKJ1q1bp9TUVHXu3FkTJ07UkSNHtGPHDt1yyy366quvNG/ePH3yyScKCwvT6NGjFRERoV/+8pf6z3/+o/fee08LFy5UkyZN9OijjyokJESdOnXSe++9p2XLlunEiRPKyMjQxo0b9Ytf/EI2m02vvfaaFi1apFWrVmnChAm68MILlZCQoNjYWL300ktavHix1qxZo8mTJ1f6mA0Nh83g2xE4Rx6PR0VFRd7PLLds2aJHHnlEa9eurefKgLNNnz5dN9xwg3eaGjhfMEWPc3bkyBFNmDDB+/jzzz+v9fQnECwLFy5USUmJtm/fTrjjvMSX7HDO4uLi5Ha79Yc//EGGYahVq1aVvnQDNARvvPGGPvjgAw0fPry+SwHqRdCm6DMzM/X6669rx44dmj9//lnL58yZo5MnT6p58+b68ccf9cQTT3i/aQwAAAITtCn6jRs36le/+lW1vwChoKBAY8eOVXJysm644QZNnz49WKUAAHDeCVrADxo0qMY/hvDUU095fzd5WVmZGjduHKxSAAA479T7l+zcbrcWLFigp556yq/1PZ7SIFcEAMDPX71+yc7tdmvSpEl6+umnddFFF/n1nFOnCgLebnx8jDIz8wIeB5XR1+Cgr8FBX4ODvpovPr52f4K6Tt/BZ2dny+VySSr/rV4TJ07UsGHD1LVr1xp/gxkAADg3QXsHn5qaqkWLFikzM1OzZs3S/fffrzlz5ig2NlbJyckaNWqUdu/e7f2LTQUFBTX+pScAAOC/n91vsjNj6ocppOCgr8FBX4ODv
gYHfTXfz2KKHgAA1A0CHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACvgErLinViVMFKi4pre9STJFX4Nb2/VnKK3DXdyk++eq9P8fGjONXXFKqoyfzaxzDjL7Wxf6a1TNf6/jTD199rata/RnDmVOotVuPyplTWOXyoyddWrr+gI6edFU7hq+emFFHxTiB9tUXs45NXVw3dTVGdcJMH/G/MjMz9frrr2vHjh2aP3/+WcuLi4s1depUtWzZUvv371dycrLatWsXrHJ+VkrLyvTxv/cobVemsnKL1axJhHp1jNedAzsoNOTn95rM7fHopQ826UimS2WGFGKTLoiP1rj/11v2sKCdgrXiq/f+HBszjl+lMfKK1Szm7DHM6Gtd7K9ZPfO1jj/98NXXuqrVnzEK3SV69p11chV6vMcrOjJMUx9NUqQ9XK4it0a+uUaeUkOS9EnKXoWF2vTqiH6KbmT36xwxow6z+hrouervOnVx3dTVGL6ETpo0aZIpI51h9erVuuSSS5Samqo777zzrOV/+ctfFBMTo+TkZF144YWaMGGCbr/9dp/jFpjw7i8qKsKUcYLln9/s1tcbDquwuPwVXWFxqfZl5Kqw2KNuFzvqubrqVdfXyX/doEMnXDL++9iQlJvv1uY9Tg3odUGd1uiLr977c2zMOH7+jGFGX+tif83qma91/OlHQ6nVnzGeeWtNpVCVJLenTKvSMzT4ijYa/toqb7hXKDOkFd8f0k392vl1jphRh1n764tZx6Yurhuzx4iKivBrzDMF7e3goEGDFBUVVe3ylJQU9erVS5LUqVMn7dixQy5X9VNM54viklKl7cqsclnarpM/u+n6vAK3jmRWfVyPZLoa1HS9r97nFbh9Hhszjp8/Y5jR17rYX7N65msdZ06hz340lFr92Y4zp/CsUK3gKvRo+37nWeFewVNq6OhJl89zxJlTGHAdzpxCU/bXFzP6XlfXjVn7Y4Z6mx91Op2VXgBER0fL6XQqOjq6xufFxTVWWFhowNuPj48JeIxgOHoyX1l5xVUuO5VXpFB7uOKbV//Cqb6d2deM3Zkqq/o+pDJDynOX6eI2DeNY+Op9nrvM57GRFPDx8+ccyMsvCbivdbG/ZvXM1zoZ2cU++9EiqlGDqNWf7WRk5FW9M/+1fkfV4VBhz7F8tW8dVmNPMrKLA64jI7tYl8VGBby/ZlwTku9jUxfXjVn7Y8Z9vt4C3uFwKD8/3/vY5XLJ4fA9tXHqVEHA246Pj1FmZs0nbn0pLSlVs5gIOXPPPvhxMY1U6i5psLVX1dcYe4hCbKryogqxlS9vKPvjq/cx9hCfx0ZSwMfPn3PAjL7Wxf6a1TNf6yTGRvjsR6m7pEHU6s92EmNrnpK94tJ4rU7PqHZ5h1ZRivZxjiTGRgRcR2JshCl9NeOakHwfm7q4bszan9PHqO0b0jr9xlZ2drZ3Gr5///5KS0uTJO3cuVOXXnqpz3fv54OI8FD16hhf5bJeHZsrIjzw2Yu6FNPYrgviqz6uF8RHK6axvY4rqp6v3sc0tvs8NmYcP3/GMKOvdbG/ZvXM1zqOppE++9FQavVnO46mkYqOrPr9V3RkmC5r61BYqK3K5WGhNiU0j/Z5jjiaRgZch6NppCn764sZfa+r68as/TFD0L5kl5qaqkWLFmn79u0qKipSt27dNGvWLO3evVt9+vRRly5dtHTpUm3btk0rV67UmDFjFBcX53Pc8+FLdp3bxqmw2KMcl1vFbo+aNWmkft1a6c6BHRRiq/qibgiq62u/bi21eY9TrgK3DJW/Um7dovxbqw3tfwX46r0/x8aM4+fPGGb0tS7216ye+VrHn340lFr9GaN/70StSs+Q21PmPV4V314PDw3VgD4XaMX3hyq9G634Fr39vx9j+uqJGXWYtb9mXBMN5boxe4zafsnOZhhGNZ9INExmTOc25Cn60xWXlCrHVaym0RE/i3fuvvqaV+DW4RMutW7RsN65V8VX7/05NmYcv+KSUoXaw1XqLql2DDP6Whf7a1bPfK3jTz989bWuavVnDGdOoXYezFani2LlaBp51vKjJ13avMepHh0cSmhe9btTXz0xo46KcQLtqy9mHZu6uG7MGqO2U/QEPExDX4ODvgYHfQ0O+mq+n8Vn8AAAoG4Q8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAWFBXPwtWvXavny5XI4HLLZbBo+fHil5YcOHdK0adPUrVs3bd++XUOGDNGvfvWrYJYEAMB5IWgBX1hYqIkTJ2rJkiWy2+0aMWKE1q1bp6SkJO86c+fOVZ8+fXTfffdp27Zteuqppwh4AABMELQp+vT0dCUmJsput0uSevfurZSUlErrNG/eXFlZWZKkrKwsdenSJVjlAABwXgnaO3in06moqCjv4+joaDmdzkrrDBs2TI8//rimTJmiLVu26LHHHvM5blxcY4WFhQZcX3x8TMBj4Gz0NTjoa3DQ1+Cgrw1D0ALe4XAoPz/f+9jlcsnhcFRa57nnntMdd9yhIUOGKCsrS9dff72+/vprxcbGVjvuqVMFAdcWHx+jzMy8gMdBZfQ1OOhrcNDX4KCv5qvtC6agTdH37NlTGRkZcrvdkqRNmzapf//+ys7OlsvlkiQdPXpU8fHxkqQmTZooJCREZWVlwSoJAIDzRtDewUdGRmrSpEl68cUXFRcXp06dOikpKUnTpk1TbGyskpOTNXbsWH3wwQdKS0vT4cOH9fTTT6tZs2bBKgkAgPOGzTAMo76LOBdmTP0whRQc9DU46Gtw0NfgoK/ma3BT9AAAoP4Q8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWFBYMAdfu3atli9fLofDIZvNpuHDh1da
bhiGPvzwQ0nSkSNHlJubqylTpgSzJAAAzgtBC/jCwkJNnDhRS5Yskd1u14gRI7Ru3TolJSV511m0aJGaNGmiW265RZK0Y8eOYJUDAMB5JWhT9Onp6UpMTJTdbpck9e7dWykpKZXWWbx4sbKzs/XBBx/o1VdfVVRUVLDKAQDgvOLXO/iMjAxt3bpVNptNXbt2VWJios/nOJ3OSoEdHR0tp9N51rgul0vDhw/XTz/9pAcffFBffvmlQkNDqx03Lq6xwsKqX+6v+PiYgMfA2ehrcNDX4KCvwUFfGwafAf/SSy9p3rx5aty4sQzDUGFhoe655x6NGzeuxuc5HA7l5+d7H7tcLjkcjkrrREdHq0ePHpKkdu3ayeVy6ejRo2rdunW14546VeCrZJ/i42OUmZkX8DiojL4GB30NDvoaHPTVfLV9wVTjFP3HH3+svXv3asmSJdqwYYM2btyoL774Qnv37tU///nPGgfu2bOnMjIy5Ha7JUmbNm1S//79lZ2dLZfLJUlKSkrSoUOHJJW/ACgtLVV8fHytdgQAAPwfm2EYRnULH374Yc2YMUPR0dGVfu5yufTMM89o9uzZNQ6+Zs0aLVu2THFxcQoPD9fw4cM1bdo0xcbGKjk5WXl5eZo+fboSExN18OBB3XDDDbr22mtrHNOMV4a8wgwO+hoc9DU46Gtw0Ffz1fYdfI1T9NHR0WeFe8XPY2NjfQ7er18/9evXr9LPxowZ4/13TEyMJk+e7G+tAADATzVO0cfEVP+qoaZlAACgftX4Dn7hwoX6+uuvq1yWn5+vF154IShFAQCAwNQY8ElJSRo2bNhZPz/9N9ABAICGp8aAHz16tC6++OIql7Vo0b/9OXQAABVgSURBVCIoBQEAgMDV+Bn8gQMHql128OBB04sBAADmqPEd/Ntvv60ff/yxymWrVq3y+V/aAABA/agx4HNzc7Vv3z5J5b9bvmfPnpWWAQCAhqnGgE9OTtbvfvc7SdLIkSP16quvepd99tlnwa0MAADUWo2fwVeEuyTZbLZKy2677bbgVAQAAAJWY8Dv3bu32mUVU/cAAKDhqXGKfvr06Ro6dKgMw1BmZqa+/fZb77KPPvpIs2bNCnqBAADg3NUY8Kmpqdq1a5f38YQJE7z/5kt2AAA0XDUG/L333quRI0dWueyNN94ISkEAACBwNX4Gf91112nKlClKTU31/uzAgQP65z//qSeffDLoxQEAgNqpMeA/+OADxcTEqHPnzt6fORwObd68WX/729+CXhwAAKidGgPe4/Fo+PDhlf4mfHR0tKZMmaLNmzcHvTgAAFA7NQZ8kyZNql0WFxdnejEAAMAcNQZ8fn5+tcuKi4tNLwYAAJijxoDv1KmTXnvtNbndbu/PiouLNXPmTF122WVBLw4AANROjQH/0EMPKTs7W1dccYWGDBmiIUOG6Morr1Rubq7uueeeuqoRAACcoxr/H7zNZtMf//hHJScna+vWrZKk7t27KzExsU6KAwAAtVNjwFe44IILdMEFFwS7FgAAYJIap+gBAMDPEwEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAWFBXPwtWvXavny5XI4HLLZbBo+fHiV633++ecaPXq0Nm3apKioqGCWBADAeSFoAV9YWKiJEydqyZIlstvtGjFihNatW6ekpKRK6+3du1d79+4NVhkAAJyXgjZFn56ersTERNntdklS7969lZKSUmmdwsJCzZ07V48//niwygAA4LwUtHfwTqez0nR7dHS0nE5npXVee+01PfbYY94XAf6Ii2ussLDQgOuLj48JeAycjb4GB30NDvoaHPS1YQhawDscDuXn53sfu1wuORwO7+OjR48qNzdXX331lfdn77//vq699lp169at2nFPnSoIuLb4+BhlZuYFPA4qo6/BQV+Dg74GB301X21fMAUt4Hv27KmMjAy53W7Z7XZt2rRJd999t7KzsxUWFqaEhAS98sor3vVnzJihYcOG8SU7AABMELTP4CMjIzVp0iS9+OKLeu2119SpUyclJSVpzpw5+uijj7zrZWVladasWZKkuXPn6vjx48EqCQCA84bNMAyjvos4F2ZM/TCFFBz0NTjoa3DQ1+Cgr+ar7RQ9v+gGAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALCgsGAOvnbtWi1fvlwOh0M2m03Dhw+vtHzOnDk6efKkmjdvrh9//FFPPPGE2rdvH8ySAAA4LwQt4AsLCzVx4kQtWbJEdrtdI0aM0Lp165SUlORdp6CgQGPHjpXNZtOXX36p6dOn69133w1WSQAAnDeCNkWfnp6uxMRE2e12SVLv3r2VkpJSaZ2nnnpKNptNklRWVqbGjRsHqxwAAM4rQXsH73Q6FRUV5X0cHR0tp9NZ5bput1sLFizQxIkTfY4bF9dYYWGhAdcXHx8T8Bg4G30NDvoaHPQ1OOhrwxC0gHc4HMrPz/c+drlccjgcZ63ndrs1adIkPf3007rooot8jnvqVEHAtcXHxygzMy/gcVAZfQ0O+hoc9DU46Kv5avuCKWhT9D179lRGRobcbrckadOmTerfv7+ys7PlcrkkSUVFRZo4caKGDRumrl27atmyZcEqBwCA80rQ3sFHRkZq0qRJevHFFxUXF6dOnTopKSlJ06ZNU2xsrJKTkzVq1Cjt3r1bhw8fllT+pbsbbrghWCUBAHDesBmGYdR3EefCjKkfppCCg74GB30NDvoaHPTVfA1uih4AANQfAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAr4KeQVubd+fpbwCd63HKC4p1YlTBSouKQ1onUC3Y1YdvnqSV+DW5t2ZNfbMmVOotVuPyplTWKvl/tThzxj+rOOrJ/6cI2aNcfRkfkDH7+hJl5auP6CjJ121HsOMY2PGeebPOGZcV2bcA/xhRq1mMOOaqCt1VUdD2d9AhAVz8LVr12r58uVyOByy2WwaPnx4peXFxcWaOnWqWrZsqf379ys5OVnt2rULZkk1cns8eum
DTTqS6VKZIYXYpAviozXu//WWPcy/VpWWlenjf+9R2q5MZeUWq1mTCPXqGK87B3ZQaEiI3+sEuh2z6vDVE396Vugu0bPvrJOr0OOtPzoyTFMfTVKkPdzncn/q8GcMf9bx1RN/9tf0MfKK1Szm3I+fq8itkW+ukafUkCR9krJXYaE2vTqin6Ib2f0aw4xjY8Z5ZtY574sZ9wB/mFGrGcy4JupKXdXRUPbXDKGTJk2aFIyBCwsL9fDDD+vdd9/VlVdeqQ8//FBxcXG68MILvev85S9/UUxMjJKTk3XhhRdqwoQJuv3222sct8CEV9RRURFVjjP5rxt06IRLxn8fG5Jy893avMepAb0u8Gvsf36zW19vOKzC4vJXfYXFpdqXkavCYo+6Xezwe51At2NWHb564k/PnnlrTaUbiCS5PWValZ6hwVe08bncnzr8GcOfdXz1xJ/9rYsx/Fln+GurvOFeocyQVnx/SDf1a+fXGGYcGzPOM39qPdfrqqr7gBn3AH+YcQ8wgxnXxJmqu78Gqq561lCOzemioiJq9bygvRxJT09XYmKi7Pbydwq9e/dWSkpKpXVSUlLUq1cvSVKnTp20Y8cOuVzVTyMGU16BW0cyq972kUyXX1N1xSWlStuVWeWytF0nVVxS6tc6gW4nr8BtSh2+enL0pMtnz5w5hWfdQCq4Cj3ac/hUjcudOYU+6zhwLMfnGL7qcOYU+uyJM6fQ5/7WxRj+HL8Dx3LOCvcKnlJDR0+6fI5x9KQr4GPjzCkM+Dzzpyf+nPO+mHEP8IcZ9wAzmHFN1FWtdVVHQ9lfswRtit7pdCoqKsr7ODo6Wk6n0691oqOjqx03Lq6xwsJCA64vPj6m0uOM3Zkqq/p+qDJDynOX6eI2MVWv8F9HT+YrK6+4ymWn8ooU+t8pL1/rxDePqnK5v9vJc5eZUkdefkmNPdlzLN9nz5w5VW+jwsY9WTUuz8gulqNpoxq388OBXJ9j+JKRXazLYqNq7ElGdrHP/W0R1SjoY/hz/Hz1ZM+xfCXFN6lxjD3H8mscw59jk5FdHPB55k9P/Dnnq7quTr8PmHEP8Ic/9wlf9wAzbD2QXeNyf64Jf/pqhrrqWUM5NmYJWsA7HA7l5//fDcLlcsnhcJzzOmc6daog4Nri42OUmZlX6Wcx9hCF2FTlBR5iK19+5nPOVFpSqmYxEXLmnn2CxMU0Uqm7RJJ8rhPodmLsIabU4asnHVpF+exZRGzNU0t9OjTTsvUHql2eGBshe3jNdXRt00Tza9hGoo8aKtYpdZfU2JPE2Aif+1sXY/hz/Hz1pEOrKJ/b6dCq5huZP8cmMTYi4PPMn574c86feV2deR8w4x7gD3/uE2Zsxxdf14U/14Q/fTVDXfWsoRybM9X2BVPQpuh79uypjIwMud3l01qbNm1S//79lZ2d7Z2G79+/v9LS0iRJO3fu1KWXXlrju/dgimls1wXxVW/7gvhoxTS2+xwjIjxUvTrGV7msV8fmiggP9WudQLcT09huSh2+epLQPNpnzxxNIxUdWfXryOjIMHVoHVfjckfTSJ91tGnV1OcYvupwNI302RNH00if+1sXY/hz/Nq0aqqwUFuVy8NCbUpoHu1zjITm0QEfG0fTyIDPM3964s8574sZ9wB/mHEPMIMZ10Rd1VpXdTSU/TVL0L5kFx4ervbt2+v9999Xenq6WrRoodtvv10zZ87U7t271adPH3Xp0kVLly7Vtm3btHLlSo0ZM0ZxcXE1jhvML9n169ZSm/c45Spwy1D5q/bWLcq/Qevvtyc7t41TYbFHOS63it0eNWvSSP26tdKdAzsoxGbze51At2NWHb564k/P+vdO1Kr0DLk9Zd76K76pGx4a6nO5P3X4M4Y/6/jqiT/7Wxdj+LPOgD4XaMX3hyq9I634Fr09zL/9NePYmHGemXXOn66q+4AZ9wB/mHEPMIMZ18SZgvUlu7rqWUM5Nqer7ZfsbIZhVPOpU8NkxvSIrymkvAK3Dp9wqXWL2r9qLy4pVY6rWE2jI6p91efPOoFux6w6fPUkr8CtPHeZYuwh1fbMmVOonQez1emiWDmaRp7zcn/q8GcMf9bx1RN/zhGzxgi1h6vUXVLr43f0pEub9zjVo4NDCc2rfofqawwzjo0Z55k/4/h7XdV0HzDjHuAPM+4BZjDjmqgQjCn62tTxc9mOP2o7RU/AwzT0NTjoa3DQ1+Cgr+ZrcJ/BAwCA+kPAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPAAAFkTAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPAAAFkTAAwBgQQQ8AAAWRMADAGBBBDwAABZkMwzDqO8iAACAuXgHDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWFBYfRcQbAcPHtTrr7+uzp0769ixY4qNjdXw4cOVnZ2tGTNm6MILL9T+/fs1cuRINW/evL7L/VkoKyvTI488ou7du6ukpESHDh3Syy+/rKKiInpqgqKiIt1xxx266qqr9Oyzz3KumuD3v/+9IiIiJEkhISH629/+Rl9NsG/fPi1ZskQRERH6/vvvNWLECF100UX0NQCHDx/Wfffdp4SEBEmSy+VSp06d9Nxzz517Xw2L27x5s7FixQrv48GDBxtbt241xo8fbyxZssQwDMP45ptvjFGjRtVXiT87paWlxttvv+19/MgjjxiLFi2ipyaZMmWKMWbMGOOVV14xDMOgryaYOXPmWT+jr4HxeDzGQw89ZJSWlhqGYRjHjx83nE4nfQ1QVlaWsWbNGu/jN954w/j+++9r1VfLT9F3795d1113nfdxWVmZIiMjtXLlSvXq1UuS1Lt3b61cubK+SvzZCQkJ0WOPPSZJ8ng8On78uNq1a0dPTbBw4UL17t1brVu39v6MvgZu165dmjNnjt58802lpKRIoq+B2rp1qwzD0IcffqjZs2frP//5j+Li4uhrgOLi4nTllVdKktxut3744Qf17du3Vn21/BT96VasWKGrrrpK7du3l9PpVFRUlCQpOjpaOTk58ng8Cgs7r1oSkNWrV+uvf/2r+vfvr27dutHTAO3Zs0f79u3TyJEjtXPnTu/P6WvgHnroIXXv3l2lpaW65557FBUVRV8DlJGRofT0dL366quKiYnRqFGjFB4eTl9NtHjxYv3mN7+RVLv7gOXfwVdYv369vvvuOz3//POSJIfDofz8fEnln3E0bdqUE/AcXX311frzn/+sw4cPa968efQ0QCtWrJDdbtecOXO0ceNGbdmyRX/961/pqwm6d+8uSQoNDVXfvn313Xff0dcARUVF6eKLL1ZMTIwkqU+fPkpNTaWvJlq6dKluvPFGSbXLrPOi6ykpKdqwYYPGjRunEydOKCMjQ9dee63S0tKUkJCgTZs26dprr63vMn829uzZo8OHD6t///6SpNatW+vw4cP0NECPPvqo99/FxcUqKCjQfffdp3379tHXAOzdu1ebNm3SHXfcIUk6cOCAfv3rX3O+BqhHjx7Kzs5WaWmpQkNDlZGRobZt28put9NXE6xfv1
69evVSeHi4JNXqfLX8H5v54YcfNHToUHXt2lWSVFBQoHvuuUcDBw7U//7v/yoxMVGHDh3SM888wzc9/XTw4EFNmzZNnTt3lsfj0d69e/XCCy8oPDycnppg2bJlmjdvnkpKSnTPPffoqquuoq8BOH78uCZPnqzOnTvL5XLJ4/Fo7Nixys3Npa8BWrFihdavX6+4uDgdPXpU48ePV1FREX01wciRI/XCCy+oWbNmkqTs7Oxz7qvlAx4AgPPRefMZPAAA5xMCHgAACyLgAQCwIAIeAAALIuABALAgAh6A1yOPPKIJEybUdxkATEDAA5AkZWZm6ujRo1qyZIkKCwvruxwAAeL/wQOQJM2ZM0e9e/fW6NGj9eSTT+qWW26RJH344Ydavny5LrnkEtlsNi1fvlyPPfaY7rrrLn355Zdau3atYmNjdfz4cY0ZM0bx8fH1vCcAJN7BA/ivtLQ09e3bV7fccovmz58vSdq5c6feffddvffee5owYYKio6PVtm1b3XXXXdq3b5/efvttTZ48WaNGjdLll1+u6dOn1/NeAKhwXvwuegA127Bhg3r27ClJuu222/Tuu+/qwIED+u6779S1a1c1atRIktS3b19t2rRJkrR27VoVFxdr0qRJkqT8/HyVlJTUS/0AzkbAA9DChQtVVlaml156SZIUHx+v+fPny+FwyGazVfkcwzDUtm1bTZ482fuzir92BaD+EfDAeS4/P9/7B0Iq9OjRQ1OnTtV7772nOXPmqKioSI0aNdLGjRu961x55ZV6++235XK5FB0drW3btumjjz7Siy++WB+7AeAMBDxwHisqKtIzzzyj/Px8HT9+XC1btpQk7d69WydOnNCcOXOUnJysBx98UJdddplCQkK8f76yffv2Gj9+vMaMGaOLLrpIubm5Gj16dH3uDoDT8C16ADVauXKl929Pz5s3T0eOHNGYMWPquSoAvvAOHkCNPvnkE61evVo2m005OTl64YUX6rskAH7gHTwAABbE/4MHAMCCCHgAACyIgAcAwIIIeAAALIiABwDAggh4AAAs6P8DlYAGgR9VzfYAAAAASUVORK5CYII=\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "id": "4b9647f3", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], + }, + "outputs": [], "source": [ - "# Common imports\n", - "import os\n", + "%matplotlib inline\n", + "\n", "import numpy as np\n", - "import pandas as pd\n", + "from time import time\n", + "from scipy.stats import norm\n", "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error\n", - "from IPython.display import display\n", - "from pylab import plt, mpl\n", - "plt.style.use('seaborn')\n", - "mpl.rcParams['font.family'] = 'serif'\n", - "\n", - "# Where to save the figures and data files\n", - "PROJECT_ROOT_DIR = \"Results\"\n", - "FIGURE_ID = \"Results/FigureFiles\"\n", - "DATA_ID = \"DataFiles/\"\n", - "\n", - "if not os.path.exists(PROJECT_ROOT_DIR):\n", - " os.mkdir(PROJECT_ROOT_DIR)\n", - "\n", - "if not os.path.exists(FIGURE_ID):\n", - " os.makedirs(FIGURE_ID)\n", - "\n", - "if not os.path.exists(DATA_ID):\n", - " os.makedirs(DATA_ID)\n", - "\n", - "def image_path(fig_id):\n", - " return os.path.join(FIGURE_ID, fig_id)\n", - "\n", - "def data_path(dat_id):\n", - " return os.path.join(DATA_ID, dat_id)\n", - "\n", - "def save_fig(fig_id):\n", - " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", "\n", - "infile = open(data_path(\"chddata.csv\"),'r')\n", - "\n", - "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", - "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", - "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", - "output = chd['CHD']\n", - "age = chd['Age']\n", - "agegroup = chd['Agegroup']\n", - "numberID = chd['ID'] \n", - "display(chd)\n", - "\n", - "plt.scatter(age, output, marker='o')\n", - "plt.axis([18,70.0,-0.1, 1.2])\n", - "plt.xlabel(r'Age')\n", - "plt.ylabel(r'CHD')\n", - "plt.title(r'Age distribution and Coronary heart disease')\n", - "plt.show()" + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "26cf7fd4", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "b2205188", + "metadata": { + "editable": true + }, "source": [ - "## Plotting the mean value for each group\n", - "\n", - "What we could attempt however is to plot the mean value for each group." 
+ "## Plotting the Histogram" ] }, { "cell_type": "code", "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfQAAAFnCAYAAABQJLtnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deVhU9eIG8HdYhm2AQSSXzDVFMw0xNVyue5laLjdXwDQVUTF3Tc20frmXGeaGmpqZ6VUUzcwll2uCGwpooZaKiCgiMOwwDPP9/YHOFRUHlOEwh/fzPPe5npnDmfd7CF7OrhBCCBAREZFZs5A6ABEREb04FjoREZEMsNCJiIhkgIVOREQkAyx0IiIiGWChExERyQALncq9Q4cOoVevXnB3d8fevXufeD8jIwPNmzdHx44dERgYKEHCkvnuu+/Qpk0bLF++vEw/959//kG/fv0wYMAADBs2rEw/u7h+//13dOvWDb6+vlJHITI7LHQq97p27YqZM2fC1tYWmzdvfuL93bt3Q6fT4f3338fHH38sQcKSCQgIQLt27cr8c9esWYPOnTtj27Zt6Nq1a5l/fnF07twZfn5+UscgMkssdDIb3bt3x6VLlxAVFWV4TQiBkydPokmTJhImMw93797FSy+9BAAYPHiwxGmIqLSx0MlsVK9eHZ07d8YPP/xgeO2PP/5AmzZtoFAoCs2bmZmJGTNmYNCgQRg4cCC2bt1qeO/cuXMYMmQIfH19MXDgQBw+fBgAoNVq4evrC3d3d2zZsgUjR45Ely5dcOjQoafmGT58ONzd3TFixAgAwN69e9GuXTuMGTMGADB37lwMHDgQvr6+mDRpEjIyMp66nGnTpqFJkyY4ffo0AGD06NFwd3dHXFycYZ7du3ejf//+8PHxweTJkw3Lun//PkaMGAFfX18MGjQIQUFBT/2MuXPnIjo6GkFBQfD19UV6ejoyMjIwc+ZMDBo0CAMGDMDatWshhHhiPYwYMaJQvkclJSUhICAA3t7ehdYlUHCoZPDgwfD19YW3tzfCw8MN7+l0Onz11VcYOHAgfHx8MH78eNy6dcvwvhACS5YsQf/+/TFw4EAkJSU9dVyZmZmYOnUqBg0ahMGDB2POnDnQ6XSG948fP4733nsPPj4++Oabb9CpUyf06tXL8EdhUev1aVavXo0ePXrgo48+QlBQENzd3eHr64v4+Hj0798f7u7uCA4OxpAhQ/Daa68hLi4O9+/fx7hx4+Dt7Y3+/ftj165dAICwsLBChxYOHTqETp064ZNPPgEAbNu2DZ06dUJAQAAmT56MAQMGYPDgwYXWEdETBJEZOHXqlAgMDBSnT58WjRs3Fvfu3RNCCDFp0iSRkZEhfHx8xNKlSw3zz5o1S0ydOlUIIUR6erro1KmTOHv2rBBCiGPHjomYmBjDe23bthVpaWmGr23QoIEICgoSQgixb98+8fbbbz81U15envDy8hLh4eGG10aNGiXy8/OFEEJs3LjR8HpgYKD45ptvDNPTp08XgYGBhumOHTuKU6dOFcpw69YtIYQQ586dEy1bthRJSUlCCCEWLlwoZs6cKYQQYtGiRWLNmjVCCCEyMzPFwIEDi1yHPj4+YufOnYbpGTNmiOnTpwshhMjOzhY9e/YUu3btKpRh+fLlQggh9uzZIy5duvTEMocNGyaWLVsmhBAiISFBtGzZ0pB79+7dIiUlRQghxK1bt0T79u0NX7dq1SoxdOhQodPphBBCfP7554ZsO3fuFB4eHiI2NlYIIcSIESPE6tWrnzqmlJQUsXv3bsP09OnTxfbt24UQQiQlJQkPDw/D9+fw4cPC3d3dsJ6ftV4fd+zYMdGmTRvDeBYsWCAaNGhgeP/WrVuiQYMGhvW3fv16kZCQID788EPD9zkpKUm0adNGnDlzxjBOHx8fwzICAwMN34+H082bNxcJCQmGdTZgwICn5iMSQghuoZNZadmyJerVq4eff/4ZsbGxcHNzg4ODQ6F59Ho9QkJC8MEHHwAAVCoVOnbsiD179gAA6tevj2+//RYDBw7E6NGjodFocOPGjULLeHiM293dHbdv335qFisrK/To0QO7d+8GAFy9ehUNGjSAhUXBj5WtrS0GDx4MHx8f7Nu3D3/++edzjXnXrl3o1KkTKlWqBAB47733sHfvXgghoFarceLECfz999+wt7fH999/X6xl6vV67N27F//+978NWbt3747g4OBC83Xp0sXwmY0bNy70XkJCAk6ePGlYzy+99BI8PT2xb98+AEDDhg0Ne0lmzJiBO3fuGLa0g4OD0atXL1haWgIARo0ahRYtWhiWXbt2bbzyyiuG5Ty6t+JRzs7OiI+Px6BBg+Dr64szZ84Y1vPx48fh6uoKT09PAAXH5+3t7Yu1Xh/322+/4V//+hfUarVh3qfp3LkzAOCjjz6CEAJhYWGGdVypUiV06NDhiXX8LC1atDAcJunVqxcuXLiA+Pj4Yn89VSxWUgcgKikfHx98++230Gg0GDJkyBPvJycnQ6vVYsmSJbC1tQUApKWloVGjRgCA6dOno0GDBli6dCkAoFOnTsjOzi60DJVKBQCwsbFBXl5ekVl69+6NoUOHYtasWQgJCUHfvn0BAKdPn8bChQuxd+9e1KhRA8HBwYbdrSV19+5dXLt2zbB7VqfToXLlykhJScHw4cNhZ2eHiRMnwtLSEv7+/nj33XeNLvPhOnpYZkBB4SQkJBSa7+F6KCoXULA+Hx7ySElJQYMGDQAUHDrw9vbG8OHDART8cfRwPd+9excuLi6GZVWpUqXIz1UqlUV+D3bt2oVt27Zh9+7dUKvVWL58ueEPsMTExEKfAcBQyA8zFLVeH10vAHDv3j00bNjQMO3s7PzUPI6Ojk+sn8fX8aVLl576tU/z6Oc8zJ6YmIjq1asXexlUcbDQyey8//77+Oqrr3D79m3UqlXrifcrVaoEpVKJ2bNno2nTpgCAvLw85OTkAACioqLw0UcfGeZ/VmEb07hxY1StWhWHDx9GTEwM6tWrZ/iMOnXqoEaNGgBQ6Lju01hbW0Or1QIo+OPjUdWqVcMrr7yCOXPmGF5LTk5GpUqVcO/ePfj6+sLX1xehoaEYNWoUGjdujJo1az7z8x6uo+TkZEPm5OTkJ4r1WapWrQoACAwMNJRWbm4udDodkpKScPv2bcOejsfXcbVq1ZCSkmKYTklJQWZmpmF9FVdUVBSaNm1qKLtH17ObmxuSk5MLza/RaAplKGq9Pu6ll14qtKxHl1OUh+snOTnZUMCPruNHv+fAk9/3xz/n4fpyc3Mz+tlUMXGXO5kdGxsbzJ8/HxMmTHjq+xYWFujdu7dhFzsArFq1yrBrvGbNmoiMjAQAXL58GYmJiS+U5/3338fChQvx1ltvGV6rVa
sWYmNjDb+E//jjj2cuo0aNGvj7778BFOwqflSfPn1w/PhxpKamAgCuX7+O0aNHAwCWLl2K6OhoAEDTpk1hbW391F3Gj3u4jh7u/s3JycH+/fsNexiKo0qVKmjbti1CQkIMr82ZMwenT5+GWq2Gk5OTYT2fOHHiiTGFhIQgPz8fAPD111/j8uXLxf7sh2rVqoXLly9Dq9VCp9MhLCzM8F779u2RnJxsOBnv999/R25ubqEMRa3Xx3Xr1g3//e9/Dd/P/fv3G832cP08XMcpKSk4duyYYRd8jRo1EBMTA61Wi9zc3KeedHjhwgXcu3cPQMEJfM2aNePWORXJcu7cuXOlDkH0LCdPnsTChQsRGRmJvLw8eHp6om7duqhcuTKAgrPEz549a/jl2Lx5c7Rq1QpHjx7F2rVrsXv3bjg5OWH06NGwsLCAu7s71q5diwMHDuDevXuIj49HeHg4WrRogSlTpuDWrVuIjIxEt27dMHbsWCQkJODChQvo1avXU/PVqFEDa9euxfz582FnZwcAqFOnDmJjY7Fs2TKcOXMGdnZ2OHv2LDQaDS5cuIB9+/YZjns3btwYL7/8MgIDA/H777+jQYMGOHz4MCIjI/Gvf/0L9evXh1qtxrx587B3716cPHkSX3zxBVxcXGBlZYVly5YhJCQE27Ztw9ChQ9G+ffsnMs6dOxehoaGIjo5GXFwc2rZti1atWuGPP/7AunXrEBwcjG7dumHw4MFQKBT46KOPDOuhRo0ahuPZj2vbti1+/vlnbN68GTt37sQbb7yBfv36wcLCAnXr1sWyZcvw3//+F0IInDt3DpGRkejatStatWqF69evY/ny5QgODkb9+vXh7e2NsLAwLF26FLGxscjJyUFGRgaCgoJw/fp1WFhYoFmzZoU+393dHeHh4Vi1ahXOnTsHOzs7nDlzBhYWFmjdujUaNWqE+fPnY//+/VCpVLhx4wa6du2Kl19+GdWqVStyvT6udu3ayMvLw4IFC3DkyBE0btwYJ06cwLhx46DRaBAQEICEhAScOXMGTZs2NWzlPyz0H3/8ESEhIRg5cqThOHu1atVw9epVrFixAlFRUahbty6OHDkCrVaLFi1a4MyZM3BwcMDZs2exdu1axMXFYfHixUXu7idSiOL8OU9EZIY0Gk2h4+bNmjXDjh07DIcZikun0yEnJ8dwbD8qKgr+/v4IDQ0t1byPeng+wMKFC032GSQv3OVORLI1duxYw272gwcPwtXV9annXRhz+/ZtfPbZZ4bpPXv2oG3btqWWk6g08KQ4IpItDw8PDB48GLa2tlAoFFi+fDmsrEr+a8/FxQVarRYDBw6EEAJVq1YtdDJdadu2bRt27dqF3NxcrFq1qshj+0SPMtku98TERCxbtgyXL1/Gzp07n3g/NzcXixYtQpUqVRATEwM/Pz/UqVPHFFGIiIhkz2S73MPDw9G5c+ciz7jdtGkTqlWrhlGjRhmu4yUiIqLnY7JC79at2xN38HrUsWPHDGesuru74/Lly8+8jzIREREVTbKT4pKSkgoVvkqlKvIBDI/iSflERERPkuykOFdXV2RmZhqmMzIy4OrqavTrFAoFEhPTTRmtTLi5OZr9OOQwBkAe45DDGACOozyRwxgAeYzDzc3R+Ewo4y10jUZj2K3eoUMHXLhwAQBw5coVNGzY8Jn3jSYiIqKimazQz5w5g5CQECQmJmLlypXIyclBUFAQfvrpJwDAkCFDEB8fj5UrV2LDhg2YN2+eqaIQERHJnlneKc7cd58A8tkNZO5jAOQxDjmMAeA4yhM5jAGQxzjK5S53IiIiMg0WOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGTAypQLDw0NxcGDB+Hq6gqFQoGAgIBC79+6dQuLFy9GkyZNEB0djZ49e6Jz586mjERERCRLJiv07OxszJkzB/v27YNSqcS4ceMQFhYGLy8vwzzr1q1D8+bNMXToUPz111+YMGECC52IiOg5mGyXe0REBKpXrw6lUgkA8PT0xLFjxwrNU7lyZSQnJwMAkpOT0bhxY1PFISKiikSrheWli0BqqtRJyozJttCTkpLg4OBgmFapVEhKSio0z7BhwzB27FgsWLAAUVFRGDNmjKniEBGRXAkByxvXYHU+HFYXwmF9PhxWl6KgyM0FfHyApSulTlgmTFborq6uyMzMNExnZGTA1dW10DyffPIJ+vXrh549eyI5ORlvv/02Dh8+DLVa/cxlu7k5miRzWZPDOOQwBkAe45DDGACOozwpt2NISADOnPnf/86eBVJS/ve+lRXwxhtAy5aAn1/5HUcpM1mhe3h4ID4+HlqtFkqlEufPn8fgwYOh0WhgZWUFlUqFO3fuwM3NDQDg5OQECwsL6PV6o8tOTEw3Vewy4+bmaPbjkMMYAHmMQw5jADiO8qTcjCEjA9ZREbA6Hw7rCwVb4JZxtwrNoqtbD7pOXaHzbI68Zs2ha9wEsLMDUI7G8QKK+weJyQrdzs4Oc+fOxZdffgkXFxe4u7vDy8sLixcvhlqthp+fH2bMmIEffvgBFy5cQFxcHCZOnIhKlSqZKhIREZVneXmwuvzX/3adXwiH5ZXLUDyyoaev7Ibcd96FrtmD8vZoBuHC3gAAhRBCSB2ipMz9ry1APn81mvsYAHmMQw5jADiO8sTkYxACFjE3DFvd1ufDYXUxEoqcnP/NYu+API9mBeXt2Rw6D0/oa7wCKBTF/hi5fC+Kw6TXoRMREQGAIjER1hHhhXadWzxy3FtYWkLXqDF0zZpD1/xN5DVrjvwG7oClpYSpzQsLnYiISldmJqwvRhbedR57s9As+bVqI6dDpwe7zt+ErklTwN5eosDywEInIqLnp9PBMvqvgq3uiPOwPh8Oy8t/FT7u7eqK3C5vF2x9ezZHnkdziMeueqIXx0InIqLiEQIWsTcLyvt8OKzPnys47p2d/b9Z7Oyga9Gq4IS1B2ed62vWKtFxb3o+LHQiInoqRVKS4bj3w13nFo/cIExYWCC/4WsFJ6w9OOs8v2GjguvAqcxxrRMRUSG2P2wAVn6LytevF3o9v2Yt5LRt/79d503eAB65IyhJi4VOREQFdDo4fDYD9uvWAI6O0Hbq8r9d5x7NIR7cCIzKJxY6ERFBkZYKJ79hUB45DF2j12D16z6kOvDENXNisqetE
RGRebC4GQN1z7ehPHIYuZ27QvPLQaB2baljUQmx0ImIKjCrM6fh8m4nWF2ORtZIf6Rt3gbh6CR1LHoO3OVORFRB2ezYBscJY4H8fKQvWoqcYSOkjkQvgIVORFTR6PWwXzwfDksXQ+/ohLR1m5DXsbPUqegFsdCJiCqS7Gw4fjwatiHByK9VG6k/bke+e0OpU1EpYKETEVUQioQEOH84ENbnw5HXygupG3/iLVhlhCfFERFVAJaXLsKlW0dYnw9HTv9B0OzYwzKXGRY6EZHMKQ/sh0vPt2F5Ow4Zs+YgfflqwMZG6lhUyljoRERyJQTsVn0HpyEDAaFH6vrNyB4/mQ9KkSkeQycikqO8PKg+mQy7zRuRX6Uq0jb/DJ2Hp9SpyIRY6EREMqPQpMBp+BAoTxxH3utNkfbjNuirvyx1LDIx7nInIpIRy+v/QP1uZyhPHEdutx7Q7PmNZV5BsNCJiGTC+uQJqLt1gtW1f5AVMAFpG7cAKpXUsaiMcJc7EZEM2P60Gaop4wEA6ctWIGewr8SJqKyx0ImIzJleD4f/mwP7Fd9C7+KCtO9/RF6bdlKnIgmw0ImIzFVGBpzGjITNb/ugq/cq0rZsR37dV6VORRJhoRMRmSGL+Ntw8hkA60tR0LZrj7T1P0CoXaSORRLiSXFERGbGKuI81O90hPWlKGT7DkXqz8Esc+IWOhGROVHuDYFTgB+Qk4OML+Yje9RY3vmNALDQiYjMgxCwC1wK1bzPIewdkPbDz9C+867UqagcYaETEZV3ublwnPwxbLdvRf7LNZC6eRvyX28idSoqZ1joRETlmCIpCc5DB8P6dBjyPJsjddPPEFWqSB2LyiGeFEdEVE5ZXr1S8Azz02HI6dUXml2/ssypSCx0IqJyyPro71B37wLLmzHInDQN6Wu+B+zspI5F5Rh3uRMRlTO2G9ZBNXMqYGmJtJVrkfvBAKkjkRlgoRMRlRc6HRzmzIT92tXQV66M1I1boWvZSupUZCZY6ERE5YAiPQ2OfsNg8/sh6NwbIvXH7dDXqi11LDIjLHQiIolZxN6Es09/WF2OhrZTF6QFbYBwcpY6FpkZnhRHRCQhq7On4dKtI6wuRyNrxCik/ridZU7PxWihh4aG4tChQwCAdevWYdy4cYiOjjZ5MCIiubPZuR3qvj2hSElB+oKvkDl/CWDFHaf0fIwW+vbt29GgQQNERUVh+/bt6N27N9asWVMW2YiI5EkI2C+aB6fRIyCUNkjd8h/kDPeTOhWZOaOFXqtWLdSqVQv79+/HkCFD0LlzZ1SvXr0sshERyU92NhxHDYPD14uQX7M2NL8eRl6nLlKnIhkwum8nNjYWv/32G3755ReEhIRAr9cjISGhLLIREcmKIiEBzkMHwTr8HPJavoXUjT9BVK4sdSySCaNb6L6+vggJCcHHH3+MSpUqYcmSJXj11VfLIhsRkWxY/nmp4Dau4eeQ028gNDv3ssypVBndQvf09MSqVasM09OnTzdpICIiuVEe+g2Ofh/BIjMDmTM/Q9b4yXyGOZU6o1voMTEx8PHxgbe3N7KysuDv74+4uLiyyEZEZN6EgN3q7+DkOxCKfB1S1/+ArAlTWOZkEkYLffny5Rg7dixq1qwJe3t7fPnll1i5cmVZZCMiMl95eVBNmQDVZzOhr+wGTch+aN/rLXUqkjGjhV6jRg14eXlBqVQCACpXrgxnZ970gIioSCkpcB74b9ht3oC815tCc+AodM2aS52KZM5ood+7dw85OTlQPNhFFB8fj5iYGFPnIiIySxbXrwFeXlCeOIbcbt2h2fMb9C/XkDoWVQBGT4rr3bs3unfvjtzcXJw9exZJSUlYtmxZWWQjIjIrVhHn4TygD5CSgqwxHyNz9ueApaXUsaiCMFrorVq1QnBwMCIiIiCEQLNmzaBWq8siGxGR2bAKPwvnAX2hyEgHgoKQ2Xug1JGoginWw1nUajU6dOiAjh07Qq1W4z//+Y+pcxERmQ2rM6fh3K83FBnpSF8RBIwcKXUkqoCMbqEPGTLkiddu3ryJfv36mSQQEZE5sT4VCqdBH0CRk420oA3Qvt9H6khUQRktdCcnJ0Op63Q6REdHIzc31+TBiIjKO+uTJ+Ds3Q/QapG2dhO0Pd+XOhJVYEYLfcmSJbCzszNMt27dGosWLSrWwkNDQ3Hw4EG4urpCoVAgICCg0PtCCGzevBkAcPv2baSlpWHBggUlyU9EJAnr40fhPGQgoNMhbf1maN/tIXUkquCMFvqlS5cM/9br9UhMTMSFCxeMLjg7Oxtz5szBvn37oFQqMW7cOISFhcHLy8swT0hICJycnNC7d8HNFi5fvvw8YyAiKlPWRw7DeehgQK9H2sYt0HbtJnUkIuOFPnnyZNSuXRtCCCgUCri5ueGTTz4xuuCIiAhUr17dcEMaT09PHDt2rFCh7927F+3atcMPP/yA+/fv87g8EZV7ysMH4DTUG1AokPrDVuR16ip1JCIAxSj0SZMmGbagSyIpKQkODg6GaZVKhaSkpELzxMfHIyMjAwEBAbhx4wZGjBiBX3/9FZZGrtt0c3MscZ7ySA7jkMMYAHmMQw5jAMr5OPbuBT4cXHBt+d69UHcp+jnm5XocxSSHMQDyGYcxxbqxzOPWrl2LkUYuy3B1dUVmZqZhOiMjA66uroXmUalUeOONNwAAderUQUZGBu7cuYMaNZ59V6XExHRjscs9NzdHsx+HHMYAyGMcchgDUL7Hody3F04jPwSUSqT+uB15b7QCishansdRXHIYAyCPcRT3D5IiC71Tp06G270+SgiBtLQ0o4Xu4eGB+Ph4aLVaKJVKnD9/HoMHD4ZGo4GVlRVUKhW8vLxw69YtAAWFn5+fDzc3t2IFJyIqK8o9u+A06iPAxhapW3cgz6uN1JGInlBkoXt6emLixIlPvC6EwHfffWd0wXZ2dpg7dy6+/PJLuLi4wN3dHV5eXli8eDHUajX8/PwwcuRILFmyBKtXr0ZsbCwWLVoEGxubFxsREVEpstm1A45jRkLY2SN1607oWr0ldSSip1IIIcTT3sjOzi50udqjrly5And3d5MGexZz330CyGc3kLmPAZDHOOQwBqD8jcPmPz/DcZw/hIMKqduCoXuzZbG+rryN43nIYQyAPMbxwrvcHy3zqKgoxMTEQK/XAwD27NmD77///gUjEhGVXzY/b4Hj+DEQTs5I3b6Ljz+lcs/oSXHLly/HpUuXcPv2bTRp0gTx8fFITzfvv3aIiJ7F9sdNUE3+GEKtRup/QqBr6iF1JCKjjD6cJTU1FWvWrEHr1q2xYMECbNq0Ca1atSqLbEREZc5243o4ThoH4eICzc5fWOZkNowW+sMbwzx6CdqdO3dMl4iISCK269fAcdpE6CtXhiZ4H/JfbyJ1JKJiM7rL/fr16zh06BDq16+Pvn37QqVSwdrauiyyERGVGbvV30H12Uzo3V6CJvgX5Ls3lDoSUYkYLfQVK1YAACwtLeHm5gaNRoNevXqZPBgRUVmx++5bqL6YjfwqVZEa/Avy6zeQOhJRiRnd5f7pp58a/t2jRw94e3tD
pVKZNBQRUVmx+/brgjKvVh2pIb+yzMlsGd1Cj42NxaxZs+Dm5oY+ffqgbt26ZZGLiMjk7L9aCIfF85Ff4xVodu6Fvg5/v5H5MlroX331FapVq4a7d+9i586duHHjBtq0aYM+ffqURT4iotInBOwXzYPD0sXIr1kLmuBfoK9ZS+pURC/E6C53jUYDoOBe62lpaQgLC8O+fftMHoyIyCSEgMP8LwrKvFZtaHb/yjInWTC6hT579mxYWFggKSkJffv2xY4dO1CtWrWyyEZEVLqEgMPns2G/MhC6uvWQGvwL9NVfljoVUakwWuh6vR6TJk1C69atyyIPEZFpCAGHz2bAfs1K6F6tX1DmVblxQvJhtNCXLFmCevXqlUUWIiLTEAKqmVNhtz4IOveG0OzYC1GlitSpiEqV0UJnmRORWdProZo+GXab1kPX6LWCMndzkzoVUakzWuhERGZLr4dqynjY/bgJusZNoNmxB8LVVepURCbBQiciecrPh+PEANj+vAV5TT2Q+p/dEC6VpE5FZDJGL1sDgNzcXNy9exfx8fGIj4/HjBkzTJ2LiOj56XRwHOdfUObNPJG6I4RlTrJndAt92Yx6U1oAACAASURBVLJl2LRpE9RqNRQKBQAgLS0NCxYsMHk4IqIS0+ngOHYkbHftRF7zFkjdFgzh5Cx1KiKTM1roR44cwR9//AEHBwfDaz///LNJQxERPZe8PDj5D4fN3t3Ia/kWUrfugHB0kjoVUZkwWuivvfYabGxsCr1WqxbvqkRE5YxWCye/YbD5dS+0Xm2QuuU/AB8kRRWI0ULPzMxEz5490bhxYyiVSgBAVFQUb/9KROVHbi6cRgyBzYH90Lb9F1I3bwMe2atIVBEYLfQbN25g1KhRhV67e/euyQIREZVITg6cPvKBzeGD0LbviNRNWwF7e6lTEZU5o4X+f//3f2jWrFmh1zw8PEwWiIio2LKz4fzhICiPHYG2UxekbtgC2NlJnYpIEkYvW3u8zAFwdzsRSS8rC84+A6A8dgS5Xd9B6safWOZUoRndQg8PD8fs2bNx8+ZN6PV6CCGgUCgQEBBQFvmIiJ6UkQFn3wFQnjyB3G49kLZ2I/DYybtEFY3RLfStW7fixx9/hLe3N6Kjo3HkyBEMHz68LLIRET1BkZEO9aB/F5R5j/eRtm4Ty5wIxSj06tWro1KlStDr9YbpnJwckwcjInqcIj0Nzv37wPp0GHJ69UVa0AbgwdU3RBVdsc5yv3v3LvLz87Fx40ao1WqcP3++LLIRERkoUjVwHtAH1ufDkdO3H9K/WwNY8XEURA8Z/Wnw8fHBvXv34O/vj5kzZ0Kj0WDKlCllkY2ICACgSEku2DKPvICc/oOQ/u1KwNJS6lhE5YrRQm/VqpXh3+vXrzdpGCKixymSk+D8QS9YX4pC9mBfZHwdyDInegqjx9BjYmLg4+MDb29vZGdnw9/fH3FxcWWRjYgqOMX9+1D36VlQ5r7DkLF0OcucqAhGC3358uUYO3YsatasCTs7O3z55ZdYuXJlWWQjogpMce8e1H17wCr6T2QPG4GMJd8AFsV64jNRhWT0p6NGjRrw8vIy3Me9cuXKcHbmowiJyHQsEu5C3ac7rC5HI2ukPzIWfs0yJzLC6E/IvXv3kJOTY3gWenx8PGJiYkydi4gqKIs78XDu3R1Wf19Fln8AMr9cBDz4/UNERTN6Ulzv3r3RvXt35Obm4uzZs0hKSsKyZcvKIhsRVTS3bkHd611YxtxA1riJyPx0LsucqJiKdZZ7cHAwIiIiIIRAs2bNoFaryyIbEVUgFrdigQ/eg2XMDWROmoqs6Z+yzIlKoFgHpdRqNTp06ICOHTtCrVZj06ZNps5FRBWI9alQqN/tDNy4gcypM5D1yWyWOVEJGd1CP378OFavXo379+8bHs6SlpaGDz/8sCzyEZGcCQG7NSvg8PnsgulvvkGWN58VQfQ8jBb6woULMXv2bNSsWRMKhQJCCHz33XdlkY2IZEyRkQ7VxHGwDQmG3u0lpK3dCHWvd4HEdKmjEZklo4X+6quvonXr1oVeGzNmjMkCEZH8WV69AqePfGB19QryWr6FtHWboK9aTepYRGbNaKEPGzYMc+bMQePGjQ3Xou/Zswfff/+9ycMRkfwo9+yC4/ixsMjMQNaoMcj87P8Aa2upYxGZPaOFvmrVKmRlZSE3N9dwLXpCQoLJgxGRzOTlweGLz2C/ZgWEvQPSgjYgt/e/pU5FJBtGCz0jIwNbt24t9Nrx48dNFoiI5Mci4S4cRw6F8lQodK/WR9qGLch3byh1LCJZMXrZWrt27RAbG1votceniYiKYn0qFOrO7aA8FYrc93pDc+Aoy5zIBIxuoe/YsQMrV66Ei4sLlEql4bI1X1/fsshHRObqsUvSMj6fj2z/sby+nMhEjBZ61apVsXnzZsM0L1sjImOeuCRt3SbkebWROhaRrBkt9PXr18POzq7QawsWLDBZICIyb5ZXr8BpmDes/r7KS9KIypDRY+iPlzkAw9nuRESPUu7ZBfU7HQuelDZqDDS79rHMicqI0S10IiKjHr8kbe1G5PbqK3UqogqFhU5EL6TQJWn1GyDt+x95FjuRBIwWenx8PC5evAiFQoHXX38d1atXL4tcRGQGrE+FwnHEh7C8l4Dc93oj/dsVECpHqWMRVUjPLPR58+Zhy5YtsLe3hxAC2dnZ8Pb2xqxZs8oqHxGVR0LAbvUKOHzBS9KIyosiT4rbtm0brl27hn379uHcuXMIDw/HL7/8gmvXruHnn38u1sJDQ0Mxd+5cLF++/JmXuu3Zswfu7u7IzMws+QiIqEwpMtLhOHIoVHNmQu9aGanBvyB7dADLnEhiRRb6kSNHEBgYiDp16hheq1u3LgIDA3H06FGjC87OzsacOXMwc+ZMjBs3DleuXEFYWNgT8127dg3Xrl17zvhEVJYsr16B+p2OsN2zC3mtvKD5/QSvLycqJ4osdJVKBZVK9dTX1Wq10QVHRESgevXqhie0eXp64tixY4Xmyc7Oxrp16zB27NgSxiaislb4krSx0AT/An2VqlLHIqIHijyG7uhY9Iktz3rvoaSkJDg4OBimVSoVkpKSCs3zzTffYMyYMYbSLy43N3mcdCOHcchhDIA8xmGyMeTlAdOnA998Azg4ANu2wb5/f9ib5tNk8b0A5DEOOYwBkM84jCmy0Hfv3o3Dhw8/9b3MzEx8+umnz1ywq6troWPiGRkZcHV1NUzfuXMHaWlp2L9/v+G1DRs2oH379mjSpMkzl52YmP7M982Bm5uj2Y9DDmMA5DEOU43BIuEunEZ8COvTYQWXpG3YgvwG7oCJ1pccvheAPMYhhzEA8hhHcf8gKbLQvby8MGzYsCdeF0IUurd7UTw8PBAfHw+tVgulUonz589j8ODB0Gg0sLKyQrVq1bBw4ULD/F9//TWGDRtWaKueiKTz6CVpOe/3Qcay73hJGlE5VmShT506FXXr1n3qey+99JLRBdvZ2WHu3Ln48ssv4eLiAnd3d3h5eWHx4sVQq9Xw8/MDACQnJxvOml+3bh0GDhyIKlW
qPM9YiKg08JI0IrOkEEKIp71x9OhRdOzY8alfdPz4cbRv396kwZ7F3HefAPLZDWTuYwDkMY7SGoMiIx2qCQGw3bML+S9VQfq6Tch7q3UpJCweOXwvAHmMQw5jAOQxjhfe5b5ixQr8+eefT33vv//9r6SFTkSl79GnpGnfao30tRt5FjuRGSmy0NPS0nD9+nUABZegeXh4FHqPiOTDJiQYjuPHQpGViaxRY5H52ReAtbXUsYioBIosdD8/P3zwwQcAgEmTJmHp0qWG94KDg02fjIhMLy8PDl/Mhv2alXxKGpGZK7LQH5Y58OTzz/v25Q88kbkr8pI0IjJLRd4p7lm3Y324K56IzJP1qVCoO7eD9ekw5LzfB5oDR1nmRGauyC30JUuWwNfXF0IIJCYm4o8//jC899NPP2HlypVlEpCIStHjl6R9MR/Zo3hJGpEcFFnoZ86cwdWrVw3Tn332meHfPCmOyPwoMtLhOH4sbPbuluSSNCIyrSIL3cfHB5MmTXrqe99++63JAhFR6eMlaUTyV+Qx9KLKHADGjx9vkjBEVPpsQoLh8naHgqek+QcgdedeljmRDBVZ6FFRUViwYAHOnDljeO3mzZuG27QSUTmXlweH2Z/AaeRQCIUCqes2IfOL+by+nEimitzl/sMPP6B27dp47bXXDK+5uroiMjISubm5+PDDD8skIBGVXKFL0hq4I+37H3kWO5HMFbmFrtPpEBAQAJVKZXhNpVJhwYIFiIyMLJNwRFRy1mEn4dKpbcElab36QvPbEZY5UQVQZKE7OTkV+UUuLi4mCUNEL0AI2K1cDue+PaFISUbG/y1AetAGPvKUqIIocpd7ZmZmkV+Um5trkjBE9JzS0+E04kPDJWlp636A7i0vqVMRURkqcgvd3d0d33zzDbRareG13NxcBAYGolGjRmUSjoiMsz55AmjZEjZ7d0P7Vmtofj/BMieqgIos9JEjR0Kj0eCtt95Cz5490bNnT7Ru3RppaWnw9vYuy4xE9BTWJ0/AuXd3qPv0AC5f5iVpRBVckbvcFQoFPv/8c/j5+eHixYsAgKZNm6J69eplFo6InmR98gTslyyAMrTgdsy5Xd6Gzbz/Q2Yd7jkjqsiKLPSHXn75Zbz88stlkYWInuFpRZ415RPoPN+Em5sjkJgucUIikpLRQiciaT2ryImIHmKhE5VTLHIiKgkWOlE5wyInoufBQicqJ6xPnoD9VwuhPHkCAIuciEqGhU4kMRY5EZUGFjqRRKxD/yjYtc4iJ6JSwEInKmNPFHnnrgVF3ryFxMmIyJyx0InKCIuciEyJhU5kYixyIioLLHQiE7EO/aPgZLc//guARU5EpsVCJyplLHIikgILnaiUWIedLNi1ziInIgmw0IleEIuciMoDFjrRc3q8yLWduiBzyifQvdlS4mREVBGx0IlKiEVOROURC52omFjkRFSesdCJjLAOO1lw1vqJ4wBY5ERUPrHQiYrAIicic8JCJ3qM9anQgl3rLHIiMiMsdKIHWOREZM5Y6EQnTsB51uz/FXnHzgVF3qKVxMGIiIqPhU4VlkXcLThO/hg4+juUYJETkXljoVOFZLNrB1RTJ8IiLRXo0gUpE6ezyInIrFlIHYCoLCnSUuE4egScRn0EhU6H9G++Aw4eZJkTkdnjFjpVGNZhJ+E41g+WcbeQ59kc6SvXIr/uq3BUKKSORkT0wriFTvKn1cJh3udw7t0dFvG3kTl5OjR7DyK/7qtSJyMiKjXcQidZs/z7KhzHjIR15AXk16qNtJVruXudiGSJW+gkT0LAduN6uHRpB+vIC8ge5IOUoydZ5kQkW9xCJ9lRJCbCceJY2Bz8DXq1GmnfrYH2vd5SxyIiMikWOsmK8uB+OE4IgMX9RGj/1RHpy1dBX6261LGIiEyOhU7ykJUF1dxZsNu4HkKpRMYX85HtNwaw4FElIqoYWOhk9qwiL8Bx9AhY/fM3dI1eQ9qq9ch/rbHUsYiIyhQ3X8h85efD7tuvoX63M6z++RtZo8Yi5cAxljkRVUjcQiezZHErFo5j/aA8FYr8qtWQHrgKeR06SR2LiEgyJi300NBQHDx4EK6urlAoFAgICCj0flBQEO7fv4/KlSvjzz//xMcff4x69eqZMhLJgM2ObVBNnwyL9DTk9uyF9K+WQVRylToWEZGkTFbo2dnZmDNnDvbt2welUolx48YhLCwMXl5ehnmysrIwY8YMKBQK/Prrr1iyZAlWr15tqkhk5hSpGqimT4Jt8A7oHVRIC1yF3AGDAd66lYjIdMfQIyIiUL16dSiVSgCAp6cnjh07VmieCRMmQPHgl7Fer4e9vb2p4pCZsz55Ai4dWsM2eAfy3myJlCN/IHegN8uciOgBk22hJyUlwcHBwTCtUqmQlJT01Hm1Wi127dqFOXPmFGvZbm6OpZJRanIYh8nHoNUCs2cDS5YUXIL2+eewnjkTrlal+58uvxflB8dRfshhDIB8xmGMyQrd1dUVmZmZhumMjAy4uj55nFOr1WLu3LmYOHEiatasWaxlJyaml1pOqbi5OZr9OEw9BsurV+DoPxzWl6KQX7tOwX3Y32wJpGSX6ufwe1F+cBzlhxzGAMhjHMX9g8Rku9w9PDwQHx8PrVYLADh//jw6dOgAjUaDjIwMAEBOTg7mzJmDYcOG4fXXX8eBAwdMFYfMiRCwXR9UcB/2S1HI9h6C5CMnC8qciIieymRb6HZ2dpg7dy6+/PJLuLi4wN3dHV5eXli8eDHUajX8/PwwZcoU/P3334iLiwNQcJLcO++8Y6pIZAYUCQlwnDAGNr8fgr5SJaStWg9tj/ekjkVEVO4phBBC6hAlZe67TwD57AYqzTEof/sVjhPHwiIpCdoOnZAeuAr6qtVKbflF4fei/OA4yg85jAGQxziKu8udN5Yh6WVmQvXZTNht3gBhY4OMeYuQPXwU78NORFQCLHSSlNWF8IL7sF+/Bt1rryNt9XrkN2wkdSwiIrPDTSCSRn4+7L9ZAnWPrrC6fg1ZYz5GyoGjLHMioufELXQqcxY3Y+A01g/WZ04hv1p1pH+3Bnnt2ksdi4jIrHELncqOELDZ9hNcOraB9ZlTyHm/D1KOhbLMiYhKAbfQqUwoUpKhmjYJtiHB0KsckbZ8NXL7D+KtW4mISgkLnUzO+sRxOAaMguWdeOS1aIW0lWuhr1Vb6lhERLLCXe5kOrm5cJgzC+p/vweLewnI/ORTaEL2s8yJiEyAW+hkEpaXo+HkPxxWf12Crm49pK9cC53nm1LHIiKSLW6hU+nS62G3dhVcuv4LVn9dQrbvUKQcPsEyJyIyMW6hU6mxSLgLx49HQ3n0d+hdXZEWtBHad3tIHYuIqEJgoVOpUO7bC8fJ42CRnAxtpy5I+3YVRJUqUsciIqowWOj0YjIyoJr9Cey2/ABha4v0BUuQ85EfL0cjIipjLHR6fqdPo9LAQbCMuYG815sifdU65Ls3lDoVEVGFxEKn4hECFnfiYRUZAavIC7CKigCO/g4LvR5ZAROQOX0WYGMjdUoiogqLhU5PEgIW8bcLyjvqAqwiI2AdGQ
GL+4mF53N3R+qCr5HX9l/S5CQiIgMWekUnBCzibj0o7whYRxX8v8X9+4Vmy6/xCnK7vwfdGx7Ie8MDuiYeqPxaXeQlpksUnIiIHsVCr0iEgMWt2IIt7qgHu84vRsIiKanQbPk1ayG3Z5sHxf0GdG80g3B1lSg0EREVBwtdroSAxc2YB1vdkYbj3hYpKYVmy69ZG7mt2xWUd1MP6Jq+AVGJ5U1EZG5Y6HIgBCxibjzY6n7wv4sRsNBoCs2WX6s2ctp1KCjuNx6Ut0sliUITEVFpYqGbG70eljHXYRUVaTjubRUVCYvUwuWtq1MX2g6doGvySHmrXSQKTUREpsZCL8/0eljeuPa/re6oCFhdjIJFWmqh2XR160HbqTN0TZsVlHeTphDOaolCExGRFFjo5YVeD8vr1wqOdT9a3ulphWbT1XsV2i5dC5e3k7NEoYmIqLxgoUshPx+IjobN0T8KijvyQXlnZhhmEQoF8l+tD+3b3QqK+41m0L3eBMLRScLgRERUXrHQy5hCkwKX9l7AnXg8rGahUCC/fgNoH56s9rC8VY6SZiUiIvPBQi9jQmkD3ZstYal2RIZ7Y+Q1LShvqFRSRyMiIjPGQi9r9vZIW/8D3Nwckc27rBERUSmxkDoAERERvTgWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgErUy48NDQUBw8ehKurKxQKBQICAgq9n5ubi0WLFqFKlSqIiYmBn58f6tSpY8pIREREsmSyLfTs7GzMmTMHM2fOxLhx43DlyhWEhYUVmmfTpk2oVq0aRo0ahaFDh2LWrFmmikNERCRrJiv0iIgIVK9eHUqlEgDg6emJY8eOFZrn2LFjaNasGQDA3d0dly9fRkZGhqkiERERyZbJdrknJSXBwcHBMK1SqZCUlFSseVQq1TOX7ebmWLphJSKHcchhDIA8xiGHMQAcR3kihzEA8hmHMSbbQnd1dUVmZqZhOiMjA66uriWeh4iIiIwzWaF7eHggPj4eWq0WAHD+/Hl06NABGo3GsFu9Q4cOuHDhAgDgypUraNiwodGtcyIiInqSQgghTLXwkydP4sCBA3BxcYG1tTUCAgKwePFiqNVq+Pn5IScnB4sWLYKbmxtiY2MxatQonuVORET0HExa6ERERFQ2eGMZIiIiGWChExERyYBJ7xRX2ozdec4cJCYmYtmyZbh8+TJ27twpdZznEhsbi2XLluG1117D3bt3oVarze57odfr4e/vj6ZNmyIvLw+3bt3C/PnzYWtrK3W055KTk4N+/fqhbdu2mD59utRxnkv//v1hY2MDALCwsMCmTZskTlRy169fx759+2BjY4OzZ89i3LhxaNq0qdSxSiQuLg5Dhw5FtWrVABRcfeTu7o6FCxdKnKxk1q1bh9u3b8PFxQU3b97EvHnzzPLne+PGjUhISICdnR20Wi0mT54MhULx9JmFmcjKyhJdunQRubm5QgghAgICRGhoqMSpSm7//v3i999/F3369JE6ynOLjIwUhw4dMky/++674uLFixImKrn8/HyxYsUKw7S/v78ICQmRMNGLWbBggZg2bZpYuHCh1FGeW2BgoNQRXohOpxMjR44U+fn5QgghEhISRFJSksSpSi45OVmcPHnSMP3tt9+Ks2fPSpio5O7duydatGhh+F6Y68/3X3/9Jd5//33DdEBAgDh48GCR85vNLvfi3HnOHHTr1q3QzXTMUdOmTdGlSxfDtF6vh52dnYSJSs7CwgJjxowBAOh0OiQkJJjtFRa7d++Gp6cnatSoIXWUF3L16lUEBQVh+fLlZvmzffHiRQghsHnzZqxZswZHjx6Fi4uL1LFKzMXFBa1btwYAaLVaXLp0CW+++abEqUrGzs4O1tbWhkuks7KyUL9+fYlTlVxMTIxhTwkA1KhR44lbqD/KbHa5F+fOc1T2Dh06hLZt26JevXpSR3kuJ06cwMaNG9GhQwc0adJE6jgl9s8//+D69euYNGkSrly5InWcFzJy5Eg0bdoU+fn58Pb2hoODA1q0aCF1rGKLj49HREQEli5dCkdHR0yZMgXW1tbo27ev1NGe2969e9GjRw+pY5SYSqXC1KlTMXHiRLi5uaFq1aqoWbOm1LFKrEmTJli6dClyc3OhVCpx6dKlQgX/OLPZQudd5cqfU6dO4fTp05g5c6bUUZ5bu3btsH79esTFxWHLli1SxymxQ4cOQalUIigoCOHh4YiKisLGjRuljvVcHh5rtrS0xJtvvonTp09LnKhkHBwcULduXTg6FtxmtHnz5jhz5ozEqV7Mb7/9hu7du0sdo8Sio6Oxfv16rFmzBgsXLoSLiwtWrFghdawSq1GjBr744gusXLkSmzZtQv369Z9Z6Gazhf7oneeUSiXOnz+PwYMHSx2rwjp27BjOnTuHWbNm4d69e4iPjzc8aMcc/PPPP4iLi0OHDh0AFPzgxMXFSRvqOYwePdrw79zcXGRlZWHo0KHSBXpO165dw/nz59GvXz8AwM2bN9G1a1eJU5XMG2+8AY1Gg/z8fFhaWiI+Ph61a9eWOtZzO3XqFJo1awZra2upo5RYQkIC1Go1rKwKKs7NzQ137tyRONXzUavVmDhxIgBgypQp8Pb2LnJey7lz584to1wvxNraGvXq1cOGDRsQERGBl156Cf/+97+ljlViZ86cQUhICKKjo5GTk4MmTZoY/qMzF5cuXcKoUaMghMCuXbuwe/duvPLKK2jUqJHU0YotIyMDa9euRUxMDMLCwnDt2jWMHz/ebM9vOHDgAH777TfEx8fD1tYWDRo0kDpSieTl5WHTpk2IiYnB0aNHYW9vj+HDhxd9Nm85ZGtri6pVq2Lr1q24ePEiEhMTMX78eLP7+X5o6dKlGDNmjNmdHwMAr7zyCi5duoRTp04hMjIS0dHRmDBhgln+fE+YMAGxsbEIDw/Hm2++iZYtWxY5L+8UR0REJANmcwydiIiIisZCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiGfD398dnn30mdQwikhALncjMJSYm4s6dO9i3bx+ys7OljkNEEuF16ERmLigoCJ6enpg6dSrGjx+P3r17AwA2b96MgwcPon79+lAoFDh48CDGjBmDQYMG4ddff0VoaCjUajUSEhIwbdo0uLm5PbHsopYhhMDKlSvx3nvvIS4uDmfOnMG8efPQsGFDLF26FFWrVsWdO3fwwQcfoE2bNti4cSMCAwOxZ88eWFhYYObMmahatSoWLlyIrVu3YuXKlXjnnXeg1+vx999/4+2334avr29Zr0oi82bSZ78Rkcn5+/sLIYRYtmyZ8PHxEUIIcfnyZdG6dWuRnZ0thBBi6dKlhveuXbsmunfvbni05Pbt28XUqVOfWO6zliGEENOnTxfjxo0TQ
ggRHh4u/vrrLzFgwADDo3U1Go1o3bq1SExMFEII0bFjR3Hr1i0hhBA7d+4U06dPNyzLx8dHLFu2TAhR8Kjk1q1bi8uXL5fG6iGqMLjLnciMnTt3Dh4eHgCAvn374ty5c7h58yZOnz6N119/Hba2tgBQ6PGXoaGhyM3Nxdy5c/HZZ5/h1KlTyMnJeWLZz1rGQw8fs+np6YlXXnkFFy5cgKenJwDA2dkZVapUwblz54o1lodfZ2dnh9dff93sHs5CJDXzvMkwEQEoeBa6Xq/HvHnzABQ8hGLnzp1wdXUt8j7oQgjUrl0bX3zxheG1R59k+Oh8xu6lrlQqn3jt8a95dFo8OMKn0+meudzifDYRFcYtdCIzlZmZiZycHMyfPx+zZs3CrFmzMG3aNOzatQutWrXCxYsXDVve4eHhhq9r3bo1Ll26hIyMDADAX3/9hQULFjyx/Gct42lUKhU8PT0N86WmpiIhIQHNmzcHUPDHxr179wAUPN7ycREREQCA7Oxs/Pnnn898CAURPYlb6ERmKCcnB5MnT0ZmZiYSEhJQpUoVAMDff/+Ne/fuISgoCH5+fhgxYgQaNWoECwsLw2Mw69Wrh9mzZ2PatGmoWbMm0tLSMHXq1Cc+o2HDhvD393/qMo4ePYrIyEjcvXsXzs7O6Ny5MwBg8eLF+Prrr3H27FncvXsXixYtQuXKlQEAI0eOxFdffYVmzZrBxsYGp06dwoEDB/DOO+8AALKysjBv3jxER0fD398f7u7uJl+PRHLCs9yJZOr48eNo3749AGDLli24ffs2pk2bVubLKA5fX18EBASgVatWpb5sooqCW+hEMrV9+3acOHECCoUCqamp+PTTTyVZhjFbtmzBjRs3e2oe2gAAAEVJREFUsGHDBtSqVQtVq1Yt9c8gqgi4hU5ERCQDPCmOiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQD/w+esXslqsifDwAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "id": "be6f7ced", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], - "source": [ - "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", - "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", - "plt.plot(group, agegroupmean, \"r-\")\n", - "plt.axis([0,9,0, 1.0])\n", - "plt.xlabel(r'Age group')\n", - "plt.ylabel(r'CHD mean values')\n", - "plt.title(r'Mean values for each age group')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", - "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, + }, + "outputs": [], "source": [ - "$$\n", - "f(y_i\\vert x_i)=\\beta_0+\\beta_1 x_i.\n", - "$$" + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "69bcb406", + "metadata": { + "editable": true + }, "source": [ - "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", - "value from minus infinity to plus infinity. If we however let\n", - "$f(y\\vert y)$ be represented by the mean value, the above example\n", - "shows us that we can constrain the function to take values between\n", - "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", - "at our last curve we see also that it has an S-shaped form. This leads\n", - "us to a very popular model for the function $f$, namely the so-called\n", - "Sigmoid function or logistic model. We will consider this function as\n", - "representing the probability for finding a value of $y_i$ with a given\n", - "$x_i$.\n", + "## The bias-variance tradeoff\n", "\n", - "## The logistic function\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", "\n", - "Another widely studied model, is the so-called \n", - "perceptron model, which is an example of a \"hard classification\" model. We\n", - "will encounter this model when we discuss neural networks as\n", - "well. Each datapoint is deterministically assigned to a category (i.e\n", - "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "classifier that outputs the probability of a given category rather\n", - "than a single value. For example, given $x_i$, the classifier\n", - "outputs the probability of being in a category $k$. Logistic regression\n", - "is the most common example of a so-called soft classifier. 
In logistic\n", - "regression, the probability that a data point $x_i$\n", - "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + "Let us assume that the true data is generated from a noisy model" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "ce87dc4f", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that $1-p(t)= p(-t)$.\n", - "\n", - "## Examples of likelihood functions used in logistic regression and nueral networks\n", - "\n", - "\n", - "The following code plots the logistic function, the step function and other functions we will encounter from here and on." - ] - }, - { - "cell_type": "code", -<<<<<<< HEAD - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", -======= - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeMAAAFnCAYAAACVViH2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de1xUdd4H8M/AXBhmgBmH4SaCSoIXUEBT6WKaptlaptU+ZbWPbUWbSbvtbrlbu4/uY0+btllbm2vmblnbxTXvtxWzsPJSJuINAQUFkYswMAxzv53nD5OiVFAYzszweb9e85Izc2bOd74OfOb8zk0iCIIAIiIiEk2I2AUQERH1dgxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjIm66H/+53+wfv36Hl/um2++iTfeeOOijy1cuBCjRo3C2rVrL/n8/fv3Y8aMGbj77rvxu9/9zldlXpRYPSPyVxIeZ0zUNRaLBQqFAlKptEeX63Q6IQgCFArFRR9/8MEHMWPGDMycOfOij8+aNQv33Xcfbr/9dnzwwQeYNWuWT+pcu3Yt1q1bh/fee6/tPrF6RuSv+JtA1EUqlUqU5crl8i49v66uDjExMQDgsyC+FLF6RuSvGMZEneD1evGnP/0JZWVlCA0NRXJyMp577jnk5+fjtddew+jRo/Hiiy8CAHbt2oW//OUviIqKwsiRI7Fp0yZERERg4cKFWL58OXbt2oW8vDwUFhbixIkT+O1vfwuHw4G1a9eiubkZr7/+Ovr37w8AOH36NBYuXAi73Q6Px4M5c+Zg3Lhx2L17NxYuXAi9Xt+2xnnkyBHMnz8fCoUCGRkZuNyg19y5c9HQ0IAXXngBer0e119/PVasWIF7770XeXl5eO211/Duu+/i2WefxcyZM/G3v/0NH374IaZMmQKTyYSSkhIMGzYMixYtanvNjRs34l//+hfCwsIAAI8//jhCQ0OxfPlyNDY24sEHH0RqaioyMjJ+1LPDhw9j8eLFEAQBEokEzzzzDIYPH46dO3fipZdeQnR0NEaMGIH9+/cjJCQEb7zxBnQ6nS/+q4nEIRBRhwoKCoSHH364bXrOnDnCmTNnBEEQhNdee02YN2+eIAiCYDAYhMzMTOHAgQOCIAjCJ598IqSlpQn79u1re+6ECROEhQsXCoIgCDt27BDGjh0r5OfnC4IgCAsXLhT++Mc/CoIgCC6XS5gyZYqwZs0aQRAEobKyUsjKyhIqKysFQRCENWvWCA888IAgCILgcDiEcePGCZs2bRIEQRCKi4uF9PT0tudezIQJE9rVNW/ePOG1115rm37ggQfaPX/evHnC9OnTBYfDIdjtdmH06NFCYWGhIAiCcODAAeG6664TDAaDIAiCsG3btraefL/OC77fM5PJJIwePbqtlv379wujR48WWlpa2p4/YsQIoaqqShAEQXjkkUeEZcuWXfJ9EQUi7sBF1AmRkZEoKyvD7t274fV6sWTJEiQkJPxovl27dkGn0yE7OxsAMHHiRISHh/9ovuuuuw4AMGjQIDQ1NSEnJwcAkJaWhurqagDAoUOHUF1djTvuuAMAkJSUhBEjRmDjxo0/er2ioiIYDAZMnToVADBkyJC2tevuNGbMGMjlcigUCiQnJ7fVunbtWowbNw59+vQBAEyaNAn33Xdfp17zs88+g1qtxpgxYwAAo0aNQlRUFD799NO2eQYMGIB+/foBaN8jomDBMCbqhKysLCxcuBBvvfUWJkyYgH/84x8XHQZuaGiAVqttd59Go/nRfBe2mYaGhgIA1Gp127TL5QIA1NfXIzIyst1OTn369EF9ff1FlxsZGdn2epdablddqBMAFApFW611dXVtQQwAUqkUI0aM6NRr/vC5wPn3WVdX1+FyiYIFw5ioE1pbWzF69Gi88847eO+997B+/fqLHpqj1+vR1NTU7j6j0XhVy4yLi4PJZILb7W67r6mpCbGxsRdd7g/nvdLlymQyOJ3OtmmTydTp58bHx7d73263GyUlJVf1XOD8+4yLi+v08okCHcOYqBN27NiBVatWATg/XBwbGwuv1/uj+W666SY0NTXhwIEDAICdO3fC4XBc1TJHjBiBpKQkbN68GQBw5swZHDp0qG3Y+vsyMzOh0+mwdetWAMDx48dRXl5+RctLTEzEiRMnAACVlZWorKzs9HNnzJiBzz//vC1Ut27d2naMs0qlgs1mAwDk5eW1+8IAABMmTIDFYsH+/fsBAIWFhWhpacHNN998RfUTBbLQBQsWLBC7CCJ/p1Ao8OGHH2L16tV4//33kZKSgsceewybNm3CypUrUVFRAafTiXHjxmHIkCF44YUXsG3bNqjVapw
6dQq33HIL+vbti2eeeQaHDh3C0aNHMWbMGDz99NOor69HcXExkpKS8OKLL6KqqgpGoxE33ngjbrzxRrz11ltYtWoVtm/fjj/84Q8YMWIEdu/ejVdeeQVVVVWor6/HzTffjJEjR+LVV1/FunXrUFNTA7lcjt27dyM+Ph4DBw5s937mzp2L4uJiHD16FBaLBdnZ2ejfvz/WrFmD1atXo7m5GYIg4IsvvkBiYiI+++wzrF+/HidOnEBCQgK2bduGTz75BMePH0dKSgpGjx6N6OhovPjii9i4cSNqa2vx7LPPQi6XQ6/XY926dVi/fj0GDhwIo9HYrmfXX389xo4diyVLlmDNmjXYt28fFi1ahKSkJOzduxdLlixBVVUV7HY7zGYzli9fjoqKCoSEhCArK0ukTwRR9+JJP4i6mdFobLe9NisrCx9//DFSUlJErIqI/BmHqYm62RNPPNE2NJ2fnw+dTofk5GSRqyIif8aTfhB1s8zMTMyaNQthYWGQSCR4/fXXedpHIrosDlMTERGJjMPUREREImMYExERiUy0DVkNDa1iLfqqabXhaG62il1GUGOPfY897hnss+8FWo/1+ohLPsY14ysglYZ2PBN1CXvse+xxz2CffS+YeswwJiIiEhnDmIiISGQMYyIiIpExjImIiETGMCYiIhIZw5iIiEhkDGMiIiKRMYyJiIhExjAmIiISGcOYiIhIZAxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjImIiETGMCYiIhIZw5iIiEhkDGMiIiKRMYyJiIhEJu1ohoaGBrz66qsoKSnBmjVrfvS4w+HAokWLEBsbi9OnTyM3NxcDBgzwSbFERETBqMM14wMHDmDixIkQBOGij69cuRLx8fF47LHHMHv2bDz33HPdXiQREVEw6zCMb731VqhUqks+XlBQgKysLABAWloaSkpKYDabu69CIiKiINfhMHVHDAZDu7BWq9UwGAxQq9WXfZ5WGw6pNLSri+9xen2E2CUEPfbY99jjnsE++57YPfZ4vLA7PbA73bA53LA7PXB8O50cF4lojbJTr9PlMNbpdLBYLG3TZrMZOp2uw+c1N1u7uugep9dHoKGhVewyghp77Hvscc9gn32vKz0WBAFOtxcWmwtWuxtWhxsW+3c/2xxu2B2e7352emBznr/P7vw2dF0euNzeSy5jYEIk/vCzUe3qvZSrCmOj0QipVAq1Wo3x48fj4MGDGDVqFEpLSzF48OAO14qJiIi6k9vjRavVhVarE61WF0xWJ1otTrTaXDDbXDBbXWi1uWCxuWC2u2CxueH2XDpIL0YCIEwRijC5FBHhMujlYVDIzk/LZSEIk4dCIZNCIQ+BQhaKwcnaTr92h2H89ddfY8OGDWhoaMDSpUvx85//HMuXL4dGo0Fubi5+9rOfYdGiRVi6dCmqqqrwf//3f1f05oiIiC7F6xXQYnGiyWSH0exAc6sDzWYHjK1O2FweNDZb0WJxwmx14eK7GX9HAiA8TAqVUoY+EWFQKaVQhcmgCpMiPEyKcIUM4WFSKBVSKBWhUCqkCFdIESY/P62QhUIikfjkfUqES+0m7WOBOHzDYSffY499jz3uGexz57g9Xhha7GhssaOxxfbtv3YYTHY0m+wwmp3weC8dU0qFFFEqOaJUckReuIXLEBEuR8T3/lUrZVCFyRAS4psw7YxuH6YmIiLqLLfHi8YWO+oMVtQ1WVHfbMW5ZhsajDYYTHZcbJVQIgG0EQoMiI9En0gFtBEKaCPCoFHLoY1QQKNWIKW/DiZj4O1/dDEMYyIi6hYutwe1BitqGi2oMVhQ03j+5waj7aJrtxq1HIP6RkGvVUIfpYQuKgzRUWGIjlJCEyFHaMjlj75VyALviJxLYRgTEdEVM1mdqKprxZlz5rZbrcEK7w9Wc1VhUvSPj0Bcn/BvbyrE9lFCr1EGVZh2FcOYiIguy2p341StCRW1JlTWtaKyzgSDydFuHoU8FAMSItBPr0ZCtAoJ0Sr0jVYhUiX32U5PwYRhTEREbQRBQH2zDWVnjCg/24KKGhNqGi3t9lSOVMkxPEWH5NgIJMWq0S9GjWiNEiEM3avGMCYi6sUEQcDZRgtKKptRdsaIsuoWmCzOtscVslCkJWmQ0jcKA+Ij0T8uAtoIBdd2uxnDmIiolzG02FF8ugnHK5tRXNncLnw1ajlGD4lBaj8Nrukbhb56VYc7UlHXMYyJiIKc2+PFyeoWHK4w4Ei5AWcbvzuFcZRKjpxhsRicrEVakhb6qDCu9YqAYUxEFISsdjcOVzTiYFkjjp4ywObwAABk0hBkDNQhfWAfDE3WIiFaxfD1AwxjIqIg0Wp14kBZAwrLGnD8dHPbsb3RUWG4Lj0ew1N0SOungZyHFPkdhjERUQCz2t04eKIBXx8/h+LTTW0BnBSjRnaqHtmpevTVc+3X3zGMiYgCjMfrxbFTTdh9pA4HTzS2XX0oOS4CY4bEYmSaHvpOXkeX/APDmIgoQNQ0WrD7SC32HKtDi/n8HtDxunCMHRqL0UNiEdsnXOQK6WoxjImI/JjL7UVhWQM+O3gWZWeMAIBwhRQTsvvihox49I+L4BB0EGAYExH5IUOLHZ8dPIsvDteg1eoCAAztr8W4EQnIGhQNmZQ7YQUThjERkR+pqDEhf38VvilpgFcQoAqTYsrofrgpsy/iOAwdtBjGREQi8woCDpY1YPvXZ3DybAsAIFGvxuRr+2H0kBgeitQLMIyJiETi8XrxdfE5bNlXiZpvz4o1PEWHydf2w5BkLbcF9yIMYyKiHub2eLH7SC227qtEg9GOEIkE16fHYerYZCREq8Quj0TAMCYi6iFer4C9x+qw4ctTaGyxQxoqwYSsvpg6JgnRPC64V2MYExH5mCAIOFDagHVfVKDWYIU0VIJJIxMxdWwytBEKscsjP8AwJiLyobIzRqz69ARO1bYiRCLBjcPjccf1A6CLChO7NPIjDGMiIh+oM1iwbN0RHChtAABcOzgGM8YN5OFJdFEMYyKibmRzuLFp92l8cqAabo8XKX0jce/Ng5DSN0rs0siPMYyJiLqBIAj4+vg5fPTpCbSYnYjRKjFz3EBcOziGhyhRhxjGRERdVGuw4F/5ZThe2QyZNAR33jAAD04bhhajVezSKEAwjImIrpLL7cXmPaexdV8lPF4Bw1N0mHVLKmI0Sp41i64Iw5iI6CqU17Tg7a0lqGm0QBuhwKxJqchOjeaQNF0VhjER0RVwuDxY/0UF8vefgSAAE7L74u6bUqBU8M8pXT1+eoiIOqn8bAtWbC5GfbMNMVolHpo6GGlJWrHLoiDAMCYi6oDH68Wm3aexeU8lBEHA5Gv7Yca4gVBwuzB1E4YxEdFl1Ddb8damYlTUmKCLVOCRaUO5NkzdjmFMRHQJXx6uxfs7yuBweTB2WCweuCUV4WEyscuiIMQwJiL6AYfTg3/ll2L30TooFVLk3jEUY4fGiV0WBTGGMRHR95xttODv64+iptGCAfER+MX0dOh5eUPyMYYxEdG39h6tw8rtJXC6vJg4MhE/nXANZNIQsc
uiXoBhTES9ntvjxaqdJ7GzsBpKRSjm3JmOUYNjxC6LehGGMRH1aiarE39fdxSlZ4zoG63C3LsyEKvlZQ6pZzGMiajXqqxrxd/WHobB5MDIVD0enjYEYXL+WaSex08dEfVKXx+vxz+3HIfL7cWMcQMxLSeZ55Um0TCMiahXEQQBm/dWYt3nFQiThyLvruHIHBQtdlnUy3UqjPfs2YP8/HzodDpIJBLMnTu33eNnzpzB4sWLkZGRgePHj2PatGmYOHGiTwomIrpabo8XK/9Tgt1H6qCLVOCX94xAol4tdllEHYexzWbD/PnzsWXLFsjlcuTl5WHv3r3Iyclpm2fFihUYOXIkZs+ejeLiYvzqV79iGBORX7HYXXhj7RGUVBnRPy4Cv7x7OKLUCrHLIgIAdHgAXVFRERISEiCXywEA2dnZKCgoaDdPdHQ0mpqaAABNTU0YNmxY91dKRHSVDC12vPDeAZRUGZGdqse8+7MZxORXOlwzNhgMUKlUbdNqtRoGg6HdPA899BCeeOIJ/PnPf8bhw4cxZ86c7q+UiOgqnG0wY8m/D6G51YHJ1/bDTydcg5AQ7qhF/qXDMNbpdLBYLG3TZrMZOp2u3Ty/+93vcM8992DatGloamrC5MmT8cknn0Cj0VzydbXacEilgXf5Mb0+QuwSgh577Hu9pcfHTzVh0QcHYba58NC0YZg54ZoeXX5v6bOYgqXHHYZxZmYmampq4HQ6IZfLUVhYiFmzZsFoNEIqlUKtVqO2thZ6vR4AEBkZiZCQEHi93su+bnOztXveQQ/S6yPQ0NAqdhlBjT32vd7S48PljVi67ijcHgEP/2QIrk+P7dH33Vv6LKZA6/Hlvjh0GMZKpRILFizA888/D61Wi7S0NOTk5GDx4sXQaDTIzc3F73//e7z77rs4ePAgqqur8dRTT6FPnz7d+iaIiDpr37E6rNh8HNJQCfLuysCIa3joEvk3iSAIghgLDqRvMxcE2rewQMQe+16w9/jzQzVYua0ESoUUv7xnOAYlXnpzmS8Fe5/9QaD1uEtrxkREgWLngWq8v6MMaqUMv/mvTCTHBcf2RAp+DGMiCgr/+aoK//7sJCJVcjx9byb68mQeFEAYxkQU8DbtOY11n1dAG6HA0/dlIa4Pr7pEgYVhTEQB7UIQ6yLD8PSsLMRolGKXRHTFGMZEFLC27atsC+J592chOopBTIGpw9NhEhH5o/yvq7C6oPz80PQsBjEFNoYxEQWcT745g48+PQmNWo5nODRNQYBhTEQBZVfRWXzwyQlEqeR4+r4sxGq5sxYFPoYxEQWMr4/X493/lEKtlOG392UhXqfq+ElEAYBhTEQB4XC5AW9tKkaYIhS/+a9M9I1mEFPwYBgTkd8rO2PE0nVHEBIiwZN3DeeZtSjoMIyJyK9V1bfirx8fhscrYM6d6UhL0opdElG3YxgTkd8612zFklVFsDvceHjaEF59iYIWw5iI/JLJ6sSSfx+CyerC/ZNTMXZonNglEfkMw5iI/I7D6cFfVx/GuWYbfpKTjJuzE8UuicinGMZE5Fc8Xi/+vuEoTtWacF16HGaOGyh2SUQ+xzAmIr8hCALe216Kw+UGpA/og9lTB0MikYhdFpHPMYyJyG9s2nManx+qRXJcBB6/Mx3SUP6Jot6Bn3Qi8gv7iuuw/otT0EWG4Vd3D4dSwYvKUe/BMCYi0Z2oNuKfW45DqQjFr+4Zjii1QuySiHoUw5iIRHWu2YrX1xyB1wvMuTMDffVqsUsi6nEMYyISjcXuwqurD8Nsc+GBKakYNqCP2CURiYJhTESicHu8WLruKOqarLh1dBLGZ/YVuyQi0TCMiUgUH+08geOVzcgaFI27J6SIXQ6RqBjGRNTjCg6exaeFZ5GoV+HR24cihMcSUy/HMCaiHlVa1Yz3d5RBrZThybuGI0zOQ5iIGMZE1GMajDa8se4oAOCJGemI1ihFrojIPzCMiahH2J1uvL7m/J7T909O5XWJib6HYUxEPicIAv655TiqGyy4Obsv95wm+gGGMRH53LavqvBNaQNS+2lw78RBYpdD5HcYxkTkU0crDFhTUA5thIIXfyC6BP5WEJHPnDPa8ObGYwgNleCJGRmIUsnFLonILzGMicgnHE4P/rbmCCx2Nx6cnIaBCZFil0TktxjGRNTtBEHAyv+UoLrBjAlZfXHjiASxSyLyawxjIup2nxaexb7ieqT0jcR9k7jDFlFHGMZE1K3Kz7bgo50nEBEuw5w7M7jDFlEn8LeEiLqNyerE0vVH4RUE/OKOYdBGKMQuiSggMIyJqFt4vQKWbzyG5lYHZo4biCH9eW1ios5iGBNRt9jw5SkUn25G5jXRmDo2WexyiAIKw5iIuuxwuQGb9pyGXhOGh6cN4SURia4Qw5iIuqTJZMeKzcWQhoZgzp0ZUIXJxC6JKOAwjInoqrk9XizbcAxmmwv3TRqE5LgIsUsiCkiduqr3nj17kJ+fD51OB4lEgrlz57Z7XBAEvPfeewCAs2fPwmQy4c9//nP3V0tEfmXt5xU4ebYFo4fEYHwmT+xBdLU6DGObzYb58+djy5YtkMvlyMvLw969e5GTk9M2z4YNGxAZGYk777wTAFBSUuK7ionILxSdbMR/vqpCrFaJ/751MCTcTkx01Tocpi4qKkJCQgLk8vMneM/OzkZBQUG7eTZt2gSj0Yh3330XS5YsgUql8kmxROQfGlts+MfmYsikIXj8znQoFZ0aZCOiS+jwN8hgMLQLV7VaDYPB0G6empoamM1mzJ07F6dOncIjjzyCrVu3IjQ09JKvq9WGQyq99OP+Sq/nNjFfY499rys9dnu8WPThQVjsbsy9ZwRGpnN4+lL4Wfa9YOlxh2Gs0+lgsVjaps1mM3Q6Xbt51Go1RowYAQAYMGAAzGYzamtrkZiYeMnXbW62Xm3NotHrI9DQ0Cp2GUGNPfa9rvZ49WcnUVrZjLFDY5E1sA//vy6Bn2XfC7QeX+6LQ4fD1JmZmaipqYHT6QQAFBYWYvz48TAajTCbzQCAnJwcnDlzBsD5sPZ4PNDr9d1ROxH5kSMVBmz7qgoxWiUenJLG7cRE3aTDNWOlUokFCxbg+eefh1arRVpaGnJycrB48WJoNBrk5ubi0UcfxUsvvYRly5ahqqoKixYtgkLBc9ISBZPmVse3xxNL8Ph0bicm6k4SQRAEMRYcSEMLFwTakEggYo9972p67PUK+MtHB1FSZcSsSYMwaVQ/H1UXPPhZ9r1A63GXhqmJiDbvOY2SKiOyBkVj4shL7wtCRFeHYUxEl1V2xogNu09BF6nAQ7cN4XZiIh9gGBPRJZltLry58RgkkCD3jmFQK3neaSJfYBgT0UUJgoC3tx5Hc6sD028cgEGJGrFLIgpaDGMiuqiCg2dx8EQjBidp8BNen5jIpxjGRPQj1efM+HDnSaiVMjx6+zCEhHA7MZEvMYyJqB2Hy4NlG4/B7fHi57cNgTaC5wwg8jWGMRG1s2rnCdQ0WjBxZCIyB0WLXQ5Rr8AwJqI2B0rPoaCoBol6NX46IUXscoh6DYYxEQEAmkx2vLOtB
HJpCB6bPgyyALyqGlGgYhgTEbxeAW9tKobF7sa9EwehbzSvSU7UkxjGRIQt+ypResaI7FQ9bsrk9YmJehrDmKiXKz/bgg1fnII2QoHZUwfzdJdEImAYE/ViVrsbb248BkEQ8Oi0oTzdJZFIGMZEvdi/dpSiscWO23KSMThZK3Y5RL0Ww5iol9p7tA77jtVjQHwkpt8wQOxyiHo1hjFRL3TOaMN7+aVQyEPx2B1DIQ3lnwIiMfE3kKiX8Xi8eGvjMdidHjxwSypitOFil0TU6zGMiXqZD3eUorzGhDFDY3FdepzY5RARGMZEvUrZGSNWf1KG6KgwPDg5jYcxEfkJhjFRL2Gxu7B80zEAwKO3D0V4mFTkiojoAoYxUS8gCALe/U8pmkwO3HtLGgYlasQuiYi+h2FM1AvsPlKH/SXncE1iFH46KVXscojoBxjGREGuvsmK93eUQamQIvf2oQjlYUxEfoe/lURBzO3x4s2Nx+BwefCzKWmIjlKKXRIRXQTDmCiIrf/iFE7XteL69DiMGRordjlEdAkMY6Igdfx0E7btq0SMRolZt3A7MZE/YxgTBaFWqxNvbS5GSIgEuXcMg1LBw5iI/BnDmCjICIKAd7aVwGh24s4bB2BgQqTYJRFRBxjGREGm4OBZHDzRiCHJWkwdmyx2OUTUCQxjoiBS3WDGR5+ehFopwyPThiKEp7skCggMY6Ig4XR58ObGY3C5vXho6mBoIxRil0REncQwJgoS//7sJM42WDAhqy+yUvVil0NEV4BhTBQEDpY14NPCs+gbrcJ/3XyN2OUQ0RViGBMFuCaTHf/cehwyaQgemz4Mclmo2CUR0RViGBMFMK9XwFubimGxu3HvxEFI1KvFLomIrgLDmCiAbd57GqVnjMhO1WN8ZoLY5RDRVWIYEwWoE9VGbPjyFLQRCsyeOhgSHsZEFLAYxkQByGxzYfnGYwCAx+4YBrVSJnJFRNQVDGOiACMIAt7eehwGkwN3XD8Aqf00YpdERF3EMCYKMJ8Wnj/d5eAkDW6/rr/Y5RBRN2AYEwWQyrpWrPr0BNRKGR69fRhCQridmCgYdOq6anv27EF+fj50Oh0kEgnmzp170fk2btyIp59+GoWFhVCpVN1aKFFvZ3O4sWzDUbg9Ah6ZNpSnuyQKIh2Gsc1mw/z587FlyxbI5XLk5eVh7969yMnJaTdfeXk5ysvLfVYoUW8mCALeyy9FfbMNt45OwvAUndglEVE36nCYuqioCAkJCZDL5QCA7OxsFBQUtJvHZrNhxYoVeOKJJ3xSJFFv9+WRWuw7Vo+BCZGYedNAscshom7W4ZqxwWBoN+SsVqthMBjazfPKK69gzpw5bYHdGVptOKTSwDttn14fIXYJQY89bq+y1oT3d5yASinD72ePRpyu65uA2OOewT77XrD0uMMw1ul0sFgsbdNmsxk63XdDZLW1tTCZTNi2bVvbfW+//TZuuukmZGRkXPJ1m5utV1uzaPT6CDQ0tIpdRlBjj9uzO934v5XfwOnyIPf2oQj1ervcH/a4Z7DPvhdoPb7cF4cOwzgzMxM1NTVwOp2Qy+UoLCzErFmzYDQaIZVKER8fjxdffLFt/pdffhkPPfQQd+Ai6iJBEPDu9lLUGqyYfG0/ZCMbAMYAABTDSURBVPOyiERBq8NtxkqlEgsWLMDzzz+PV155BWlpacjJycHy5cvxwQcftM3X1NSEpUuXAgBWrFiB+vp631VN1At8fqimbTvx3eNTxC6HiHxIIgiCIMaCA2lo4YJAGxIJROzxeVX1rXj+3QNQyEIw/6FrER2l7LbXZo97Bvvse4HW48sNU/OkH0R+xmp3Y+n6o3B7vHh42tBuDWIi8k8MYyI/IggC/rn1OM412zB1TBIyr4kWuyQi6gEMYyI/sv3rMygsa0BaPw2PJybqRRjGRH6itKoZHxeUI0otxy+mD0NoCH89iXoL/rYT+QGj2YFlG85fn/jx6emIUvO800S9CcOYSGRujxfL1h9Fi8WJeyak8PrERL0Qw5hIZB8XlKOsugUj0/SYfG0/scshIhEwjIlEtO9YHfL3n0G8Lhw/v20IJBJen5ioN2IYE4mkqr4V72wrQZg8FHNnZkCp6NTlxYkoCDGMiURgtrnwt7VH4HR78ei0oYjvhisxEVHgYhgT9TCvV8CbG4+hscWO26/rjyxeAIKo12MYE/WwtZ9X4NipJgxP0WH6jQPELoeI/ADDmKgH7Suuw9Z9lYjRKpF7+1CEcIctIgLDmKjHnKo14e2t53fYevKu4QgPk4ldEhH5CYYxUQ9obnXg9TWH4XZ78Yvpw5AQzR22iOg7DGMiH3O5Pfjb2iMwmp24e0IKhqfwSkxE1B7DmMiHBEHAO9tKcKrWhJxhcbh1dJLYJRGRH2IYE/nQ1n2V2HusHgMTIjF7ahrPsEVEF8UwJvKR/SXnsGZXBfpEKjB3ZgZk0lCxSyIiP8UwJvKB8rMtWLG5GGHyUPzy7hHQ8JKIRHQZDGOibtZotJ3fc9rjxS+mp6NfjFrskojIzzGMibqR1e7Gqx8fhsnqwqxJqRieohO7JCIKAAxjom7i9njx9/VHUNNowaRRiZg4MlHskogoQDCMibqBIAh4e+txHDvdjMxronHvzYPELomIAgjDmKgbrNlVgb3H6pGSEInHpg9DSAgPYSKizmMYE3XRzgPV2LqvErF9wvHk3cOhkPEQJiK6Mgxjoi74puQcPthRhkiVHL/+6QhEhMvFLomIAhDDmOgqlVY1Y/mmYsjloXjqnhHQa5Ril0REAYphTHQVTtWa8NePD0MQBDwxIx3JcRFil0REAYxhTHSFahoteOXfh+BweZB7xzCkD+CxxETUNQxjoivQaLTh5VVFMNtc+O9bB+PawTFil0REQYBhTNRJLWYH/rKqCM2tDvx0wjUYNyJB7JKIKEgwjIk6odXqxMurinCu2Yaf5CTj1jG8LjERdR+GMVEHzDYXXv6oCNUNFkzMTsTMcQPFLomIggzDmOgyrHYXXl5VhKpzZozP6otZtwyCRMKzaxFR92IYE12C1e7Gy6sOobKuFeNGxOOByakMYiLyCYYx0UXYHG68sroIp2pNuD49Dj+7dTBCGMRE5CNSsQsg8jcWuwtLVh3CqVoTxg6LxUO3DWEQE5FPMYyJvsdkdWLJR+e3EV+fHnc+iHkFJiLyMYYx0bdazA689FERahotGJ+ZgAempHGNmIh6BMOYCECTyY6XPipCfZMVk0Yl4r6J3GuaiHoOw5h6vVqDBUtWFcFgcuC2scm466aBDGIi6lGdCuM9e/YgPz8fOp0OEokEc+fObff48uXL0djYiOjoaBw7dgxPPvkkUlJSfFIwUXeqqDHh1dWHYLa5MHPcQPwkJ5lBTEQ9rsMwttlsmD9/PrZs2QK5XI68vDzs3bsXOTk5bfNYrVb8/ve/h0QiwdatW/HSSy9h2bJlPi2cqKuOnWrC39YegdPtweypg3muaSISTYfHGRcVFSEhIQFyuRwAkJ2djYKCgnbz/OpXv2pbm/B6vQgPD+/+Som60dfH6/Hq6kPweAU8MSOD
QUxEoupwzdhgMEClUrVNq9VqGAyGi87rdDqxbt06zJ8/v8MFa7XhkEpDr6BU/6DX8yLyvubLHguCgHUFJ/H25mKEh0nxh5+PQUZKtM+W56/4Oe4Z7LPvBUuPOwxjnU4Hi8XSNm02m6HT/fhi6k6nEwsWLMBTTz2FpKSOr2jT3Gy9wlLFp9dHoKGhVewygpove+z2ePH+jjLsKqqBNkKBX949HHGRil73f8rPcc9gn30v0Hp8uS8OHQ5TZ2ZmoqamBk6nEwBQWFiI8ePHw2g0wmw2AwDsdjvmz5+Phx56COnp6di+fXs3lU7UPax2N/66+hB2FdUgKVaNP/xsFJJig+MbNREFvg7XjJVKJRYsWIDnn38eWq0WaWlpyMnJweLFi6HRaJCbm4vf/va3OHHiBKqrqwGc36FrypQpPi+eqDMaW2z46+rDONtowYgUHR6bPgxhch7VR0T+QyIIgiDGggNpaOGCQBsSCUTd3eOSymYsXX8UZpsLk0Ym4t6Jg3r96S35Oe4Z7LPvBVqPLzdMzdUDCkqCIGDngWp8tPMkJBLgwcmpmJCdKHZZREQXxTCmoONye/De9jJ8eaQWkeEyzJmRgdR+GrHLIiK6JIYxBZXGFhv+vv4YTtWakBwXgbyZGegTGSZ2WUREl8UwpqBRdLIR/9hcDIvdjevS4/CzKWmQywLvWHYi6n0YxhTw3B4v1u6qwH++roI0NASzpw7GjcPjeY5pIgoYDGMKaIYWO97ceAwnz7YgVqvE43em8/hhIgo4DGMKWPuK6/De9jLYHG6MHhKD/751MJQKfqSJKPDwLxcFHKvdhX/tKMO+Y/VQyEI5LE1EAY9hTAGltKoZKzYXw2ByYGBCJB69fShitbxKGBEFNoYxBQS70401BRXYWViNEIkEd1zfH7df3x+hIR2eXp2IyO8xjMnvHTvdhJXbStDYYke8Lhw/v20IUvpGiV0WEVG3YRiT37LaXfj3Zyfx+aFahEgk+ElOMu64vj9kAXgdbCKiy2EYk98RBAF7jtZh9WcnYbK60C9GjZ/fNgTJcTxkiYiCE8OY/EplrQmvrTqIsjNGyGUhuOumgZgyOgnSUG4bJqLgxTAmv2C1u7Bx92l8cqAaXq+A7FQ97ps4CLoonleaiIIfw5hE5fF6UXCwBhu+PAWzzYWYPuG49+ZrkHlNtNilERH1GIYxiUIQBBypMGDVpydRa7AiTB6Ku24aiFlTh6LFaBW7PCKiHsUwph53otqItbsqUHrGCIkEGJ+ZgOk3DkSUSs6rLBFRr8Qwph5TWdeKdV9U4HC5AQAwPEWHu29KQWKMWuTKiIjExTAmn6usa8XmvadxoLQBAJDWT4OZNw3EoESNuIUREfkJhjH5zMnqFmzee7ptTXhAfARmjkvB0P5aXtSBiOh7GMbUrbyCgKMVBvznqyqUVBkBAKn9NJh2XTKG9e/DECYiugiGMXULh8uDvUfrkL//DOqazu8NnT6gD6Zd1x+p/TgcTUR0OQxj6pIGow27imrw+aEamG0uhIZIcH16HG65th+SYnn6SiKizmAY0xXzegUcLjfgs4NncbTCAAGAWinDtOuScXN2IjRqhdglEhEFFIYxdVp9sxW7j9Rhz9FaNJkcAICUhEiMz+qLawfH8BhhIqKrxDCmy7La3fim9Bx2H6nFieoWAIBCHorxmQkYn9WXQ9FERN2AYUw/4nB5cOhkI74+fg6Hyw1we7yQABiSrMUNGfHITtVDIedaMBFRd2EYEwDA5nDj6KkmHCg9h0MnDXC4PACAhGgVxgyJQU56HKKjlCJXSUQUnBjGvViLxYlDJxtRWNaA4tPNcHu8AIAYjRKjh8Zg9JBYJOp5qkoiIl9jGPciXq+AU7UmHC434HCFAZV1rW2PJepVyE7VI2uQHkmxap6cg4ioBzGMg5ggCDjXbENxZTOKTzehpLIZFrsbABAaIsGQZC0yBuqQlRqNWG24yNUSEfVeDOMgciF8S88YUXbGiNKqZhi+PQQJAHSRCoxM0yNjYDSG9tdCqeB/PxGRP+Bf4wDmcntQWWdGeU0Lys+24ER1C1oszrbHVWFSjEzTY2j/PhjaX4sYjZLDz0REfohhHCDcHi9qGi2orGvF6fpWnK41oareDI9XaJsnSi3H6CExSO2nQWo/DRKiVQhh+BIR+T2GsR+y2t2objDjzLnvbtUNZrjc3rZ5QkMkSIqNQEpCJFL6RiElIRK6qDCu+RIRBSCGsYjMNhfqmqyoabS03c42WtDc6mg3X2iIBH2jVegfH4HkuEj0j4tAol4FmZQn3iAiCgYMYx+z2F0412w7fzPacK7ZiromK+qbbDDbXD+aXxuhwLABfdBPr0a/GDUSY9SI14VDGhoiQvVERNQTGMZd4BUEtFqcaGp1oMlkR5PJAYPJjgajDYYWOxpb7LA63D96XohEAr0mDCkJkYjThSNep0JCtAoJunCEh8lEeCdERCQmhvFFeAUBZpsLJrMTLRYnWiwOtJidcHgE1DSYYWx1wGg+f3N7hIu+hlwWgugoJa5JjEKMRokY7fmbXnP+xjVdIiK6IOjD2CsIsDvcMNvdsNhcsNhdMNtcsNjcaLU6Ybadn261utBqdcJkdcFsdcErXDxkgfNrtlFqOfrFRKBPpAJ9IsKgi1SgT2QYdFFhiI4Kg1op485URETUKX4bxl5BgMPpgcPlgcPpgd3pgd3pht3pge3bf+0OD2wO93c35/lpi90Fq/38fVaHG5fJ1XaUCikiw2WI0SoRGS5HlFqOKNX5W6RKjoH9+kBwuxEZLkdICIOWiIi6R6fCeM+ePcjPz4dOp4NEIsHcuXPbPe5wOLBo0SLExsbi9OnTyM3NxYABAy77mm+sPQKH2wOnywun63zonv/XC4fL0+4wniulkIciXCGFJkKBhGgVVGEyqJTSb/+VQRUmhVopQ4RSBnW4HGqlDGqlDDLp5YeO9foINDS0XnYeIiKiK9VhGNtsNsyfPx9btmyBXC5HXl4e9u7di5ycnLZ5Vq5cifj4eDz66KMoLS3Fc889hw8++OCyr3ugrKHtZ7ksBHJpKBSyEESEyxAtC4NCFgqFPBRh394UMikU8lAo5aEIU0jb7g9XSKFUSBGmkCL82/u5PZaIiAJJh2FcVFSEhIQEyOVyAEB2djYKCgrahXFBQQF+/etfAwDS0tJQUlICs9kMtfrSl9/765M3QC4LhVwawm2rRETUq3UYxgaDASqVqm1arVbDYDB0ap7LhXFSXw2kAXjSCr0+QuwSgh577Hvscc9gn30vWHrcYRjrdDpYLJa2abPZDJ1Od8Xz/FBzs/VKaxUdtxn7Hnvse+xxz2CffS/Qeny5Lw4dblzNzMxETU0NnM7zVwMqLCzE+PHjYTQaYTabAQDjx4/HwYMHAQClpaUYPHjwZdeKiYiI6DsSQej4wJ/du3dj+/bt0Gq1kMlkmDt3LhYvXgyNRoPc3FzY7XYsWrQIer0eVVVVeOyxxzrcmzqQvs1cEGjfwgIRe+x77HHPYJ99L9B6fLk1406FsS8EUgMvCLT/+EDEHvsee9wz2GffC7Q
ed2mYmoiIiHyLYUxERCQyhjEREZHIGMZEREQiYxgTERGJjGFMREQkMoYxERGRyBjGREREImMYExERiYxhTEREJDKGMRERkcgYxkRERCJjGBMREYmMYUxERCQyhjEREZHIGMZEREQiYxgTERGJjGFMREQkMoYxERGRyBjGREREIpMIgiCIXQQREVFvxjVjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjImIiEQmFbuAQLR06VKsXLkSX331ldilBKUXXngBSqUS4eHhKCkpwbPPPgu9Xi92WUFhz549yM/Ph06ng0Qiwdy5c8UuKahUVVXh1VdfxdChQ1FXVweNRsMe+4jdbsc999yDG264AfPmzRO7nC5jGF+hr776CiaTSewygppSqcRTTz0FAFi+fDmWLVuGP/7xjyJXFfhsNhvmz5+PLVu2QC6XIy8vD3v37kVOTo7YpQUNo9GI2267DZMmTQIA3HbbbRg/fjzS09NFriz4XPjSEyw4TH0FGhsbsWXLFjzwwANilxLULgQxAAiCgPDwcBGrCR5FRUVISEiAXC4HAGRnZ6OgoEDcooLM8OHD24IYALxeL5RKpYgVBaf169cjOzsbiYmJYpfSbbhm/AMPP/wwGhsbf3T/k08+iZ07d2LevHlobW0VobLgcrk+T5w4EQBgMpnw5Zdf4vXXX+/p8oKSwWCASqVqm1ar1TAYDCJWFNx27NiBG264ASkpKWKXElROnjyJiooK/PrXv0ZpaanY5XQbhvEP/OMf/7jo/UeOHIFUKsWqVavQ0tICh8OB5cuXY/Lkyejfv3/PFhkELtXnC1pbW/GnP/0JL7zwAjQaTQ9VFdx0Oh0sFkvbtNlshk6nE7Gi4LVv3z589dVXePbZZ8UuJejs2LEDcrkcy5cvx4EDB+ByufDOO+9g9uzZYpfWJQzjTsrIyEBGRgYAoLq6Gh9//DFyc3NFrio4NTU14YUXXsAzzzyD2NhYbN++HVOmTBG7rICXmZmJmpoaOJ1OyOVyFBYWYtasWWKXFXQKCgrwzTff4LnnnsO5c+dQU1ODrKwsscsKGo8//njbzw6HA1arNeCDGOCFIq5YZWUlPvroI3z44YfIzc3F7NmzuU2zm82YMQNut7ttjVilUmHZsmUiVxUcdu/eje3bt0Or1UImk3FP32529OhRPPjgg207bFmtVtx///2YOXOmyJUFn+3bt+P999+Hy+XC/fffj2nTpoldUpcwjImIiETGvamJiIhExjAmIiISGcOYiIhIZAxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQ8AxdRL7F27Vrs3LkTsbGxaGlpwdatW7FhwwakpqaKXRpRr8eTfhD1EgcPHoRGo8GAAQOQl5eHpKQkPP3002KXRURgGBP1OqtXr8YHH3yAVatWtV1OkYjExWFqol7k1KlTePnll/H+++8ziIn8CHfgIuolXC4XfvOb3yAvLw8pKSkoLy/HN998I3ZZRASuGRP1Gu+88w7Ky8tRXl6O//3f/0V9fT0mTpyIUaNGiV0aUa/HbcZEREQi4zA1ERGRyBjGREREImMYExERiYxhTEREJDKGMRERkcgYxkRERCJjGBMREYmMYUxERCSy/wc6Z6BeGTCN7AAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFnCAYAAABdOssgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df3RU9Z3/8dckIZgfQJJJ+JkEkG8bAY3AUiBVIPw4xePxrKYsPWyLrUUbD5ZoRVYQXZO6LgvRUA5FFoNYqcUfB8Ggiz2glBAqEZEJa8EQqEqATZEQEmDygxByv3+wuW5EyKWTmbl38nyc03Ny535m5p1PI6/5fO5nPtdlGIYhAADgSGHBLgAAAPz9CHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHIEnau3evsrKy9E//9E9auHBhQN/76aefVlFRUUDfEwgVBDngEAsXLtRvf/tbv73+b37zG82ePVtvvfWW0tPT/fY+mzZt0r333tvusQULFuiuu+7y23sCoSwi2AUAsIeTJ0+qd+/ekqQf//jHAX3vmJiYgL4fEEpc7OwG2Edra6t+/etf6/DhwwoPD9fAgQP15JNPasOGDSosLFT37t01YMAA/eM//qNmzJihAwcO6D/+4z/kcrkUHh6up59+WkOGDNHKlSv1+uuvKzMzU7W1tfrqq6/kdru1ZMkSJSQkXPG+c+fO1c6dO3XjjTcqKSlJt912m1566SXNnDlTOTk5WrFihX7/+99r0aJF+uEPf2i+/rRp03Tu3DkdOnRIw4cP19KlS83XfOedd/SHP/xBN9xwgyRpzpw5Zo2nT5/W0KFD9d3vfle33HKLVqxYoTFjxmjJkiWSpE8//VT5+fkyDEMul0uPP/640tPTtX37dj333HNKTEzUrbfeqr179yosLEwvvPCC3G53YP5PAuzGAGAbxcXFxv33328eP/TQQ8bx48cNwzCMBQsWGCtWrDDPnTt3zhg7dqyxe/duwzAMY8eOHcYPfvAD49KlS2b7qVOnGufPnzcMwzCeeuopY968eVd970mTJhkfffSRefzN95s1a5axcePGdufvvvtu48KFC0ZTU5MxZswYw+PxGIZhGPv27TO+//3vGzU1NYZhGMYf//hHY8GCBYZhGMbGjRuNWbNmtXvvFStWmOfPnTtnjBkzxqxl7969xpgxY4yzZ8+az7/11luNY8eOGYZhGA888ICxevXqa3UrENK4Rg7YSM+ePXX48GF9+OGHam1t1bJly9S/f/9vbbtjxw5FR0crIyNDkpSZmanTp0/rv//7v802EydOVGxsrCTp7rvv1tatW3Xp0qVOq3fs2LGKjIxU9+7dNXDgQJ04cULS5evgEyZMMEf/U6dO1T//8z9bes0dO3YoNjZWY8eOlSSNHj1avXr10p/+9CezzeDBg5WSkiJJSktLM98X6IoIcsBGRo4cqX/7t3/TmjVrNGnSJK1du1bGVa5+nTx5UmfPntW9995r/i8hIUF1dXVmm169epk/x8XF6eLFi6qtre20ets+JEhS9+7ddfHiRbO2/zuFHxERoVtvvdXSa37zuZKUkJCgkydPdvi+QFfEYjfARs6fP68xY8Zo4sSJOnbsmB544AH16dNH06dPv6Jtv3791LdvX7366qvmY16vV5GRkebx2bNnzZ9ra2vVrVs3xcfHW6qlW7duam5uNo/PnTtn+ffo16+fzpw5Yx63tLTor3/9q2666abrfq4knTlzRn379rX8/kBXwogcsJH3339fb775piQpNTVVffr0UWtrq6TLK7sbGxvV0NCgxx57TJMmTVJdXZ0+/fRTSVJDQ4N++tOfyuv1mq/35z//2TwuKirStGnTFB4ebqmW5ORkHTlyRJJUWVmpyspKy79HVlaWSkpKzEB+7733tGnTpna/hyTl5OSopaWl3XMnTZqk+vp67d27V5Lk8Xh09uxZTZ482fL7A11JeF5eXl6wiwBwWffu3fX6669rw4YNWr9+vYYMGaIHH3xQ4eHh6tmzp1566SVt2bJF99xzj2655RaNHTtW+fn52rRpk9555x3NmTNHw4YNkyR98MEHGjx4sP7rv/5LL730klpbW/XMM88oKirqivedO3euPvvsMx04cED19fUaNWqUBg0apI0bN2rDhg2qra2VYRjatWuXkpOTtWPHDhUVFenIkSPq37+//vjHP+qDDz5QeXm5hgwZojFjxigxMVFLlizRO++8o7/97W9atGiRIiMjlZSUpLfffltFRUW68cYbVVdXp3Xr1umLL75Qc3OzbrvtNo0bN07Lli3Txo0b9dFHH2np0qVKTU1VaWmpli1bpmPHjqmpqUler1eFhYX64osvFBYWppEjRwb6/zIg6Pj6GRCiFi5cqAEDBignJyfYpQDwI6bWAQBwMBa7ASFo5cqV2rVrl7p3766+fftqxowZwS4JgJ90ytR6dXW1li9frkOHDmnjxo1XnN+0aZPeeOMNde/eXZI0ffp03XPPPZKkzZs3q7y8XGFhYUpNTdXMmTN9LQcAgC6jU0bk+/bt05QpU1ReXn7VNsuWLVNycnK7x06ePKmXX35ZRUVFcrlcmj59usaNG6dBgwZ1RlkAAIS8TgnyO+64Q3v27Llmm/Xr1ysxMVGNjY2aNWuW4uLitGvXLg0fPlwul0vS5c0wSkpKCHIAACwKyDXy733ve8rMzFRCQoJ27typRx55ROvWrdOZM2fa3fUoJiZGNTU1Hb5eS8slRURY+y4sAAChLCBB3rYnsiSNGzdOc+bM0aVLl5SQkNBuk4n6+nqlpqZ2+Hq1tQ1+qdNfkpJ6qLr6fLDLCHn0s//Rx/5HHweG0/o5KanHVc/57etndXV15o5SBQUF5u5NR48eVXJyssLDwzV+/HgdPHjQ3Eu6rKxMEyZM8FdJAACEnE4ZkX/88cfavHmzqqurtWrVKs2ePVuFhYWKi4tTdna2EhMTlZeXp+TkZB0+fFj5+fmSpL59+2r27NlavHixwsPDNWPGDK6PAwBwHRy5s5uTpkMk503hOBX97H/0sf/Rx4HhtH4OytQ6AADwP4IcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcLCIzniR6upqLV++XIcOHdLGjRuvOF9YWKjTp08rMTFRBw8e1MMPP6whQ4ZIkiZPnqwBAwZIknr37q2CgoLOKAkAgC6hU4J83759mjJlisrLy7/1fENDg5544gm5XC699957eu6557R69WpJUlZWlnJycjqjDAAAupxOmVq/4
447FBMTc9Xzv/rVr+RyuSRJra2tio6ONs/t3btXa9as0fLly+XxeDqjHAAAuoxOGZFb1dzcrLffflu5ubnmY/Pnz1d6eroaGxuVlZWlF198UQMHDrzm68THRysiItzf5XaqpKQewS6hS6Cf/Y8+9j/6ODBCpZ8DFuTNzc3Ky8vTo48+qtTUVPPx9PR0SVJUVJSGDh0qj8fTYZDX1jb4tdbOlpTUQ9XV54NdRsijn/2PPvY/+jgwnNbP1/rQ4bdV63V1dfJ6vZKkpqYm5ebm6uc//7luvvlmbd26VZJUWlqqkpIS8zmVlZVKSUnxV0kAAIScThmRf/zxx9q8ebOqq6u1atUqzZ49W4WFhYqLi1N2drbmz5+vI0eO6MSJE5IuL36bNm2aEhIStHLlSn322Wc6deqUpk2bptGjR3dGSQAAdAkuwzCMYBdxvZw0HSI5bwrHqehn/6OP/Y8+Dgyn9XNQptYBAID/EeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAg0V0xotUV1dr+fLlOnTokDZu3HjF+QsXLmjp0qXq06ePjh49quzsbA0ePFiStHnzZpWXlyssLEypqamaOXNmZ5QEAECX0ClBvm/fPk2ZMkXl5eXfen7dunXq16+ffvGLX6iiokJPPvmkXnvtNZ08eVIvv/yyioqK5HK5NH36dI0bN06DBg3qjLIAAAh5nRLkd9xxh/bs2XPV88XFxZo3b54kKS0tTYcOHZLX69WuXbs0fPhwuVwuSdLIkSNVUlJCkAM2VXXaqwOHq4NdRkjr+ZVX5842BruMkOekfu4ZE6mkpB5XPd8pQd6RmpoaxcTEmMexsbGqqanRmTNn2j0eExOjmpqaQJQE4O/wxAsf6sy5pmCXAXQ5745Ivuq5gAS52+1WfX29eez1euV2u5WQkKDKykrz8fr6eqWmpnb4evHx0YqICPdLrf5yrU9T6Dz0s395G5qVGBeluycMCXYpQJcR36P7Nc/7Lcjr6uoUERGh2NhYZWZmqqysTKNHj1ZFRYVuuukmxcbGavz48frDH/4gwzDkcrlUVlamWbNmdfjatbUN/irbL5KSeqi6+nywywh59LP/tRpSz+huum1Y72CXErL4Ow6MUOrnTvn62ccff6zNmzerurpaq1atUlNTkwoLC/Xaa69Jkn7605+qqqpKq1at0u9+9zv9+7//uySpb9++mj17thYvXqwlS5ZoxowZXB8HbM2QK9glAGjHZRiGEewirpfTPkWF0ic/O6Of/e+B/B26sV9PLbr3H4JdSsji7zgwnNbP17psyIYwAKwzDDEkB+yFIAdgWavBPxqA3fDfJABL2q7Cte37AMAeCHIAlrQtpiHHAXshyAFY879JzogcsBeCHIAlrc77ggvQJRDkAK5LGANywFYIcgCWmFtOMLUO2ApBDsCSVnIcsCWCHIA1bUHOjjCArRDkACxpNb9HHuRCALRDkAO4LuQ4YC8EOQBL2NkNsCeCHIAl7OwG2BNBDsASg53dAFsiyAFYYrDYDbAlghyAJeaIPLhlAPgGghyAJV9fIyfKATshyAFYwtQ6YE8EOQBLWOwG2BNBDsASc0Qe5DoAtEeQA7guDMgBeyHIAVjSttc6Y3LAXghyANb8b46HkeOArRDkACxpbfuBIAdshSAHYA03TQFsiSAHYAk7uwH2RJADsKSVETlgSwQ5gOtCjgP2QpADsKSVnd0AW4rojBfZvXu3tm3bJrfbLZfLpblz57Y7v2jRIh0/ftw8rqio0KZNm5ScnKzJkydrwIABkqTevXuroKCgM0oC0NnYax2wJZ+DvLGxUbm5udqyZYsiIyOVk5Oj0tJSZWRkmG1uv/123XnnnZIkr9erhQsXKjk5WZKUlZWlnJwcX8sA4GcsdgPsyeep9f3796t///6KjIyUJI0aNUrFxcXt2rSFuCS99dZbmj59unm8d+9erVmzRsuXL5fH4/G1HAB+wm1MAXvyeUReU1OjmJgY8zg2NlY1NTXf2ra1tVW7du3Sz372M/Ox+fPnKz09XY2NjcrKytKLL76ogQMHXvM94+OjFRER7mvpAZWU1CPYJXQJ9LP/nG++vCVMdHQk/exn9G9ghEo/+xzkbrdb9fX15rHX65Xb7f7Wttu3b9ekSZPafaJPT0+XJEVFRWno0KHyeDwdBnltbYOvZQdUUlIPVVefD3YZIY9+9q8zZy7/d97UeJF+9iP+jgPDaf18rQ8dPk+tjxgxQlVVVWpubpYkeTweZWZmqq6uTl6vt13bTZs2KSsryzwuLS1VSUmJeVxZWamUlBRfSwLgB4ZY7AbYkc8j8qioKOXl5enZZ59VfHy80tLSlJGRofz8fMXFxSk7O1uSVF5erkGDBrWbhk9ISNDKlSv12Wef6dSpU5o2bZpGjx7ta0kA/MBc7EaQA7biMgzz3oSO4aTpEMl5UzhORT/71xdV5/Ts7z/RHWNS9aPJ/y/Y5YQs/o4Dw2n97NepdQBdA1PrgD0R5AAsMefuCHLAVghyANb8b5CHMSQHbIUgB2BJq/OW0wBdAkEO4LqwsxtgLwQ5AEvavuBCjAP2QpADsITvkQP2RJADsKRtRM5iN8BeCHIAlrS2/UCOA7ZCkAOwxpxaJ8kBOyHIAVjCYjfAnghyAJaYG7uR5ICtEOQALDFH5CQ5YCsEOQBL+PoZYE8EOQBLzCDnKjlgKwQ5AEu4jSlgTwQ5AEu+HpEDsBOCHIAlBt8jB2yJIAdgicGQHLAlghzAdWGvdcBeCHIAlrS2jcgB2ApBDuC6hDEgB2yFIAdgSSs7uwG2RJADsMbcbD2oVQD4BoIcgCVtl8hZ7AbYC0EOwBIWuwH2RJADuC4MyAF7IcgBWGIuduMiOWArBDkAa7iNKWBLEZ3xIrt379a2bdvkdrvlcrk0d+7cduc3bdqkN954Q927d5ckTZ8+Xffcc48kafPmzSovL1dYWJhSU1M1c+bMzigJQCczF60T5ICt+BzkjY2Nys3N1ZYtWxQZGamcnByVlpYqIyOjXbtly5YpOTm53WMnT57Uyy+/rKKiIrlcLk2fPl3jxo3ToEGDfC0LQCcz+B45YEs+T63v379f/fv3V2RkpCRp1KhRKi4uvqLd+vXrtXbtWq1cuVJ1dXWSpF27dmn48OHmPwwjR45USUmJryUB8APumQLYk88j8pqaGsXExJjHsbGxqqmpadfme9/7njIzM5WQkKCdO3fqkUce0bp163TmzJl2z42Jibniud8mPj5aERHhvpYeUElJPYJdQpdAP/tPbOzlS2O9ekXRz35G/wZGqPSzz0HudrtVX19vHnu9Xrnd7nZtUlJSzJ/HjRunOXPm6NKlS0pISFBlZaV5rr6+XqmpqR2+Z21tg69lB1RSUg9VV58Pdhkh
j372r3PnmyRJ58830c9+xN9xYDitn6/1ocPnqfURI0aoqqpKzc3NkiSPx6PMzEzV1dXJ6/VKkgoKCtTS0iJJOnr0qJKTkxUeHq7x48fr4MGD5rW3srIyTZgwwdeSAPiBObXONXLAVnwekUdFRSkvL0/PPvus4uPjlZaWpoyMDOXn5ysuLk7Z2dlKTExUXl6ekpOTdfjwYeXn50uS+vbtq9mzZ2vx4sUKDw/XjBkzWOgG2JS52C3IdQBoz2UYztt30UnTIZLzpnCcin72r/c/Oa7XPziiX2bdrH9I6x3sckIWf8eB4bR+9uvUOoCugal1wJ4IcgDWMLUO2BJBDsCSVkbkgC0R5ACuDzkO2ApBDsCStnWxYQQ5YCsEOQBLvv56C0kO2AlBDsASRuSAPRHkACxpNe9jGtQyAHwDQQ7AGm5jCtgSQQ7AEm5jCtgTQQ7AEnNmnRE5YCsEOQBLuGkKYE8EOQBLvt5rPbh1AGiPIAdgiSEWuwF2RJADsIQROWBPBDkAS7iNKWBPBDkAS8yp9SDXAaA9ghyAJYzIAXsiyAFYwzVywJYIcgCWtJpbtAa5EADtEOQArouLq+SArRDkACxhRA7YE0EOwBqj4yYAAo8gB2BJ26r1MIbkgK0Q5AAsMcR9TAE7IsgBWML3yAF7IsgBWMJtTAF7IsgBWNK21o0BOWAvBDkAS9pG5Cx2A+yFIAdgiWEOyYNaBoBviOiMF9m9e7e2bdsmt9stl8uluXPntjtfWFio06dPKzExUQcPHtTDDz+sIUOGSJImT56sAQMGSJJ69+6tgoKCzigJQCdjsRtgTz4HeWNjo3Jzc7VlyxZFRkYqJydHpaWlysjIMNs0NDToiSeekMvl0nvvvafnnntOq1evliRlZWUpJyfH1zIA+Bm3MQXsyeep9f3796t///6KjIyUJI0aNUrFxcXt2vzqV78yP8W3trYqOjraPLd3716tWbNGy5cvl8fj8bUcAH5icPczwJZ8HpHX1NQoJibGPI6NjVVNTc23tm1ubtbbb7+t3Nxc87H58+crPT1djY2NysrK0osvvqiBAwde8z3j46MVERHua+kBlZTUI9gldAn0s/907375nwu3O1ZJ8dEdtIYv+DsOjFDpZ5+D3O12q76+3jz2er1yu91XtGtublZeXp4effRRpaammo+np6dLkqKiojR06FB5PJ4Og7y2tsHXsgMqKamHqqvPB7uMkEc/+1dj00VJUu2ZerlaLgW5mtDF33FgOK2fr/Whw+ep9REjRqiqqkrNzc2SJI/Ho8zMTNXV1cnr9UqSmpqalJubq5///Oe6+eabtXXrVklSaWmpSkpKzNeqrKxUSkqKryUB8AcWuwG25POIPCoqSnl5eXr22WcVHx+vtLQ0ZWRkKD8/X3FxccrOztb8+fN15MgRnThxQtLlxW/Tpk1TQkKCVq5cqc8++0ynTp3StGnTNHr0aJ9/KQCdj9uYAvbkMgzDcTcndNJ0iOS8KRynop/9a/XmA/q4/JR+M/c29YrtHuxyQhZ/x4HhtH7269Q6gK6hlal1wJYIcgDWGNzGFLAjghyAJW05zl7rgL0Q5AAscdxiGqCLIMgBWPL13c+CXAiAdghyAJZw0xTAnghyAJY48JuqQJdAkAOwpC3GWewG2AtBDsASc0BOjgO2QpADsITFboA9EeQALPn6CjlJDtgJQQ7AEoObpgC2RJADsOTrr58Ftw4A7RHkACz5ekROkgN2QpADuC7EOGAvBDkAS7iNKWBPBDkAawyD6+OADRHkACxpFaNxwI4IcgDWGFwfB+yIIAdgiWEYjMgBGyLIAVhiiO1ZATsiyAFYYhgGu8EANkSQA7DEMBiRA3ZEkAOwhAE5YE8EOQBLWOwG2BNBDsASQ3z9DLAjghyAJYzIAXsiyAFYYohr5IAdEeQALLm82I0kB+wmojNeZPfu3dq2bZvcbrdcLpfmzp3b7vyFCxe0dOlS9enTR0ePHlV2drYGDx4sSdq8ebPKy8sVFham1NRUzZw5szNKAtDJDG6aAtiSz0He2Nio3NxcbdmyRZGRkcrJyVFpaakyMjLMNuvWrVO/fv30i1/8QhUVFXryySf12muv6eTJk3r55ZdVVFQkl8ul6dOna9y4cRo0aJCvZQHoZIzIAXvyOcj379+v/v37KzIyUpI0atQoFRcXtwvy4uJizZs3T5KUlpamQ4cOyev1ateuXRo+fLj5j8PIkSNVUlLSYZA/vXaPr2UHVEREuFpaLgW7jJBHP/vX6bON6hEdGewyAHyDz0FeU1OjmJgY8zg2NlY1NTWW2pw5c6bd4zExMVc899vUeZt9LRvAdbohMkKjh/ZRUlKPYJcS8ujjwAiVfvY5yN1ut+rr681jr9crt9ttqU1CQoIqKyvNx+vr65Wamtrhe654ZLyvZQdUUlIPVVefD3YZIY9+9j/62P/o48BwWj9f60OHz6vWR4wYoaqqKjU3Xx4lezweZWZmqq6uTl6vV5KUmZmpsrIySVJFRYVuuukmxcbGavz48Tp48ODlmzFIKisr04QJE3wtCQCALsNltKWoDz788ENt3bpV8fHx6tatm+bOnav8/HzFxcUpOztbTU1NWrp0qZKSknTs2DE9+OCD7VatHzhwQOHh4Ro0aJClVetO+hQlOe+Tn1PRz/5HH/sffRwYTuvna43IOyXIA81JnS857w/Gqehn/6OP/Y8+Dgyn9bNfp9YBAEDwEOQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAg0X48uS6ujoVFBQoJSVFR48e1bx585SYmNiuzaeffqp169Zp2LBh+vLLL5Wenq4f/ehHkqSnn35aX375pdn2qaeeUlpami8lAQDQpfgU5MuWLVNGRobuvPNO/elPf9LSpUv13HPPtWtTXV2tn/3sZ0pPT9fFixf1/e9/X1OnTlVCQoKSkpL0zDPP+PQLAADQlfkU5Dt37tScOXMkSaNGjdLChQuvaDNlypR2x+Hh4erWrZskqb6+Xv/5n/+p8PBwRUdHa+bMmYqI8KkkAAC6FJdhGMa1Gtx///06ffr0FY8//PDDeuSRR7R792717NlTLS0tGj58uA4ePHjVMH7llVckSffdd58k6eDBg0pLS1NERITy8/MVExOjX/7ylx0W3dJySRER4R22AwAg1HU4/F27du1Vz7ndbtXX16tnz57yer3q1avXVUP83XffVUNDgx566CHzseHDh5s/jxs3TmvWrLEU5LW1DR22sZOkpB6qrj4f7DJCHv3sf/Sx/9HHgeG0fk5K6nHVcz6tWp84caLKysokSR6PRxMnTpQktba2qqqqymy3YcMG1dTU6KGHHlJFRYW5wG3p0qVmm8rKSg0cONCXcgAA6HJ8uiA9b948Pf/88zp69Ki
OHz+uBQsWSJIqKir0+OOP691339UHH3ygJUuWaNiwYdq+fbvq6ur01FNPafDgwaqtrdXzzz+vG264QV9++aWeeOKJTvmlAADoKjq8Rm5HTpoOkZw3heNU9LP/0cf+Rx8HhtP62W9T6wAAILgIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAABwswpcn19XVqaCgQCkpKTp69KjmzZunxMTEK9pNnjxZAwYMkCT17t1bBQUFkqQTJ05o1apVGjhwoP7nf/5HCxYsUExMjC8lAQDQpfg0Il+2bJkyMjKUnZ2tqVOnaunSpd/aLisrS6+++qpeffVVM8QlKTc3VzNnztSDDz6o73znO1qzZo0v5QAA0OX4FOQ7d+7UyJEjJVVpVl4AAAV4SURBVEmjRo3Szp07v7Xd3r17tWbNGi1fvlwej0eSdPHiRe3Zs0e33HJLh88HAADfrsOp9fvvv1+nT5++4vGHH35YNTU15lR4bGyszp49q5aWFkVEtH/Z+fPnKz09XY2NjcrKytKLL76oqKgo3XDDDXK5XObza2pqLBWdlNTDUjs7cWLNTkQ/+x997H/0cWCESj93GORr16696jm32636+nr17NlTXq9XvXr1uiLEJSk9PV2SFBUVpaFDh8rj8eiuu+5SU1OTDMOQy+WS1+uV2+324VcBAKDr8WlqfeLEiSorK5MkeTweTZw4UZLU2tqqqqoqSVJpaalKSkrM51RWViolJUXdunXT2LFj9Ze//OWK5wMAAGtchmEYf++T6+rq9Pzzz6t///46fvy4HnvsMSUmJqq8vFyPP/643n33XVVUVGjlypUaPny4Tp06pT59+ujBBx+UdHnV+gsvvKCUlBT97W9/08KFC1m1DgDAdfApyAEAQHCxIQwAAA5GkAMA4GA+7eyG67dq1SqtW7dOe/bsCXYpIWfx4sWKiopSdHS0Dh06pEWLFikpKSnYZYWE3bt3a9u2bXK73XK5XJo7d26wSwo5x44d0/LlyzVs2DCdPHlScXFx9LOfNDU1acaMGbr99tu1YMGCYJfjM4I8gPbs2aNz584Fu4yQFRUVpUcffVSSVFhYqNWrV+tf//Vfg1yV8zU2Nio3N1dbtmxRZGSkcnJyVFpaqoyMjGCXFlLq6up05513aurUqZKkO++8U5mZmbr55puDXFnoafvAFCqYWg+Q06dPa8uWLZo1a1awSwlZbSEuSYZhKDo6OojVhI79+/erf//+ioyMlHR5F8bi4uLgFhWC0tPTzRCXLn+NNyoqKogVhaaioiKNGjVKycnJwS6l0zAi70TX2gVv+/btWrBggc6fPx+EykLHtfp4ypQpkqRz587pz3/+s377298GuryQ9H93cJSubxdG/H3ef/993X777RoyZEiwSwkpf/3rX/XFF19o3rx5qqioCHY5nYYg70RX2wXvL3/5iyIiIvTmm2/q7NmzunDhggoLC/WDH/xAgwYNCmyRDnetnQYl6fz58/r1r3+txYsXKy4uLkBVhba2HRzbsAujf3300Ufas2ePFi1aFOxSQs7777+vyMhIFRYWat++fbp48aJeeeUV3XfffcEuzScEeQDccsst5s1hTpw4obfeekvZ2dlBrir0nDlzRosXL9bjjz+uPn36aOvWrZo2bVqwy3K8ESNGqKqqSs3NzYqMjJTH49GPf/zjYJcVkoqLi/XJJ5/oySef1KlTp1RVVWXemAq+mzNnjvnzhQsX1NDQ4PgQl9gQJqAqKyv1xhtv6PXXX1d2drbuu+8+ruN2oqysLLW0tJgj8ZiYGK1evTrIVYWGDz/8UFu3blV8fLy6devGamo/OHDggO69915zcVtDQ4N+8pOf6Ic//GGQKws9W7du1fr163Xx4kX95Cc/0V133RXsknxCkAMA4GCsWgcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMHY2Q1AhzZt2qTt27erT58+Onv2rN577z1t3rxZ3/3ud4NdGtDlsSEMgA6VlZUpLi5OgwcPVk5OjlJTU/Uv//IvwS4LgAhyANdhw4YNeu211/Tmm2+atzUFEFxMrQOw5Msvv1RBQYHWr19PiAM2wmI3AB26ePGiHnvsMeXk5GjIkCH6/PPP9cknnwS7LABiRA7AgldeeUWff/65Pv/8cz3zzDP66quvNGXKFI0ePTrYpQFdHtfIAQBwMKbWAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAH+/+w2Id0pcVObAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFnCAYAAABdOssgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXhU5d038O+syWQmy2QyCSRkYwsQiYAgpAVBUam4Aq++PG71kUesFdDy8AiKCvVVnoJgaaUoUFQui9VaWYq0iqVlUcKasIUkbEnIQkIyySSZJbOe94/AaIQhEybJmeX7uS4vMjP3zPzmZ5Jvzn3OuY9EEAQBREREFJSkYhdAREREN45BTkREFMQY5EREREGMQU5ERBTEGORERERBjEFOREQUxBjkRCFq1qxZGDp0KA4cOODzc5YvX46pU6fi/vvvx86dO7uxuvZsNhvGjx8Pq9XaY+9JFCoY5EQB5IknnsCmTZu65LVWrVoFvV7v8/gLFy5g48aN+PTTT/Huu+9CpVJ1SR3X8uPPGRERgW3btnXrexKFKrnYBRBRYKipqYFWq4VSqURGRgYyMjJ69P1jYmJ69P2IQgWDnChArFixAkVFRairq8PmzZsxY8YMjBo1CosXL0ZlZSUkEgkGDBiA1157DXK5HKtWrcKf//xnTJo0Cc3NzSguLkZ2djaWLl3a7nWPHTuGDRs24MyZM3jqqafw2GOPXfXex48fx1tvvYW6ujo88cQTuPPOO7F7925899132LlzJxISEvDkk0/i2LFjKCkpgd1ux4wZM3Dw4EG8/vrr2LVrF0pLSzF//nzcddddAACz2Yz//d//xblz5wAAmZmZmDdvHj788MOrPue2bduwY8cO/PGPf8To0aMhCALWr1+PHTt2QCaTISMjAwsXLoRGo8HixYvx5Zdf4vHHH8f58+dRUlKCSZMmYe7cud3/P4koEAlEFDAef/xx4YsvvvDcbmxsFLZs2eK5PX/+fOEvf/lLu9sPPvigYLPZhNbWVuHWW28V8vPzPY/ffvvtwqJFiwRBEIRjx44Jw4YNExwOxzXfe//+/cLtt9/e7r6BAwcKFRUVgiAIQkVFhTBw4MCrHl+7dq0gCIKwfft24e677/Y89uqrrwoLFiwQBEEQXC6X8Oyzzwr79++/5ue8UuuVxzdv3ixMnjxZsFgsgiAIwiuvvCK8/PLL7fr0zDPPCG63W6itrRWGDBki1NTUXPNzEYU67iMnCmCxsbGorq7Gf/zHf+CJJ57AwYMHUVhY2G7M6NGjoVQqERERgfT0dFRWVrZ7fNy4cQCArKwsWCwWGAyGLq3xh69fVVUFAHC73diyZQumTp0KAJBKpViwYAH69+/v02tu3boV99xzj2ef+dSpU/G3v/0NTqfTM2bs2LGQSCRITExEXFyc572Jwg2n1okC2ObNm/HZZ59hy5YtiIuLw7vvvntVYGk0Gs/XERERcDgc13w8IiICAK563F8/fP0rr93Q0AC73Y74+HjPuM7sc6+pqWn33Pj4eDgcDhgMBiQlJbV73x+/N1G44RY5UQA7fvw4cnJyEBcXBwDttkh7gkKhgN1uBwC0tLT4/Lz4+HgolUo0NDR47qutrUVdXZ1Pz+/du3e75zY0NEChUCAhIcHnGojCBYOcKICo1WpYrVaUlZVh6dKlSE9PR3FxMex2O5xOJ/Ly8nq0npSUFJw5cwYAsHv3bp+fJ5VK8dBDD3lOMXO73Vi4cKEnyH/8OX9sypQp+Oqrr9Da2goA2LJlCx544AHIZDJ/PxJRyJEtXrx4sdhFEFGbiIgIvPfee9i9ezceffRRTJw4EUeOHMF7772Hw4cPQ6VS4eDBg5BKpTh69Ci2bNmCM2fOIDk5Gf/4xz/wz3/+E0VFRejXrx/effddHDt2DCdPnsTYsWOxcOFCnD9/HseOHcNdd92FyMhIz/seP34cb7zxBqqqqpCXl4fs7GwkJCQgMTERb7/9Nvbu3YusrCzs3r0bBw8exIMPPoj/+q//QkVFBY4dO4af/exneP7551FbW4uCggI8+OCDGD16NL799lv88Y9/xBdffIFJkyZh4sSJ1/ycv/3tb3HixAmcPHkSQ4cOxbhx42C1WrFixQps2rQJMTExWLhwIZRKJZYtW4bdu3ejqKgI2dnZWLNmDQ4fPoyTJ09i1KhR7abkicKBRBAEQewiiIiI6MZwap2IiCiIMciJiIiCWJecflZXV4eVK1eiuLgYX3zxxVWP22w2LF26FElJSSgrK8PMmTORmZkJoO180aKiIkilUqSlpWH69OldURIREVFY6JIgP3LkCCZOnIiioqJrPr5hwwb07t0bzzzzDEpKSrBw4UJ88sknqKmpwQcffIAtW7ZAIpFg2rRpGDNmTI+v8UxERBSsumRq/Wc/+xnUarXXx3ft2oXhw4cDaFv9qbi4GCaTCXv37kV2djYkEgkAYPjw4dizZ09XlERERBQWemRlN4PB0C7oNRoNDAYDGhoa2t2vVqt9Wj7S6XRBLuf5pEQUWhxONwxNVjQ0t6Kx2QZDsxVNJjuaTDY0m+1oNtthsthhsjpgsjpgs7u6rRaZVNL2n0wCqUQCqVQKmVQCqRSQSiSQSK/cL4FUAgDffy2RSCD54b9o+1p6+caV2wA8465oG9L2vCu3217d88UP/2n3XPzgyx/fbDeuE27waVe/zo+L89Hdo9Mw+qbe1x3TI0Gu0+lgNps9t00mE3Q6HeLj41FeXu6532w2Iy0trcPXa2y0+PS+en006up8X40qnLA33rE33rE33vnaG6vNiYsGC2obLKhttKCmwQJDUyvqm1vRbLLjeucDSwCoIuSIipSjlzYKUZFyRCplUEW0/RuhlCFSIUOEQgalQgalQgqlXAaFXAqlXAq5XAqFXAq5TAqFTAqZTAK5rO32j4P7RoPPn96EI196U1fXAr0+2uvj3RbkRqMRcrkcGo0GEyZMQEFBAUaOHImSkhIMGjQIGo0G48aNw5/+9CcIggCJRIKCggI8/vjj3VUSEVGParbYcb66GaXVzai4ZEJlnQn1Ta1XjZNJJdBGRyArLQ7a6EhooyMQq1EiThOBmCgFoqOUiI5SQB2pgFTadQFLoaFLgvzgwYPYunUr6urqsHr1ajz99NNYu3Yt4uLiMHPmTDz55JNYunQpVq9ejQsXLuCtt94CAPTq1QtPP/00lixZAplMhocffpgHuhFR0GpssaGovAFFZY04XWlEnbF9aMdEKTA4XYsUvRq946OQGB+FJK0K8dGRDGi6YUG5spuvUzSczvGOvfGOvfGOvWlPEASUXmxB/uk6nChtQEXt971RR8qRmRyDfsmx6Jscg7SkaMSqlSJWKx5+33jna29EmVonIgpVVXUm7D1+EYeKL6GxxQYAUCpkGNpXh8HpWgzJ0KJPoqbt4C6ibsYgJyLygc3hQ
l5hDfYeu4jSi80A2ra6f3JTL4wYqMf4UWloabKKXCWFIwY5EdF1mKwO/OtIJf55pBImqwMSCZDTT4exQ3tj2IAEyGVty3FEKuXg5DGJgUFORHQN5lYHvtxXhn8XVMHucEMdKcd9P8nA7cNToI2OELs8Ig8GORHRDzhdbvw7vwp/+64U5lYntNERmHpbGm67uTcilfyVSYGH35VERJedKmvAx1+XoLbRClWEHI/c3h8Tb+kDhZwXiqTAxSAnorDXanfi813n8O/8KkglEkwc0QcPjM1AdFR4ni5GwYVBTkRh7XSFEeu3n0KdsRXJCWrMuHcwMnvHiF0Wkc8Y5EQUlgRBwDeHKvDZv88CAO4Zk4aHxmZCwQsyUZBhkBNR2LE7XNjwVQnyCmsQq1biuYduwsDUOLHLIrohDHIiCiuNLTa8+8VxlNW0ILN3DGZNHcrTySioMciJKGzUG61Y9ucC1De14qdDe+HJSVmcSqegxyAnorBQ22DB258WoKHZhgfHZuKBn2Z06TW3icTCICeikFdVb8byTwvQZLLj4Qn9cM+YdLFLIuoyDHIiCmmXGi14+5N8NFsc+I87B+Cukalil0TUpRjkRBSyWix2/PYvx9BsceCxuwZi4i19xC6JqMtx3UEiCkkOpwvvbjqB2kYrJo9JZ4hTyGKQE1HIcQsC1n1ZhLOVTbh1cCKmju8rdklE3YZBTkQhZ8veUhwuvoQBfWIx497BkPLodAphDHIiCimFpQ3Yvq8M+rhIzJ6Ww/PEKeQxyIkoZDSZ7Vj35SlIpRL84sGboFEpxC6JqNsxyIkoJLgFAX/cVohmsx3/Z0I/XsGMwgaDnIhCwj/2l6OwrBE5/XS4exTPFafwwSAnoqBXXtOCzXtKEadRYsa9g7n0KoUVBjkRBTW3W8BHXxXDLQiYce8QREcpxS6JqEcxyIkoqO08UonymhbkZvdCdma82OUQ9TgGOREFrYbmVmzaex7qSDn+78T+YpdDJAoGOREFrY3fnIbN7sIjd/RHDKfUKUwxyIkoKOWfrkPBmXpkpcZh7NDeYpdDJJouufrZvn37sGPHDuh0OkgkEsyaNavd46+88goqKio8t0tKSrBp0yb06dMHd9xxB1JSUgAAiYmJWLFiRVeUREQhzOF049OdZyCXSfDkz7J4lDqFNb+D3Gq1YtGiRdi+fTuUSiVmz56NvLw85ObmesaMHTsWkydPBgCYTCYsWLAAffq0XYloypQpmD17tr9lEFEY2XW0CvVNrbh7VCp669Ril0MkKr+n1o8ePYrk5GQolW37p0aMGIFdu3a1G3MlxAHgr3/9K6ZNm+a5fejQIaxbtw4rV65Efn6+v+UQUYiz2pzY9l0ZVBEy3JubLnY5RKLze4vcYDBArf7+L2KNRgODwXDNsW63G3v37sXPf/5zz33z5s1DTk4OrFYrpkyZgjVr1iA9/fo/nFptFOQ+XghBr4/2aVw4Ym+8Y2+8E7s3f/qqCCarA4/fMwh903Wi1vJjYvcmkLE33vnbG7+DXKfTwWw2e26bTCbodNf+4dq5cyduv/32dvuzcnJyAAAqlQqDBw9Gfn5+h0He2GjxqTa9Php1dS0+jQ037I137I13YvemyWTD5l1nEatW4qeDkwLq/5PYvQlk7I13vvbmemHv99T6sGHDUF1dDbvdDgDIz8/HhAkTYDQaYTKZ2o3dtGkTpkyZ4rmdl5eHPXv2eG6Xl5cjNZVrJBPRtf1tXxnsDjceHJuJCCUvT0oEdMEWuUqlwuLFi/Hmm29Cq9UiKysLubm5WLZsGeLi4jBz5kwAQFFRETIyMtpNw8fHx2PVqlU4deoULl26hEmTJmHkyJH+lkREIehSowV7jlYjSavC2ByebkZ0hUQQBEHsIjrL1ykaTud4x954x954J2ZvNnxVjN1Hq/HsA9kYPSRJlBquh9833rE33gXE1DoRUXdrMtnw3YkaJMapMGpQotjlEAUUBjkRBbxvDlfC6XLjZ6PTIJVy8ReiH2KQE1FAs7Q68e+CSsSolfjp0F5il0MUcBjkRBTQdh+tgtXmwl0j+0Dh4/oRROGEQU5EAcvhdGPH4QpEKmW4fXiK2OUQBSQGOREFrLzCGjSZ7JgwLAVRkQqxyyEKSAxyIgpIgiDg64MXIJNKcNcoLhRF5A2DnIgCUskFIy4aLBg1KBHa6AixyyEKWAxyIgpIu45WAQAmcN840XUxyIko4DSZ7ThSUocUvRoD+sSKXQ5RQGOQE1HA+fZ4NVxuAROGpbS7WiIRXY1BTkQBxe0WsPtoNZQKKXKzuQAMUUcY5EQUUE6WNqC+qRVjhiQhKtLvCzQShTwGOREFlF0FPMiNqDMY5EQUMBqaW3HsXD0ye0cjo1eM2OUQBQUGOREFjO9OXIQgAOOHcWucyFcMciIKCIIgIK+wFkq5lNccJ+oEBjkRBYSymhbUNFgwbEACVBE8yI3IVwxyIgoIeYU1AMBTzog6iUFORKJzud04eKoWGpUC2ZnxYpdDFFQY5EQkusLSRjRbHBg9OAlyGX8tEXUGf2KISHT7L0+rj7kpSeRKiIIPg5yIRGW1OZF/ug6JWhX69ua540SdxSAnIlEVnKmD3elGbnYvXiCF6AYwyIlIVHmFtQCAMdmcVie6EQxyIhJNi8WOU2UNyOwdgyRtlNjlEAUlBjkRiabgTD0EAVzJjcgPDHIiEs2RkjoAwIgsvciVEAUvBjkRicLS6sSpsgakJWqQGKcSuxyioNUlCxrv27cPO3bsgE6ng0QiwaxZs9o9vmnTJnz66aeIiIgAAEybNg0PPfQQAGDr1q0oKiqCVCpFWloapk+f3hUlEVGAO3auHi63gFu4NU7kF7+D3Gq1YtGiRdi+fTuUSiVmz56NvLw85Obmthv3zjvvoE+fPu3uq6mpwQcffIAtW7ZAIpFg2rRpGDNmDDIyMvwti4gC3PfT6tw/TuQPv6fWjx49iuTkZCiVSgDAiBEjsGvXrqvGbdy4EevXr8eqVatgNBoBAHv37kV2drbn3NHhw4djz549/pZERAHOZnfh5HkDeuuikJKgFrscoqDm9xa5wWCAWv39D6JGo4HBYGg3ZtSoUZgwYQLi4+Oxe/duvPDCC9iwYQMaGhraPVetVl/1XCIKPSfOG2B3ujmtTtQF/A5ynU4Hs9nsuW0ymaDT6dqNSU1N9Xw9ZswYPPfcc3C5XIiPj0d5ebnnMbPZjLS0tA7fU6uNglwu86k+vT7ap3HhiL3xjr3xrit6c/LrEgDAxNEZIdXrUPosXY298c7f3vgd5MOGDUN1dTXsdjuUSiXy8/Px6KOPwmg0Qi6XQ6PRYMWKFXjhhRcgl8tRVlaGPn36QCaTYdy4cfjTn/4EQRAgkUhQUFCAxx9/vMP3bGy0+FSbXh+NuroWfz9iSGJvvGNvvOuK3jicbhwsrEFCbCRilNKQ6TW/b7xjb7zztTfXC3u/g1ylUmHx4sV48803odVqkZWVhdzcXCxbtgxxcXGYOXMmEhISsHjxYvTp0wenT5/GsmXLAAC9evXC008/jSVLlkAmk+Hhhx/mgW5EIe5U
WQNa7S6MH5bMtdWJuoBEEARB7CI6y9e/7PhXoHfsjXfsjXdd0ZuP/lGMPceq8fLjIzCgT1wXVSY+ft94x9541xVb5FwQhoh6jCAIOHHeAI1KgX7JsWKXQxQSGORE1GMqLpnQ2GLD0L7xkEo5rU7UFRjkRNRjjp1rO700p1+CyJUQhQ4GORH1mOPn6iGVSHBT33ixSyEKGQxyIuoRLRY7zlc1o39KDNSRCrHLIQoZDHIi6hEnzhsgAMjpz2l1oq7EICeiHnHcs39c18FIIuoMBjkRdTuX242T5xugi4ngRVKIuhiDnIi63dnKJlhsTuT0S+BqbkRdjEFORN2O0+pE3YdBTkTd7vg5AxRyKQala8UuhSjkMMiJqFs1NLeiqt6MwelaRCh8u/wwEfmOQU5E3aqwtAEAkJ3JRWCIugODnIi6VWHZ5SDPYJATdQcGORF1G7cg4FRZI7TREeitixK7HKKQxCAnom5zobYFJqsD2RnxPO2MqJswyImo21zZPz4kk0erE3UXBjkRdZtTZY0AgCHcP07UbRjkRNQtbA4XzlQakZakQUyUUuxyiEIWg5yIusXpCiOcLoGnnRF1MwY5EXULz/njnFYn6lYMciLqFoVlDVDIpRjQJ1bsUohCGoOciLqc0WRDVZ0ZWalxUMi5LCtRd2KQE1GX85x2xml1om7HICeiLnfltDMe6EbU/RjkRNSlBEFA8YVGREcpkKJXi10OUchjkBNRl7rUaEVjiw1ZaVpIuSwrUbdjkBNRlyq60DatPjgtTuRKiMIDg5yIulRxeVuQD0rn+upEPYFBTkRdpm3/uBGxGiV6xfOypUQ9Qd4VL7Jv3z7s2LEDOp0OEokEs2bNavf42rVrUV9fj4SEBBQWFmLOnDno168fAOCOO+5ASkoKACAxMRErVqzoipKISATVBguazXaMGZLEy5YS9RC/g9xqtWLRokXYvn07lEolZs+ejby8POTm5nrGWCwWvPzyy5BIJPj73/+Ot99+G++//z4AYMqUKZg9e7a/ZRBRAOC0OlHP83tq/ejRo0hOToZS2XZ1oxEjRmDXrl3txrz44ouev87dbjeior6fcjt06BDWrVuHlStXIj8/399yiEhEDHKinuf3FrnBYIBa/f25ohqNBgaD4Zpj7XY7Nm/ejEWLFnnumzdvHnJycmC1WjFlyhSsWbMG6enp131PrTYKch+XfdTro30aF47YG+/YG++89cbtFnC60gi9VoUh/fVhObXO7xvv2Bvv/O2N30Gu0+lgNps9t00mE3Q63VXj7HY7Fi9ejF/96ldIS0vz3J+TkwMAUKlUGDx4MPLz8zsM8sZGi0+16fXRqKtr8WlsuGFvvGNvvLteby7UtqDF4kBOPx3q6009XJn4+H3jHXvjna+9uV7Y+z21PmzYMFRXV8NutwMA8vPzMWHCBBiNRphMbT/Mra2tWLRoEf7zP/8TN910E77++msAQF5eHvbs2eN5rfLycqSmpvpbEhGJwDOtnsZpdaKe5PcWuUqlwuLFi/Hmm29Cq9UiKysLubm5WLZsGeLi4jBz5kzMmzcPZ86cQWVlJYC2g98mTZqE+Ph4rFq1CqdOncKlS5cwadIkjBw50u8PRUQ9r/iCEQAwmPvHiXqURBAEQewiOsvXKRpO53jH3njH3njnrTcutxtzfrcX0VFK/ObZ3Gs8M/Tx+8Y79sa7gJhaJyK6UGuC1ebitDqRCBjkROS30xVt0+pZXF+dqMcxyInIbyWX949npTLIiXoag5yI/OIWBJypNCIhNhLxMZFil0MUdhjkROSXqjozzK1OTqsTiYRBTkR+Kbl8/fGBnFYnEgWDnIj88v2BbjxinUgMDHIiumGCIOB0hRHa6AjoY7l/nEgMDHIiumE1DRY0WxzISo0Ly4ukEAUCBjkR3bArp50N5IFuRKJhkBPRDfPsH+eBbkSiYZAT0Q0RBAElFUbERCnQKz5K7HKIwhaDnIhuSJ3RisYWGwZy/ziRqBjkRHRDSnjaGVFAYJAT0Q05zfXViQICg5yIbsjpSiPUkXIk69Vil0IU1hjkRNRpjS021BlbMaBPHKTcP04kKgY5EXXaldPOBqTGilwJETHIiajTTldeXgimD/ePE4mNQU5EnXamwgilXIr0XtFil0IU9hjkRNQpJqsDVXVm9E2OgVzGXyFEYuNPIRF1ytnKJgjg9ceJAgWDnIg6xbN/nEFOFBAY5ETUKWcqjJBJJeiXzCPWiQIBg5yIfNZqd6KspgVpSdGIUMrELoeIwCAnok44faERLreAgTx/nChgMMiJyGeF5wwAeP44USBhkBORzwpL24J8AA90IwoYDHIi8onT5UZxeSOSE9TQqBRil0NEl8m74kX27duHHTt2QKfTQSKRYNasWe0et9lsWLp0KZKSklBWVoaZM2ciMzMTALB161YUFRVBKpUiLS0N06dP74qSiKiLXag1wWZ3YWAf7h8nCiR+B7nVasWiRYuwfft2KJVKzJ49G3l5ecjNzfWM2bBhA3r37o1nnnkGJSUlWLhwIT755BPU1NTggw8+wJYtWyCRSDBt2jSMGTMGGRkZ/pZFRF3s+wulcFqdKJD4PbV+9OhRJCcnQ6lUAgBGjBiBXbt2tRuza9cuDB8+HACQlZWF4uJimEwm7N27F9nZ2ZBcvgzi8OHDsWfPHn9LIqJucIYXSiEKSH5vkRsMBqjVas9tjUYDg8Hg05iGhoZ296vV6queey1abRTkct/OYdXreVEHb9gb79ib9txuAWermqHXqjCov17scgIWv2+8Y2+887c3fge5TqeD2Wz23DaZTNDpdD6NiY+PR3l5ued+s9mMtLS0Dt+zsdHiU216fTTq6lp8Ghtu2Bvv2JurVdWb0WKx45ZBfdgbL/h94x17452vvble2Ps9tT5s2DBUV1fDbrcDAPLz8zFhwgQYjUaYTCYAwIQJE1BQUAAAKCkpwaBBg6DRaDBu3DgUFhZCEAQAQEFBAW677TZ/SyKiLnbm8v7xIX11HYwkop7m9xa5SqXC4sWL8eabb0Kr1SIrKwu5ublYtmwZ4uLiMHPmTDz55JNYunQpVq9ejQsXLuCtt94CAPTq1QtPP/00lixZAplMhocffpgHuhEFoCsXSsnOjBe5EiL6MYlwZXM4iPg6RcPpHO/YG+/Ym6v9z+rvYHO48cn/uwf19SaxywlI/L7xjr3xLiCm1okotBmaWmFotmFAn1jPGSZEFDgY5ER0XVem1QfwtDOigMQgJ6LrunKg20AuBEMUkBjkRHRdpyuboFRIkZakEbsUIroGBjkReWWyOlBdb0a/5FjIZfx1QRSI+JNJRF55lmXltDpRwGKQE5FXZyqaAAADeMUzooDFICcir05XGiGTStAvmUFOFKgY5ER0Ta12J8prWpDeKxoRSt8uUkREPY9BTkTXdK66GS63wP3jRAGOQU5E13T6Ag90IwoGDHIiuqbTFUZIwAPdiAIdg5yIruJwunGuuhl9EjVQRyrELoeIroNBTkRXKb3YDKfLzWl1oiDAICeiq1xZCCaLQU4U8BjkRHSVkssXShnAICc
KeAxyImrH5XbjbGUTesVHIVatFLscIuoAg5yI2qm4ZEKr3cX940RBgkFORO1cOX+c+8eJggODnIjaubJ/nFvkRMGBQU5EHm5BwJnKJuhiIqGLjRS7HCLyAYOciDwu1pthsjq4NU4URBjkRORxZVo9K41BThQsGORE5FF8gUFOFGwY5EQEABAEASUXGqGNjkBinErscojIRwxyIgIAVNeb0WJxYFBaHCQSidjlEJGPGOREBOCH0+pakSshos5gkBMRAKDkQiMAYBD3jxMFFQY5EcEtCCi+YIQ2OgJ67h8nCioMciJC9eXzxwelabl/nCjIyP15stFoxIoVK5CamoqysjLMnTsXCQkJ7cYcP34cGzZswJAhQ1BaWoqcnBw88sgjAIDXX38dpaWlnrGvvvoqsrKy/CmJiG5AyeX945xWJwo+fgX5O++8g9zcXEyePBn/+te/sHTpUrz99tvtxtTV1eHnP/85cnJy4HA48JOf/AR33nkn4uPjodfr8cYbb/j1AYjIf8WX949npfNAN6Jg41eQ7969G8899xwAYMSIEb7Wz8AAABeNSURBVFiwYMFVYyZOnNjutkwmg0KhAACYzWa89957kMlkiIqKwvTp0yGX+1USEXWSWxBQcsGI+JgI6Lm+OlHQ6TA1Z8yYgfr6+qvunzNnDgwGA9RqNQBAo9GgqakJTqfTaxhv3LgRv/jFLxAdHQ0AuP/++5GVlQW5XI5ly5ZhzZo1eP755zssWquNglwu63AcAOj10T6NC0fsjXfh1Juyi80wWR24IzsViYkxHY4Pp950FnvjHXvjnb+96TDI169f7/UxnU4Hs9mMmJgYmEwmxMbGeg3xbdu2wWKx4Je//KXnvuzsbM/XY8aMwbp163wK8sZGS4djgLbm1NW1+DQ23LA33oVbb/KOVgIA0hPVHX7ucOtNZ7A33rE33vnam+uFvV9HrY8fPx4FBQUAgPz8fIwfPx4A4Ha7UV1d7Rn3+eefw2Aw4Je//CVKSko8B7gtXbrUM6a8vBzp6en+lENEN+D7A924f5woGPm1Q3ru3LlYvnw5ysrKUFFRgfnz5wMASkpK8NJLL2Hbtm345z//id/85jcYMmQIdu7cCaPRiFdffRWZmZlobGzE8uXLERkZidLSUrz88std8qGIyDdt5483QhcTgQTuHycKShJBEASxi+gsX6doOJ3jHXvjXTj1prymBb/+6BDGDu2Np+8d3OH4cOpNZ7E33rE33ok+tU5Ewe1UWQMAYEgGp9WJghWDnCiMXQnywRnxIldCRDeKQU4UphxOF05XNqGPXo1YtVLscojoBjHIicLU2apmOJxuDOHWOFFQY5AThSnuHycKDQxyojB1qqwRMqkEA1N5oRSiYMYgJwpD5lYHymqa0Tc5BpFKXt+AKJgxyInCUHG5EYIA7h8nCgEMcqIwdKqc+8eJQgWDnCgMnSprRIRShszeHV/tjIgCG4OcKMw0NLeitsGCQalxkMv4K4Ao2PGnmCjMFHpOO+P+caJQwCAnCjMnz18O8kwGOVEoYJAThRGX243C0gboYiKQrIsSuxwi6gIMcqIwcr66GRabE0P76iCRSMQuh4i6AIOcKIycOG8AAAztqxO5EiLqKgxyojBy4lwDZFIJBqXz/HGiUMEgJwoTTSYbymtbMDA1DqoILstKFCoY5ERh4mRp29HqnFYnCi0McqIw8f3+cZ52RhRKGOREYeDKaWfxMRFITlCLXQ4RdSEGOVEYKK1ugbmVp50RhSIGOVEYOM7TzohCFoOcKAycOG+ATCrBYJ52RhRyGOREIc5osqG8pgUD+sTytDOiEMQgJwpxR8/UAwCGD9CLXAkRdQcGOVGIK/AEeYLIlRBRd2CQE4Uwq82JovIGpCVqkBCnErscIuoGDHKiEHbivAFOl4DhAzmtThSq/DryxWg0YsWKFUhNTUVZWRnmzp2LhISrp+/uuOMOpKSkAAASExOxYsUKAEBlZSVWr16N9PR0VFVVYf78+VCruVgFUVfhtDpR6PNri/ydd95Bbm4uZs6ciTvvvBNLly695rgpU6bg448/xscff+wJcQBYtGgRpk+fjmeffRYDBgzAunXr/CmHiH7A6XLj+Ll66GIikZqoEbscIuomfgX57t27MXz4cADAiBEjsHv37muOO3ToENatW4eVK1ciPz8fAOBwOHDgwAEMHTq0w+cTUecVX2iE1ebC8IEJXM2NKIR1OLU+Y8YM1NfXX3X/nDlzYDAYPFPhGo0GTU1NcDqdkMvbv+y8efOQk5MDq9WKKVOmYM2aNVCpVIiMjPT8gtFoNDAYDD4VrdVGQS6X+TRWr4/2aVw4Ym+8C4XeFO05DwC4Y1R6l36eUOhNd2FvvGNvvPO3Nx0G+fr1670+ptPpYDabERMTA5PJhNjY2KtCHABycnIAACqVCoMHD0Z+fj7uu+8+tLa2QhAESCQSmEwm6HS+LR/Z2GjxaZxeH426uhafxoYb9sa7UOiNWxCQd7wa6kg59NGKLvs8odCb7sLeeMfeeOdrb64X9n5NrY8fPx4FBQUAgPz8fIwfPx4A4Ha7UV1dDQDIy8vDnj17PM8pLy9HamoqFAoFRo8ejRMnTlz1fCLyT3lNC4wmO4b1T4BMypNTiEKZX0etz507F8uXL0dZWRkqKiowf/58AEBJSQleeuklbNu2DfHx8Vi1ahVOnTqFS5cuYdKkSRg5ciQA4Ne//jX+8Ic/4Ntvv8XFixexYMEC/z8RESH/dB0A8LQzojAgEQRBELuIzvJ1iobTOd6xN94Fe28EQcCCNXloNjuwcs5YRCh8O57EF8Hem+7E3njH3ngn+tQ6EQWe8xebUWdsxfCBCV0a4kQUmBjkRCHmwKlaAMCtg5NEroSIegKDnCiEuN0CDhVfgjpSjpsy48Uuh4h6AIOcKISUVBjRZLLjlqxEyGX88SYKB/xJJwohV6bVRw9OFLkSIuopDHKiEOF0uXGk5BJiNUpkpWnFLoeIegiDnChEFJY2wNzqxKhBiZBKubY6UbhgkBOFiANFV6bVebQ6UThhkBOFAJvDhYIz9UiIjUTf5BixyyGiHsQgJwoBR0ouwWZ3YUx2Ei9ZShRmGOREIWDPsYsAgLE5ySJXQkQ9jUFOFORqGiw4XWHE4HQtEuNUYpdDRD2MQU4U5PYea7tk8Libe4tcCRGJgUFOFMScLje+O1kDdaQct/CSpURhiUFOFMSOnzOg2WxHbnYvKOS80hlROGKQEwWxPZ5pdR7kRhSuGOREQaqhuRUnzhuQ2TsGqYkascshIpEwyImC1HcnLkIQeJAbUbhjkBMFIafLjX8XVCFCKeOSrERhjkFOFIQOFV+C0WTHuJzeUEXIxS6HiETEICcKMoIgYMfBCkgkwJ0jU8Uuh4hExiAnCjKnK4wor23BiIF6ruRGRAxyomCz41AFAODuUdwaJyIGOVFQqW204OiZemT2jkH/lFixyyGiAMAgJwoi/zxUCQHApFtTeblSIgLAICcKGuZWB/aeqIYuJgK3ZHFddSJqwyAnChJfH6yA3eHGnSNTIZPyR5eI2vC3AV
EQaLHY8c3hCsSolZgwPEXscogogDDIiYLAVwcuwGZ34d7cdEQoeJUzIvqeX0tCGY1GrFixAqmpqSgrK8PcuXORkJDQbsyBAwfwxhtvID4+HgBgMBhwzz33YPbs2Xj99ddRWlrqGfvqq68iKyvLn5KIQk6TyYadRyqhjY7AhGG8yhkRtedXkL/zzjvIzc3F5MmT8a9//QtLly7F22+/3W5MYmIi3n77bQwZMgQA8Morr2Dq1KkAAL1ejzfeeMOfEohC3vb95bA73fi/P8ngNceJ6Cp+Bfnu3bvx3HPPAQBGjBiBBQsWXDUmMzPT83V9fT3sdjtSUtr28ZnNZrz33nuQyWSIiorC9OnTIZdz3WiiKxqaW7GroBoJsZEYl8OrnBHR1TpMzRkzZqC+vv6q++fMmQODwQC1Wg0A0Gg0aGpqgtPp9BrGn3zyCaZPn+65ff/99yMrKwtyuRzLli3DmjVr8Pzzz3dYtFYbBbmPWyZ6fbRP48IRe+NdoPTm893n4XS58eikQejdKzAWgAmU3gQi9sY79sY7f3vTYZCvX7/e62M6nQ5msxkxMTEwmUyIjY31GuJ2ux0nT57EnDlzPPdlZ2d7vh4zZgzWrVvnU5A3Nlo6HAO0NaeursWnseGGvfEuUHpTccmEr/aXISk+CkMz4gKipkDpTSBib7xjb7zztTfXC3u/jlofP348CgoKAAD5+fkYP348AMDtdqO6urrd2G3btuHee+9td9/SpUs9X5eXlyM9Pd2fcohChiAI+NOOEggC8NidA3jeOBF55dcO6blz52L58uUoKytDRUUF5s+fDwAoKSnBSy+9hG3btnnGfvXVV1i9enW75zc2NmL58uWIjIxEaWkpXn75ZX/KIQoZ+0/V4kxlE0YM1OOmvjqxyyGiACYRBEEQu4jO8nWKhtM53rE33ondG6vNiVfW7ofF5sRb/zUaCQF0qVKxexPI2Bvv2BvvRJ9aJ6Kut+27MjSZ7bh3THpAhTgRBSYGOVEAqaoz4ZvDFdDHReKeMWlil0NEQYBBThQgnC431m07BZdbwKN3DuTiL0TkEwY5UYDYsrcUFy6ZMC6nN27un9DxE4iIwCAnCginK4z4x/5y6OMiMX3iALHLIaIgwiAnEpnV5sQfvzwFSIBn7suGKoLLFBOR7xjkRCISBAEbvzmN+qZW3Jubjv59AmMZViIKHgxyIhHtPFKJfSdrkNErGg/8NLPjJxAR/QiDnEgkhWUN+HTnWcSolZg1dSjkMv44ElHn8TcHkQhqGyx4b/NJSKXArKlDER8TKXZJRBSkGOREPczS6sTvvzgOi82JJycNQv8U7hcnohvHICfqQa12J1Z+fgwXDRbcPSoVY3N6i10SEQU5BjlRD7HZXVj5+XGcrWrCrYMT8cjt/cUuiYhCAIOcqAfYHS78/ovjOF1hxC1Zejxz/xBIpRKxyyKiEMCVJ4i6WavdiT9sPomi8kYMH5CAZx/IhkzKv6GJqGswyIm6UWOLDb/7/BguXDIhp58Ov3jwJp5mRkRdikFO1E0u1Lbgd389jsYWG8YPS8Zjdw1kiBNRl2OQE3WDw8WXsH57EewOFx65vT8m3ZoKiYT7xImo6zHIibqQze7Cn3eexp5jF6GUS/HcQzdh5KBEscsiohDGICfqIuU1LXj/b4WobbAgLVGDmQ9kIzlBLXZZRBTiGOREfrLanNj6bSl2HqmEyy3g7lGpmDa+HxRy7g8nou7HICe6QYIgYH9hLf7y77NoMtuhj4vEE5OycFOmTuzSiCiMMMiJOkkQBBw7a8DfvitFWU0LFHIpHhqXiXtGp0Ehl4ldHhGFGQY5kY/cbgFHz9Zj23dlKK9tAQCMHJSIRyb0Q0KcSuTqiChcMciJOtBktuPb49XYVVANQ3MrJABuHZyI+36SgT56jdjlEVGYY5ATXYPN7sKxc/U4VHQJR8/Ww+UWoFRIcdvNybhrVCpSeDQ6EQUIBjnRZU0mG06WNqC4shiHTtXA7nADAFIS1JgwPAW52b0QFckfGSIKLPytRGGr2WzHmcomnK0yoqisERcumTyPJWlVGDU4CbcOSkSKXs1V2YgoYDHIKeQJgoAmsx2Vl0wor21BxSUTympacKnR6hkjl0mQnaFFdqYOt92SCpUMDG8iCgoMcgoJTpcbTSY76pusqG9qRZ3RijpjK2oazKhpsMBqc7UbHxUhx9C+OvTvE4sBKbHITI5BhKLt1DG9Php1dS1ifAwiok7zK8jdbjf+8pe/4He/+x02bNiAgQMHXnPc1q1bUVRUBKlUirS0NEyfPh0AUFlZidWrVyM9PR1VVVWYP38+1GoeRBTu3G4BVrsTVpsTVpsLllYHzK1OmK1t/7ZY7Gi22NFsdqDJbIOxxYYWiwPCNV5LLpMgSRuFXhlRSElQIy0pGmmJGuhiI7nFTUQhwa8gLy4uxs033wyVyvs5tDU1Nfjggw+wZcsWSCQSTJs2DWPGjEFGRgYWLVqEF154ATk5Ofj444+xbt06vPjiix2+b4vF7lN9SpPN57E/dK1AuJHBVz0kCF4fE673Oj98nnDl+Z4vIFx5vcsPCj+8/8p9QtvXVx4zOdxoaDBDEAD35TFuQfCMc7sFuK98LQhwu3/4ddtjLrcbbrcA1+X/3G4BLpcAp9sNp0uAy9X2r9Pl9vzncLphd17+1+GC/fK/NocLrXYXbPa2+3yllEsRFx2B3jo1tNERiI+JREJcJPSxKiTERSIhNhIyKZdKJaLQ5VeQDxkypMMxe/fuRXZ2tmfrZ/jw4dizZw9SUlJw4MABDB06FAAwYsQIvPrqqz4F+Qu//9afsimAyGVSRCikUCpkUEcqoIuJRIRCBlWE/PJ/MkRFyqGJVECtUkAdqUB0lALRaiViohSIUMi4ZU1EYa3DIJ8xYwbq6+uvun/OnDmYOHFih2/Q0NDQbrpcrVbDYDCgsbERkZHfT29qNBoYDAafit624kGfxhHdKL0+WuwSAhZ74x174x17452/vekwyNevX+/XG8THx6O8vNxz22w2Iy0tDVqtFq2trRAEARKJBCaTCTodLzZBRETUGd2y89DtdqO6uhoAMG7cOBQWFnr21RYUFOC2226DQqHA6NGjceLECQBAfn4+xo8f3x3lEBERhSyJIFzvMKvra2pqwsaNG/Hhhx/iwQcfxH333Ydhw4ahqKgIL730ErZt2wag7aj1kydPQiaTISMjo91R63/4wx+QmpqKixcvYsGCBTxqnYiIqBP8CnIiIiISF8/LISIiCmIMciIioiAW8ku02u12fPjhh1CpVDh79iy0Wi1+9atfiV1WQFm9ejU2bNiAAwcOiF1KwFiyZAlUKhWioqJQXFyMV155BXq9XuyyRLNv3z7s2LEDOp0OEokEs2bNErukgHHhwgWsXLkSQ4YMQU1NDeLi4tifH2htbcXDDz+MsWPHYv78+WKXE1DOnz+P7du3IyIiAocOHcLs2bORk5PT6dcJ+SBft24dbr31VowaNQpA22p09L0DBw6gublZ7DICjkql8vzBt3btWrz//vt47bXXRK5KHFarFYsWLcL27
duhVCoxe/Zs5OXlITc3V+zSAoLRaMTkyZNx5513AgAmT56MCRMm4KabbhK5ssBw5Y8cas/lcuE3v/kN3n//fUilUjz00EOQy28skkM+yL/88kskJyejsLAQRqMRTzzxhNglBYz6+nps374dM2fOxObNm8UuJ6D8cNZGEARERUWJWI24jh49iuTkZCiVSgBtqzDu2rWLQX7Zj7eg3G73dZetDidbtmzBiBEjUFJSAovFInY5AeXEiRMQBAEff/wxWltbERcXh0ceeeSGXiskgvx6q89VVVVBIpHgqaeewr59+/Diiy/i448/FqFKcVyvNzt37sT8+fPR0hKeV/ryZdXC5uZmfPvtt3j33Xd7uryAYTAY2p0W2plVGMPNN998g7Fjx6Jfv35ilyK6s2fP4vz585g7dy5KSkrELifgVFdX4+jRo3jnnXcQHR2NefPmQaFQYOrUqZ1+rZAI8uutPqfRaDx/Md9yyy04fPgwXC4XZDJZT5UnKm+9OXHiBORyOT777DM0NTXBZrNh7dq1uPvuu5GRkdGzRYqko1ULW1pa8Otf/xpLlixBXFxcD1UVeHQ6Hcxms+c2V2G8tv379+PAgQN45ZVXxC4lIHzzzTdQKpVYu3Ytjhw5AofDgY8++ghPPfWU2KUFBLVajb59+yI6um151ltuuQUHDx4M3yC/ntzcXFRUVKBv376oqqpCWlpa2IT49QwdOtRzwZrKykr89a9/xcyZM0WuKnA0NDRgyZIleOmll5CUlISvv/4akyZNErssUQwbNgzV1dWw2+1QKpXIz8/Ho48+KnZZAWXXrl04fPgwFi5ciEuXLqG6uhrDhw8XuyxRPffcc56vbTYbLBYLQ/wHbr75ZhiNRs+GZXV19Q1vRIX8gjC1tbX4/e9/j7S0NJw7dw6PP/74DR0VGKrKy8vx6aef4s9//jNmzpyJp556Kqz3B18xZcoUOJ1Oz5a4Wq3G+++/L3JV4vnuu+/w9ddfQ6vVQqFQ8KjsHzh58iSeeOIJz8FtFosFjz322A1tWYWir7/+Ghs3boTD4cBjjz2G++67T+ySAsY333yD/fv3Q6vV4uLFi3jttdcQGRnZ6dcJ+SAnIiIKZVwQhoiIKIgxyImIiIIYg5yIiCiIMciJiIiCGIOciIgoiDHIiYiIghiDnIiIKIiF/MpuROS/TZs2YefOnUhKSkJTUxP+/ve/Y+vWrRg4cKDYpRGFPS4IQ0QdKigoQFxcHDIzMzF79mykpaXhf/7nf8Qui4jAICeiTvj888/xySef4LPPPvNc1pSIxMWpdSLySWlpKVasWIGNGzcyxIkCCA92I6IOORwO/Pd//zdmz56Nfv364dy5czh8+LDYZRERuEVORD746KOPcO7cOZw7dw5vvPEGamtrMXHiRIwcOVLs0ojCHveRExERBTFOrRMREQUxBjkREVEQY5ATEREFMQY5ERFREGOQExERBTEGORERURBjkBMREQUxBjkREVEQ+/+RA+hyZdAJPAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "id": "b8d1371c", + "metadata": { + "editable": true + }, "source": [ ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "\"\"\"The sigmoid function (or the logistic curve) is a\n", - "function that takes any real number, z, and outputs a number (0,1).\n", - "It is useful in neural networks for assigning weights on a relative scale.\n", - "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", - "\n", - "import numpy\n", - "import matplotlib.pyplot as plt\n", - "import math as mt\n", - "\n", - "z = numpy.arange(-5, 5, .1)\n", - "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", - "sigma = sigma_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, sigma)\n", - "ax.set_ylim([-0.1, 1.1])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('sigmoid function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"Step Function\"\"\"\n", - "z = numpy.arange(-5, 5, .02)\n", - "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", - "step = step_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, step)\n", - "ax.set_ylim([-0.5, 1.5])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('step function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"tanh Function\"\"\"\n", - "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", - "t = numpy.tanh(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, t)\n", - "ax.set_ylim([-1.0, 1.0])\n", - "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('tanh function')\n", + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Two parameters\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", "\n", - "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\beta$ in our fitting of the Sigmoid function, that is we define probabilities" + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "c95f3051", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\hat{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\hat{\\beta}) &= 1 - p(y_i=1|x_i,\\hat{\\beta}),\n", - "\\end{align*}\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "107fab0a", + "metadata": { + "editable": true + }, "source": [ - "where $\\hat{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$. 
\n", - "\n", - "Note that we used" + "We can rewrite this as" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "d56b4bd7", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(y_i=0\\vert x_i, \\hat{\\beta}) = 1-p(y_i=1\\vert x_i, \\hat{\\beta}).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "4712d813", + "metadata": { + "editable": true + }, "source": [ - "\n", - "## Maximum likelihood\n", + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", "\n", - "In order to define the total likelihood for all possible outcomes from a \n", - "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", - "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", - "We aim thus at maximizing \n", - "the probability of seeing the observed data. We can then approximate the \n", - "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "43a58a59", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "P(\\mathcal{D}|\\hat{\\beta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\hat{\\beta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\hat{\\beta}))\\right]^{1-y_i}\\nonumber \\\\\n", - "\\end{align*}\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "6333f694", + "metadata": { + "editable": true + }, "source": [ - "from which we obtain the log-likelihood and our **cost/loss** function" + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "24e27c2b", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\hat{\\beta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\hat{\\beta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\hat{\\beta}))\\right]\\right).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "0462c197", + "metadata": { + "editable": true + }, "source": [ - "## The cost function rewritten\n", - "\n", - "Reordering the logarithms, we can rewrite the **cost/loss** function as" + "which, using the abovementioned expectation values can be rewritten as" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "965cd453", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\hat{\\beta}) = \\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\beta$.\n", - "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, + "id": "4426c74e", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\mathcal{C}(\\hat{\\beta})=-\\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", - "$$" + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "d68ec470", + "metadata": { + "editable": true + }, "source": [ - "This equation is known in statistics as the **cross entropy**. 
Finally, we note that just as in linear regression, \n", - "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression.\n", + "## A way to Read the Bias-Variance Tradeoff\n", "\n", - "## Minimizing the cross entropy\n", + "\n", + "\n", "\n", - "The cross entropy is a convex function of the weights $\\hat{\\beta}$ and,\n", - "therefore, any local minimizer is a global minimizer. \n", - "\n", - "\n", - "Minimizing this\n", - "cost function with respect to the two parameters $\\beta_0$ and $\\beta_1$ we obtain" + "

    Figure 1:\n",
Our ratio between likelihoods is then with $p$ predictors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "\\log{ \\frac{p(\\hat{\\beta}\\hat{x})}{1-p(\\hat{\\beta}\\hat{x})}} = \\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here we defined $\\hat{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\hat{\\beta}=[\\beta_0, \\beta_1, \\dots, \\beta_p]$ leading to" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "p(\\hat{\\beta}\\hat{x})=\\frac{ \\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}{1+\\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Including more classes\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", "\n", - "Till now we have mainly focused on two classes, the so-called binary\n", - "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", -<<<<<<< HEAD - "of simplicity assume we have only two predictors. We have then\n", - "following model" -======= - "of simplicity assume we have only two predictors. We have then following model" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ -<<<<<<< HEAD - "1\n", - "5\n", - " \n", - "<\n", - "<\n", - "<\n", - "!\n", - "!\n", - "M\n", - "A\n", - "T\n", - "H\n", - "_\n", - "B\n", - "L\n", - "O\n", - "C\n", - "K" -======= - "$$\n", - "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\beta_{10}+\\beta_{11}x_1,\n", - "$$" + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "2b502d1d", + "metadata": { + "editable": true + }, "source": [ - "and" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "## Understanding what happens" ] }, { - "cell_type": "markdown", - "metadata": {}, + "cell_type": "code", + "execution_count": 4, + "id": "9a5194fb", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], "source": [ - "$$\n", - "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\beta_{20}+\\beta_{21}x_1,\n", - "$$" + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "727c7723", + "metadata": { + "editable": true + }, "source": [ - "and so on till the class $C=K-1$ class" + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + 
"complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "7e90566c", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\beta_{(K-1)0}+\\beta_{(K-1)1}x_1,\n", - "$$" + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
] }, { - "cell_type": "markdown", - "metadata": {}, + "cell_type": "code", + "execution_count": 5, + "id": "7c760f15", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], "source": [ - "and the model is specified in term of $K-1$ so-called log-odds or\n", - "**logit** transformations.\n", "\n", "\n", - "## More classes\n", + "#print(__doc__)\n", "\n", - "In our discussion of neural networks we will encounter the above again\n", - "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", "\n", - "The softmax function is used in various multiclass classification\n", - "methods, such as multinomial logistic regression (also known as\n", - "softmax regression), multiclass linear discriminant analysis, naive\n", - "Bayes classifiers, and artificial neural networks. Specifically, in\n", - "multinomial logistic regression and linear discriminant analysis, the\n", - "input to the function is the result of $K$ distinct linear functions,\n", - "and the predicted probability for the $k$-th class given a sample\n", - "vector $\\hat{x}$ and a weighting vector $\\hat{\\beta}$ is (with two\n", - "predictors):" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\beta_{k0}+\\beta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}}.\n", - "$$" + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "2619ab70", + "metadata": { + "editable": true + }, "source": [ - "It is easy to extend to more predictors. 
The final class is" + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "3e4d0bdb", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}},\n", - "$$" + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "65d5f3f5", + "metadata": { + "editable": true + }, "source": [ - "and they sum to one. Our earlier discussions were all specialized to\n", - "the case with two classes only. It is easy to see from the above that\n", - "what we derived earlier is compatible with these equations.\n", - "\n", - "To find the optimal parameters we would typically use a gradient\n", - "descent method. Newton's method and gradient descent methods are\n", - "discussed in the material on [optimization\n", - "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html).\n", - "\n", -<<<<<<< HEAD - "\n", - "\n", + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", "\n", - "## A simple classification problem" + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." 
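+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a complement, the $k$-fold recipe listed above can also be written out by hand. The sketch below uses NumPy only, with a plain least-squares polynomial fit standing in for the model; the choices $k=5$ and polynomial degree 6 simply mirror the Scikit-Learn based code that follows."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# A seed just to make the by-hand run reproducible\n",
+    "np.random.seed(3155)\n",
+    "\n",
+    "# Same kind of data as in the Scikit-Learn based code below\n",
+    "n = 100\n",
+    "x = np.random.randn(n)\n",
+    "y = 3*x**2 + np.random.randn(n)\n",
+    "\n",
+    "k = 5\n",
+    "degree = 6\n",
+    "\n",
+    "# Steps 1 and 2 of the recipe: shuffle the data set and split it into k groups\n",
+    "indices = np.random.permutation(n)\n",
+    "folds = np.array_split(indices, k)\n",
+    "\n",
+    "scores = np.zeros(k)\n",
+    "for j in range(k):\n",
+    "    # One group is the test set, the union of the remaining groups the training set\n",
+    "    test_inds = folds[j]\n",
+    "    train_inds = np.concatenate([folds[m] for m in range(k) if m != j])\n",
+    "    # Fit a model (here a plain least-squares polynomial) on the training set\n",
+    "    coeffs = np.polyfit(x[train_inds], y[train_inds], deg=degree)\n",
+    "    y_pred = np.polyval(coeffs, x[test_inds])\n",
+    "    # Retain the evaluation score (MSE on the test fold) and discard the model\n",
+    "    scores[j] = np.mean((y_pred - y[test_inds])**2)\n",
+    "\n",
+    "# Summarize with the mean of the k fold scores\n",
+    "print(np.mean(scores))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The next cell performs the same kind of $k$-fold evaluation with **Scikit-Learn**, both through an explicit loop over `KFold` splits and through `cross_val_score`, now with Ridge regression over a grid of penalty parameters."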
] }, { "cell_type": "code", - "execution_count": 2, - "metadata": {}, + "execution_count": 6, + "id": "66c55986", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ "import numpy as np\n", - "from sklearn import datasets, linear_model\n", "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", "\n", - "def generate_data():\n", - " np.random.seed(0)\n", - " X, y = datasets.make_moons(200, noise=0.20)\n", - " return X, y\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", "\n", + "## Cross-validation on Ridge regression using KFold only\n", "\n", - "def visualize(X, y, clf):\n", - " plot_decision_boundary(lambda x: clf.predict(x), X, y)\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", "\n", - "def plot_decision_boundary(pred_func, X, y):\n", - " # Set min and max values and give it some padding\n", - " x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n", - " y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n", - " h = 0.01\n", - " # Generate a grid of points with distance h between them\n", - " xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n", - " # Predict the function value for the whole gid\n", - " Z = pred_func(np.c_[xx.ravel(), yy.ravel()])\n", - " Z = Z.reshape(xx.shape)\n", - " # Plot the contour and training examples\n", - " plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)\n", - " plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)\n", - " plt.show()\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", "\n", - "def classify(X, y):\n", - " clf = linear_model.LogisticRegressionCV()\n", - " clf.fit(X, y)\n", - " return clf\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", "\n", - "def main():\n", - " X, y = generate_data()\n", - " # visualize(X, y)\n", - " clf = classify(X, y)\n", - " visualize(X, y, clf)\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", "\n", - "if __name__ == \"__main__\":\n", - " main()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cancer Data again now with Decision Trees and other Methods" -======= - "This will be discussed next week. Before we develop our own codes for logistic regression, we end this lecture by studying the functionality that **Scikit-learn** offers. 
\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", "\n", + " j += 1\n", + " i += 1\n", "\n", "\n", - "## Wisconsin Cancer Data\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", "\n", - "We show here how we can use a simple regression case on the breast\n", - "cancer data using Logistic regression as our algorithm for\n", - "classification." ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - ] - }, - { - "cell_type": "code", -<<<<<<< HEAD - "execution_count": 3, - "metadata": {}, - "outputs": [], -======= - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.95\n", - "Test set accuracy Logistic Regression with scaled data: 0.96\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/hjensen/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - } - ], ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", + "plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy 
with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "#now scale the data\n", - "from sklearn.preprocessing import StandardScaler\n", - "scaler = StandardScaler()\n", - "scaler.fit(X_train)\n", - "X_train_scaled = scaler.transform(X_train)\n", - "X_test_scaled = scaler.transform(X_test)\n", - "# Logistic Regression\n", - "logreg.fit(X_train_scaled, y_train)\n", - "print(\"Test set accuracy Logistic Regression with scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))" + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "8bd8e7a8", + "metadata": { + "editable": true + }, "source": [ -<<<<<<< HEAD - "## Other measures in classification studies: Cancer Data again" -======= - "## Using the correlation matrix\n", - "\n", - "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", - "We use **Pandas** to compute the correlation matrix." ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "## More examples on bootstrap and cross-validation and errors" ] }, { "cell_type": "code", -<<<<<<< HEAD - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ -======= - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAscAAAWYCAYAAABaiWuCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeVxVdf7H8RcgCCKKCyou4JJLariUO2pmmWZlOjVqSpk2JmqOU5ZbpdakYmWGaepkZmaTlYqmllqTy2TgkmCMSCqCqIGEIigg2/f3h3h/kqBgXLjg+/l48Hjcs9zv+XzPOXzv537P955jZ4wxiIiIiIgI9qUdgIiIiIiIrVByLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxyE188MEHdOvWjYULF1rmPfLII8TExJRiVCIiImINSo5FbsLf35/u3bvnmbd69Wq8vb1LKSIREduzcOFCpkyZUixl3XfffYSEhBRLWSJFpeRY5BZUqVKltEMQERERK6hQ2gFI+bFmzRqWLl1KmzZtcHNz4+eff6ZFixaMHz+e+fPnExERwYgRIxg2bBgAmZmZzJ8/n4MHD2JnZ0e3bt0YN24cdnZ2HD16lICAANLT08nMzGTQoEEMHjwYgPHjx7Nz506ef/55QkNDOXr0aJ5yr3Xo0CFeffVVUlJSePLJJ/nuu+84ePAgkZGRLFq0iF27dlGxYkVcXFx4/fXXqV27NgAnT55kypQpZGdn07hxY9LT0y1lzp07l6+++opp06Zx3333MXr0aMLCwoiMjCQ2Npbx48eTkpLCf/7zHwC+++47/vWvf+Hs7Iy9vT0TJkygXbt21j4cIlJO2WJbu2XLFtavX8/ly5fx8/Oja9eu+Pv7c/LkSWbOnElGRgY5OTlMmjSJ9u3bM3fuXD777DM8PT1Zt24dEyZM4OjRo7zwwguEhISQkJDA7NmzqVKlCpMnT+b1118vsJ09c+YMEydOJCwsjDlz5hAUFMT+/fvZtm0bLi4uzJgxg/Pnz5Odnc2zzz7L/fffX6LHS8ogI1KMAgMDTY8ePUxycrK5fPmy6dKli5k+fbrJyckx4eHhpm3btiYzM9MYY8zixYuNn5+fycrKMhkZGWbw4MEmKCjIGGNMaGioCQ0NNcYYk5GRYfr27WtOnDhh2U6vXr3MjBkzjDHGhIWF5Sn3j4KDg02rVq3Mjz/+aIwxZu7cucYYYz755BOTk5NjjDFm7dq1ZtKkSZb3PP7442bJkiXGGGPi4uJMhw4dTGBgoGX58OHDzdq1a40xxsTGxppmzZrl2V6vXr0s0507dzYJCQnGGGO2b9+epxwRkVthi21tYGCgmTx5smU6KyvL9O3b13z55ZfGGGMiIiJMx44dTUpKijHGmKVLl5pBgwaZrKwsM2fOHBMREZFnu8HBwZbpm7WzV5evX7/eGGPM8uXLTXx8vHnmmWfMggULjDHGxMfHm44dO5rY2Ngi7Gm5HWlYhRQ7Hx8f3NzccHJywtvbm+bNm2NnZ0fz5s1JTU0lMTERgPXr1zNw4EAcHBxwdHSkb9++bNy4EQBvb2+++uorhgwZwsiRI0lISODw4cN5tnN1HPAfy82Pi4sLXbt2BWDy5MkAeHp68tRTTzFs2DBWrlzJ//73PwBOnz7NoUOHePTRRwGoXbs27du3v+X9UbVqVb744guSk5MtPc0iIn+WLba11woNDSU2NpYBAwYA0KJFC2rXrs2OHTsAGDVqFAAvvvgilStXpkWLFn96n/Tu3RuAkSNHYozhxx9/5PHHHwegVq1atG/fns2bN//p7Uj5pmEVUuxcXV0trytUqGCZrlDhyumWmZkJQFxcHCtWrGDdunUAXLp0yTKWd+7cuSQnJ7N69WocHBzw8/PLM7QBoHLlygBUrFgxT7n5cXNzyzMdHR3NxIkT+eyzz/Dx8SEkJISpU6cCkJCQAEC1atUs67u7uxdlF+SxYsUKlixZQr9+/bj77rt56aWXaNCgwS2XJyICttnWXis+Ph64kqhelZGRQUpKCgAODg5MnTqVYcOGWZL1P+vatj4uLg640iFiZ2cHwPnz52nWrFmxbEvKLyXHUmo8PT3x9/enX79+AOTk5JCcnAxcGSv85JNP4uDgABS+MS6sw4cP4+rqio+PDwBZWVmWZR4eHgCcO3eOunXrApCUlES9evXyLcvR0RG40ug7OTlZGv6rHBwcmDVrFlOnTiUgIICpU6fy6aefFmt9REQKUlpt
bZ06dXB0dGTVqlWWeampqdjb//9F66+//ponnniCGTNm8Nlnn+VZdq2btbMFbR8gMDCQ6tWrA3D58uU87b1Ifgo1rOLIkSPWjkNuQwMHDmTTpk1kZ2cDVy79LVmyBAAvLy/CwsIAOHv2LJGRkcW6bW9vb5KTkzlx4gQAu3fvtiyrV68ePj4+bNiwAbjS+7F3794Cy6pRowYuLi78+uuvAOzatSvP8jFjxpCdnY2zszM+Pj6W+oqIlISSamtdXV1JS0vDGMO4ceNo06YNnp6ebNu2DbjSCTFu3Diio6MB2LlzJ40aNWLGjBmkpaXxySef5CkrPT2d4OBgVq5cedN2Nj+1a9fG19fX0pYDzJgxQ7eIk5tymDlz5sybrfTMM89Qt25dGjZsaP2IpMz6+uuvWbFiBVFRUbi4uLBz506+++47IiIiaNWqFW+++SZRUVGEhYXxwAMP0KlTJyIjIwkMDGTjxo0kJSUxdepUHB0dufPOO1mzZg3r1q0jMjKS9PR09u3bR5MmTVi4cCFhYWGEh4fj6+vL9OnT85Tr7OxsienYsWO8+uqrnD59mp9++on77rsPZ2dnatWqRVZWFm+99RbBwcE4OTlx4MABoqOjeeCBB+jcuTPLly/niy++4JdffqFu3brs3r0bZ2dnNm3axK5du4iIiKB+/fo0btyYypUrExAQwE8//USLFi344YcfOHLkCP369SMqKorFixezYcMGwsPDmTFjBjVr1izFIyUiZZkttrUA1atXZ9WqVWzcuJFu3bpxzz330L17dxYtWsSXX37JunXrGDBgAD169ODDDz8kICAAT09PWrduzZo1a/juu+84d+4cPXr0ICcnhyVLlnDgwAFGjBiBh4dHge1sly5dGD9+vKUjw8fHx9JT7Ovry+eff86qVatYu3Ytbdq04YknniiNwyZliJ0xxtxspUWLFuHt7c0PP/xAixYtePzxx/OMxxQRERERKQ8KlRxfa8+ePUyePJkuXbowbNgw2rRpY63YRERERERKVKGS49WrV3PXXXexevVqduzYQZ8+fRg0aBBhYWGcOnWKV155pSRiFRERERGxqkIlx23btsXT05MhQ4YwaNAgy61SjDGMHz+eRYsWWT1QERERERFrK9St3B5//PF8e4fPnDlDhw4dij0oEREREZHSUKie4/379xMZGWl5nvqXX37JI488ct0vVa9KSLj5/QfLumrVKnH+fGpph2GTtG/yp/2Sv9tpv3h4uN18pVtUXO1ueTke5aUeUH7qonrYltulHrfS7hbqPsdLly6lfv36lulatWrx5ptvFnlj5UmFCg6lHYLN0r7Jn/ZL/rRfbEt5OR7lpR5QfuqietgW1eMGZRZmpWbNmtGzZ0/LdM+ePQkODi72YG43m6O25Tu/f+M+JRyJiMjtQ22viNxIoZLjuLg4srKy8jyv/eoz06X4FdRwF0QNuoiIiEjxKFRy3Lt3b3r37s2dd96JnZ0dERERvPzyy9aOTUREpNQVpadZvdIiZV+hkuOHHnqI5s2b89NPPwHw0ksv0bhxY6sGJiIiIiJS0gqVHAM0adKEJk2aWKZXrVqFn5+fVYISERERESkNhUqOf/jhB5YuXUpiYiI5OTkYY0hOTlZybCN0GU9ExLapnRYpOwqVHAcEBPDqq6/i5eWFvb09xhiWLVtm7dhEpAQF7Y4q1vIe666hVyIiUvYU6j7HTZs2pVu3bjRo0IB69epRv359xowZY+3YRKQcCw39mfHjR9OrVy8yMzPzLFu8OJABA/ry9ddB+b43LCyUkSOH8fPP+wH48MMl/Pe/O60e87W++OKzEt2eFGxz1Lbr/r4I31TaYYlIGVWonmNPT0+mTp1Ku3btcHJyAmDjxo189NFHVg1O/hxdxhNb1rZte9q1u5v9+4P5+usgBg16AoDz588TEfE/atb04JFHHsv3vW3atKVJk6aW6VGjnsPOzq5E4r7qiy/+zV//+mSJblNERKyvUMnxN998g6+vLwcPHrTM032ORaQ4jBs3jtdem8HDDw/AycmJdeu+YODAJ1i9eiUAAQH/pGZND9LS0qhRoyZDhw7P8/64uDjee+8t7rijGaNGPceZM6d57723ueOOZri6uvLJJx/x979PIicnh6VLF/HXvw7lzJnTxMREM2/eu7i6Vmb9+q84ceI41avXIC7uNyZNmkpqaiozZ07HwcGeJk2a8r///cIDD/Tl0UcH8v3327l4MYXly5fi7d2Q++9/sDR2nYiIWEGhkuOxY8cydOjQPPN27NhhjXikBKhHWWxJ06ZNad3ah40b13HffQ9gb2+Pu7u7ZXnXrr50734vACNGPMmAAQOpVMnVsrxOnTp0734vv/12BrgyJKNPn3707t2HU6diWbfuS/r1exiAb77ZRNOmzRk+fATvvBPAvn0h3Htvbzw8ajFgwCDs7e1ZsOAt9u4NpmtXX4YPf5qlSxcxZsx4zp8/z9//PoZHHx1I794P8MEHgYwa9VzJ7SgRESkRhUqOhw4dSkxMDKdPn6ZTp06cPXuWe++918qhicjt4pln/saLLz5PfHw8w4Y9TVTUMcuyxMTfWbp0EZUquXLp0iUuXLiQJzn+o+joKOrXfwaAunXrXbe8QQMvANzd3UlNTQXA2dmZxYsDqVrVnRMnTtCsWYvr1q9WrZplfREo+tNMRaRsKFRyvG7dOhYvXky9evW45557mDNnDr1792bAgAHWjk9EbgONGjWmTZt2VKhQIU+v8dGjv7J69Sq+/HIDAD/+uOumZTVs2IjY2BiaN2/BmTOnr1ue39jkV16ZzMcf/5s6deqQmnrppusDljv3HD0amSeZFrElBd2FRneTESlYoZLj/fv3s23bNmbNmoWTkxOBgYHMmjVLybFIOVLSH5ZHjhwmLOwgdnbZPPXUaGbM+Cdw5Qd5W7duITHxd06cOE7Dhg2ZO/cNvLwakpBwls2bN9KhQ2eOHz/K1q1bqFu3Hj/+uIuUlBROnIjC338C7747j+PHj1G7dm1LcrtvXzDx8XFs3ryRfv0eJizsIFFRx+jSxZfHHvsL8+cH4OPThvDwQ5w8GU3nzl3ZunULx48f5ciRw0RFHefixYvs2PE9997bm65dfXn//QXk5OQoOb5FGuJVsGuTWlfXily6dBlQUitSEgqVHHt4eGBvn/eub3+cFhEpihYtWhIYuAQPDzcSElIs86tVq8bUqa9Zpvv06Wd5/eST///goY8+Wm15/eabb1lex8ae5JVXXsfd3Z24uDh++OE/AHTo0Jkvv9xoWS8wcInl9Zgx4y2vhw8fYXl9bRwtWrTkoYcesUxPnPhS4SsrIiJlRqGS43PnzvH111+TkpLCoUOH+PHHH0lKSrJ2bOWGxqWJlJy4uN9YvnwpzZo159SpWJ5//h+lHZKUErW9InIrCpUcT5o0iTfffJM9e/awZ88eevTowSuvvGLt2EREiqxDh0506NCptMMQK7B2sqtkWkSgkMlx1apVmTdvXp55GRkZVglIRERERKS0FCo5PnPmzHXzFi5cyJw5c4o9IBERkdtFwb3Vd5RoHCL
y/wqVHD/yyCO4u7tjjCErK4vExETq1Klj7djExhXUqI/w+EsJRyIiIiJSPAqVHL/00ksMGTLEMn327Fm2bt1qtaBsye2UAOq2Sre34h5vqfNGpOQUdD/joq6vW8WJFDI5vjYxBqhVqxZHjhyxSkAicns4fDicxYsDsbMztG17D+fOJXLhwgVmzPgnjo6ORSrr0qWLTJ78Au+/v8xK0YqUrOPZ+yyvHdMqkJmdBUDQ7tKKSOT2UajkeOrUqZbXxhgSEhJ0n2MR+VNatmxNu3Z3Y2eXzciRzwHw/PPPERKyB1/fnkUqy9W1MgsXLrVGmFIKyuNdI46cPG/V8q9Npq/VxKFDkcpRj7JIIZPjuLg4Hn30UeDKo1Rr1qxJp066VdLtojx+UIntycnJITn5Au7u1QgLO8jmzRtp2LARJ0/GMGbM86SkJDN79iw8PetSo0ZNwsPD8PMbSdeuvnzzzSbee+9tvv12BwCrVn1MdPRxvLwa8ssvYTg6OvL3v09i3rzZODjY06RJU/73v1944IG+PProwNKtuIiI2JRCJcdjx46lQ4f8v33GxcXpx3nXUCJZMPVISH4OHTrEqlUfc/RoJHfc0ZRmzVrw178O4F//WomHRy22bPmaTz5ZzoQJL/LII48REvIT48b9nYiI/7FixYd07epLv34Ps3z5lZ7jqKhjbN26mU8//RKA119/lXvu6UidOp4MH/40S5cuYsyY8Zw/f56//32MkmMREcmjUMnxJ598gp2dHcaY65atWLGCxYsXF3tgtu6L8E2k5j7rXqSw9AXhej4+Pvj5jQBg1aoVLFz4LsnJyXz77RYAUlIuYG/vYFnfy8sbAHf3aqSmXrquvBMnTlCvXn3LdN269fIsb9DAC7jymOrU1NRirYuItRU0fKKo6xd1uIXI7aRQyfHp06d57rnnaNq0KQBHjx6ladOmODo6EhMTY9UApez5bOsRLumLg9yCGjVqEhl5BHd3dwYMGESVKlW4cCGJ8PBfCl1Gw4aNOHUq1jJ95szpPAmynZ1dscYscjsryl0yXF0r8kD7ejdfUaSUFSo5vueee1ixYgVVq1YFIDk5mSVLlvDyyy/z+eefWzVAKf+Kqzc1v3IKKqOo2yzqbZLKYk9wSd967ciRw4SFHQRyWLlyOdnZ2URFHWPUqDFcvJjCBx8spHbt2sTHxzF48DDOnUvkxx93kZKSwqlTsWzduoX4+Dj279/LuXOJXLx4kaCgr3jsscfp06cfM2ZM4447mpKRkYGdnR0ZGRls3bqF48ePcuTIYaKijnPx4kV27Piee+/tXaJ1l7KloB/TtfCqVuh1bU1Z6VHW1TYpDYVKjpOTky2JMUCVKlU4f/5KA/DH27yVVRorXLCifDCIFFaLFi0JDFyCh4cbCQkp1y2/664218178823LK9HjXqOUaOes0z36dPP8rp9+3t4+ulRAMyePYu6devj5OTE1Kmv5dn+Qw89Uix1ERGR8qNQyXFCQgIff/wxHTp0wM7OjpCQEM6ePWvt2MTGFZQ0+1Qp4UCKUVF7iK29XfWO3JqvvvqcAwf2kZOTQ82aHrRp07a0Q5JyqKz0EheH4mob1dZJWVCo5Hj27Nm8+eabfPDBBwB07tyZ2bNnWzUwa1EPsXqCpfybNWtOaYcg11C7W34VdXhGwT8oVHIstqNQyXHt2rUJDAy0dixSyspKL0hRejBsqSfY1bViKUQiInJ7K+jLmR5xLwUp1GPuEhMTmTx5MhMnTiQtLY3XXnuNCxcuWDs2EREREZESVaie43nz5nHPPfewd+9eXFxcGDJkCG+99Rb//Oc/rR3fLdNlvLLTE1yQ0ur1LQs0bk/k5sp6GygipaPQwyqeeOIJwsPDAWjZsiVubm5WDUzKroi0YDKzs66bb2u3CCoLbGlYCCj5vt0UtZNBl6nlqvzGFjumFSrlKJR3dqwp9LpF/T2NhmFIoc7UpKQk4P9vnp+amkpsbOyN3lLsbqeTVb0dIlIWWfuKnX5MbH1l5f7HRVHQedO/gO/6RV1fyp9CJcedO3fm4YcfJiMjg9GjRxMeHs4rr7xi7dgKpSwMn1CDfkVRH3takLLcSIuIiIhtszPGmMKseOLECfbs2YMxhq5du9K4ccFfofK7oX9hlYVkF6CSa0VSC/mI5NutJ9jRsQKZmdcPq7C2/JJmW+oFcXWtaHOP1S6uJwL+mW1efQiItZ9CWFx1+jNDSzw8rDccrbja3WvbtqJ+sbd2R0BRyq/kWpGfI+KKZbulrbTa1OJ2o3oUtU0urs6Worh6nv3x87+g87KgOhXXk1sLGlry4r2D853/RwU9gKk03cqQvpvV41ba3UIlx507d2bChAk8+eSTRd6AiIiIiEhZUahbuTVr1uy6xPjcuXNWCUhEREREpLQUKjl+6KGH2LlzJ5mZmZZ5ixcvtlpQIiIiIiKloVDDKlq0aPH/b7CzwxiDnZ0dERERVg1ORERERKQk3fBuFWfOnKFChQr07t2bRYsW5Vn23nvvWTUwEREREZGSdsOe47/85S88/vjj9OzZEwBnZ2eqV69eYsGJiIiIiJSkG445bt26NUOHDmXdunU899xz7N69u6TiEhEREREpcTdMjq8+EW/8+PE0bdqUAQMGWJZlZGRYNzIRERERkRJW6AedX02Ur5o3b57NPCXP2hISEliwYAFHjhxh7dq1wJVHar/zzjs0aNCA6OhoXnjhBWrWrFnKkZa8/PbNwoUL2bt3r2WdMWPG0K1bt9IKscSdPHmSBQsW0LJlS+Li4nB3d2f8+PE6Zyh439zu50xpKS9tW3lph8pL21Fe/s9zcnIYM2YMPj4+ZGZmEhsby+zZs0lPTy9TxwMKrsu//vWvMnVMANLT03niiSfw9fVl8uTJVvn/uGFyvH37do4cOQJAdHQ0Q4YMsSw7derUbZMcHzhwgN69e+e5O8f8+fPp0qULDz30EP/5z38ICAjgrbfeKsUoS0d++wZg1apVpRRR6UtKSuKhhx7i/vvvB67cCvHee+/liy++uO3PmYL2Ddze50xpKS9tW3lph8pL21Ge/s/btm3L2LFjAfD392fbtm3s37+/TB2Pq/KrC5S9Y3L1i9dV1mizbpgcN2nShIEDB+a7bOPGjX9qw2VJ3759CQkJyTNv586d+Pv7A9C+fXumTJlSGqGVuvz2DcAHH3yAk5MT2dnZ+Pn54eLiUgrRlQ4fH5880zk5Obi4uOicoeB9A7f3OVNaykvbVl7aofLSdpSX/3N7e3tLMpmVlUV8fDyNGjXinXfeKVPHAwquS0xMTJk6JkFBQbRv357IyEhSU1MB67RZN0yOJ06cSPv27fNd1qhRoz+98bIsMTERV1dXACpXrsyFCxfIysqiQoVCj1Qpt/r27Uu9evWoVKkSq1ev5o033mD27NmlHVap2L59O76+vjRp0kTnzB9cu290ztiO8nKelvVzqry0HeXh/3z37t18/PHH3Hvvvdx1111l+nj8sS7Ozs5l5pgcO3aMqKgoXnjhBSIjIy3zrXE8bviDvIISY7jSPX87q1GjBpcuXQLg4sWLVK
1atUz8Y5SEpk2bUqlSJQA6d+5McHBwKUdUOoKDgwkJCWHatGmAzplr/XHf6JyxHeXlPC3L51R5aTvKy/959+7dWb58OadOnWL16tVl9njA9XUpS8dk+/btODk5sWzZMg4cOMChQ4f4+OOPrXI8CvX4aLlez549OXjwIAA///yz5V7QAgEBAZbXMTExeHt7l2I0pWPHjh3897//Zfr06SQkJHDw4EGdM7ny2zc6Z2xHeTlPy+o5VV7ajvLwf37s2DF27Nhhma5fvz6nTp0qk8ejoLqUpWPi7+/P+PHjGT16NHfffTc+Pj6MGDHCKsejUI+Pvt3t3buXoKAgdu/ezdChQxk5ciTp6em8/fbb1K1bl9jYWF588UWb/7WqNeS3bxYtWkRaWho1atTg119/ZcKECbfVMJzw8HD8/Pxo3bo1AKmpqQwbNoz77rvvtj9nCto3J06cuK3PmdJSXtq28tIOlZe2o7z8n588eZJ58+bRsmVLsrKyOH78OK+88gqOjo5l6nhAwXX55JNPytQxAdi6dSurV68mMzOTYcOG4evrW+zHQ8mxiIiIiEguDasQEREREcml5FhEREREJJeSYxERERGRXEqORURERERyKTkWEREREcml5FgsDhw4gJ+fH127duW1116z/A0aNIhTp07dUpl+fn75Pta1vPnHP/7Brl27AAgJCcHPz6/IZcTFxTFmzJhbeq+IiIgUj7LxSBcpEXfffTcDBw7k888/5/XXX7fMX716NY6OjqUYme2bMmUK1apV+1Nl1KlTh2eeeYb333+/mKISERGRolJyLDe0cOFCBg4cSO3atdmyZQt79uzB3d2d+Ph4Xn75ZTw8PNi8eTPfffcd9erV48yZM4wdO5Y77riDDRs2EB0dzSeffMLWrVsZMGAA7777LnXq1GHu3Ln8+9//5q233mLjxo1Ur16dUaNGER8fz6BBg9i1axcZGRkEBQXxySefEB0dTcWKFUlJSWHq1KmW56hfFRgYyKpVq3jqqac4fPgw0dHRzJw5k2+++YZDhw7RvHlz5syZA1BgvADffvstK1eupHnz5ri5ubF69WpGjx5N48aNCQgIoFOnTqSlpREREcHTTz/N0KFDCQkJ4a233qJnz574+flZ4n399ddp164diYmJBAYGsnHjRuzt7Zk2bZplHwCsWrWKrVu30rhx4+vqVdA+FxERESsxItdYu3at6dixo5k4caKZOHGi6d+/v4mNjTXHjx83Dz30kMnOzjbGGPPFF1+Yl156yRhjzI4dO0xKSooxxpiwsDAzatQoS3nDhw83wcHBecqfPHmyZbpXr14mNjbWGGNMbGysufPOO83hw4eNMcZ8+umnZs+ePebpp5+2rD9//nyzYMGCfGMfPny4ee+99yzxdevWzSQlJZns7GzTs2dPc+LEiRvG+/vvv5t77rnHxMfHW8ro1auXpfzAwEAzfPhwk5OTY6Kioky3bt3yLAsMDDTGGBMcHGyGDx+eJ7Zr63ntPjhy5Ijp2rWrSUtLs9Tv6ntvtM9FRETEOtRzLNfx9vbm3XffBSAoKAhnZ2d27NjB5cuXmTlzJgCXLl0iMzMTuPKM9jlz5uDi4sKlS5eIjo6+5W1Xr16dO++8E4Bhw4YREBDA+fPnee211wBISkq6Yc9pu3btAGjQoAH16tWjatWqlhgTEhJo2LBhgfGGhoZSu3ZtatWqBVwZZvJHbdu2xc7ODi8vL37//WfOfZwAACAASURBVPdbrudVISEhtG7dGmdnZwDuuecefv75ZwD27NlT4D4XERER61ByLDf02GOPAWCMoWHDhnnGIl+6dAmAsWPH8sILL/Dggw9y6tQpnnrqqQLLs7OzIycnxzL9x2TPyckpz7QxhrZt2zJr1izLdFpaWoHlX32/nZ1dnrKu3W5B8RpjsLOzK7Dsa8t3cHDAFPHJ61fXz8rKyjOvoG3eaJ+LiIiIdehuFVIoXbt2JTw8nIsXLwJw+PBhyxjepKQk3N3dAfjtt9/yvM/JyYmcnBzCw8OJjIykZs2aJCQkAHD27FkSExNvuN0ePXoQEhJiSSi/++47Vq5c+afqUlC87dq147fffuPs2bPAlbt33IqKFSuSnZ0NwNq1awHw8PCwlBsREWFZt1OnTvzyyy+kp6dft80b7XMRERGxDoeZV6/Zym3v4MGDlh+TnT59mk6dOlGhwpWLC9WrV8fT05PFixfzyy+/sG/fPiZNmoSLiwu1atVi0aJFHD9+nIiICA4dOoSzszNt27YlIyODL774ggMHDtCvXz/uuOMOtmzZQnBwMBcuXODYsWOcOnWKbt26MW/ePA4fPkx8fDzdunXD3t6eBg0akJWVxcqVKwkLCyM6Oprnn3/eEtdVQUFBfPPNN8TFxdGqVSs++OADIiMjqVGjBpGRkWzZsoW4uDi6dOmCt7d3vvF26dIFLy8v5s6dS0REBJmZmRw/fhw/Pz8OHjzIqlWriImJoVWrVqxbt47g4GBLb/Bnn31GTEwM3t7etGrVig0bNnDgwAHs7e3p2LEj1apVY9GiRURFReHg4EBISAgeHh507twZJycn3n33XcLDw8nMzOTAgQO4uLjQo0ePAve5iIiIWIedKeq1YZFybOfOnfTs2ROAHTt2sH79et57771SjkpERERKisYci1zjP//5D99//z0uLi6cPXuWyZMnl3ZIIiIiUoLUcywiIiIikks/yBMRERERyaXkWEREREQkl5JjEREREZFcSo5FRERERHIpORYRERERyaXkWEREREQkl5JjEREREZFcSo5FRERERHIpORYRERERyaXkWCTXI488QkxMTGmHISIiIqVIj48WyZWcnEyVKlWKvdwpU6ZQr149nn/++WIvW0RERIqXeo5FclkjMRYREZGypUJpByC2bc2aNSxdupQ2bdrg5ubGzz//TIsWLRg/fjzz588nIiKCESNGMGzYMAAyMzOZP38+Bw8exM7Ojm7dujFu3Djs7Ow4evQoAQEBpKenk5mZyaBBgxg8eDAA48ePZ+fOnTz//POEhoZy9OjRPOVe6/vvv+ett96iWrVqNG3alF9//ZWMjAzeeOMNWrVqBcDJkyeZOXMmGRkZ5OTkMGnSJNq3b295b82aNbnrrrv46aefSE5Opk+fPnz11VdMmzaNQYMG5Ynn559/5ujRo0yaNInLly+zbt06zp8/z8KFC2nYsOENt7dy5Up2795NxYoV2bt3L48++ihPPPEE4eHhzJkzBzs7OxwcHHjttddo0qRJnv3t6urKgQMHqFmzJqtWrSqZAy4i5YIttt0A27dvZ8WKFTg4OJCTk8MLL7zA3XffzZkzZ5g4cSJhYWHMmTOHoKAg9u/fz7Zt23BxcWHGjBmcP3+e7Oxsnn32We6///4blifypxiRmwgMDDQ9evQwycnJ5vLly6ZLly5m+vTpJicnx4SHh5u2bduazMxMY4wxixcvNn5+fiYrK8tkZGSYwYMHm6CgIGOMMaGhoSY0NNQYY0xGRobp27evOXHihGU7vXr1MjNmzDDGGBMWFpan3D9au3atufPOO82RI0eMM
cZs3LjR9OrVy2RkZJisrCzTt29f8+WXXxpjjImIiDAdO3Y0KSkplvf6+PiYY8eOGWOMmTt3rjHGmOHDh5u1a9fmieeNN94wxhizfft207lzZ7Nt2zZjjDFvvPGGefXVV40x5qbbmzx5sgkMDLSUm5ycbDp16mT27NljjDHmhx9+MH369DHZ2dmW/d21a1eTmJhosrKyzLx58wp7qERELGyx7Q4KCjLnz583xhgTGxtrevbsaVkWGxtrmjVrZtavX2+MMWb58uUmPj7ePPPMM2bBggXGGGPi4+NNx44dTWxs7E3LE7lVGlYhheLj44ObmxtOTk54e3vTvHlz7OzsaN68OampqSQmJgKwfv16Bg4ciIODA46OjvTt25eNGzcC4O3tzVdffcWQIUMYOXIkCQkJHD58OM92unfvDnBdufm54447aN68OQAPPfQQZ8+eJTQ0lNDQUGJjYxkwYAAALVq0oHbt2uzYscPy3kaNGtGkSRMAJk+eXOA2unbtCkDTpk05d+4cXbp0scR36tQpgEJt71o//PADlSpVspR177338vvvvxMWFmZZp23btlSvXh0HBwdeeumlAuMTEbkRW2u7W7RowdSpUxk6dChTp07lt99+u27d3r17AzBy5EiMMfz44488/vjjANSqVYv27duzefPmQpcnUlQaViGF4urqanldoUIFy3SFCldOoczMTADi4uJYsWIF69atA+DSpUuWsbxz584lOTmZ1atX4+DggJ+fH+np6Xm2U7lyZQAqVqyYp9z8VK1a1fLawcEBNzc3EhISLPNGjhxpeZ2RkUFKSopl2s3NrUj1dnBwyBOfg4ODJbb4+Pibbu9acXFxXLhwAT8/P8u86tWrk5SUVOT4RERuxNbabn9/f4YNG8aoUaOAK8l0WlpannWubf/i4uKAK50YdnZ2AJw/f55mzZoVujyRolJyLMXK09MTf39/+vXrB0BOTg7JyckAHDp0iCeffNKSaN4o8S2Ma5PJrKwsUlJS8PDwsPR8XDtONzU1FXt761woqVOnTpG25+npSZ06dfKsf/HiRZycnKwSn4jIzZRE252YmMjp06ctvcyFKadOnToABAYGUr16dQAuX75MVlbWLZUnUhgaViHFauDAgWzatIns7GzgyqW6JUuWAODl5WUZOnD27FkiIyP/1Laio6MtZWzevJlatWrRtm1b2rRpg6enJ9u2bQOuJM7jxo0jOjr6T22vIDfbnqurK2lpaaSmpvLiiy/Sq1cvkpKSOHToEHAlkX7qqae4ePGiVeITEbmZkmi73d3dqVKliqWs3bt33/Q9tWvXxtfXlw0bNljmzZgxg5CQkFsqT6QwHGbOnDmztIMQ2/X111+zYsUKoqKicHFxYefOnXz33XdERETQqlUr3nzzTaKioggLC+OBBx6gU6dOREZGEhgYyMaNG0lKSmLq1Kk4Ojpy5513smbNGtatW0dkZCTp6ens27ePJk2asHDhQsLCwggPD8fX15fp06fnKdfZ2TlPXBEREZw/f56EhAQ++OADDhw4wLx58/D09MTe3p7u3buzaNEivvzyS9atW8eAAQPo0aMHP/30E/Pnz+fkyZMEBwdbxgnPnTuXXbt2ERERQf369Vm8eLElnk6dOvHSSy8RHx/P4cOH8fLyYu7cuZw8eZKkpCS6d+9e4Pbgyi3iPvzwQzZv3sxjjz3GXXfdRadOnZg3bx7r1q1j48aN+Pv707Jlyzz7OzIykgceeKDEj7mIlH222Hbb29vTuHFjFixYwK5duzDGsH//fsu648ePJz4+nr179+Lj42PpKfb19eXzzz9n1apVrF27ljZt2vDEE0/ctLw/fm6IFJYeAiJl0rp161i/fr1ucSYiIiLFSsMqRERERERyKTmWMuf7779n2bJlRERE8MYbb5R2OCIiIlKOaFiFiIiIiEgu9RyLiIiIiORSciwiIiIikssqDwFJSMj/yWAA1apV4vz5VGtsttgoxuKhGIuHYiwethCjh4f1nnx4o3a3PLKF41laVHfV/XbzZ+p+K+1uiT8hr0IFh5LeZJGVVIybo7blO79/4z43fa/2Y/FQjMVDMcqtKO9toLWo7rcn1b3kaFiFiIiIiEiuEu85FhERuZE/06MsIvJnKTm2QUX9YNAHiYiIiEjx0LAKEREREZFc6jkWEZEy4dqrZJXiK5J66TKgq2QiUrzUcywiIiIikkvJsYiIiIhILg2ruA3pB3wiIiIi+VNyXIYUNN5ORERERIqHhlWIiIiIiORSciwiIiIikkvDKkpAQWN8RURERMS2qOdYRERERCSXkmMRERERkVwaVlGOaTiHiIiISNEoORYRkTJN924XkeKkYRUiIiIiIrnUcywiIsXC1oZyqUdZRG6Feo5FRERERHIpORYRERERyaVhFbdIl+tEREREyh8lx7muTXYrxVck9dJlQMmuiIiIyO1EybHcVHH1kgftjsp3/mPdGxc5JhERERFrUHJ8E7b262sRERERsR4lx8XsdkqmNe5aveFyeyrr7ZzaLhG5ESXHIiIiKGkWkSuUHIvNKo+9suWxTiIiIuWJkmO5qSMnz+c7v4VXtRKO5MbySzyVdIrIn6UvtSK3FyXHcssKSpr7l+HPC30IipQfZeWLvYjYFiXHYvHOjjWlHUKxKyjZvcrVtSKXcu9pXZLbFZGyrziuVukLuYjtKbfJsX5YUbCCelOKy9XGvrCJZ1ETydJIPJXsipR/BbWNTRwKX0ZxJbvWTpqt3aYpuZeyrNwmxyIi8ueU9Vu2FcTaHQSlQV/gRYqPkuNyrDx+AJRXurQqcnPXtmmOjhXIzMwqxWhKR9DuKKsNBxORK5QclwNKgsuvon4Q2tqlWxERkbKmzCTHxTWGuLxeJiwLjmfvK9L6TRw6WCkSEbmW2kXru92GPRS1vgV9If9jOVc7C4rjC7w6B6QgZSY5Lo/+TI/v7XpJUW6sLDT2ZSFGKDtxlke2djWsKF/s9aVepOyzueS4qD0YZaHHw9Yaemu7+kHimFaBzOxbT+AL+kAq6MMnv/X1QXVritrr87dBbawUSfH1QH229Ui+w1OU7Frf7dQGFtcVsqK2f2Vdcdy1yNr/y0X9wlzcdy754xA7W6uvNbdpzc+Y/NgZY0yJblFERERExEbZl3YAIiIiIiK2QsmxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrms/hCQ9PR0nnjiCXx9fZk8eTJJSUm88847NGjQgOjoaF544QVq1qxp7TBuKCoqis2bN1OxYkX27dvH888/j5eXl03F+eGHH3L69GmqVatGTEwMb775Junp6aUaY0JCAgsWLODIkSOsXbsW4IbH98MPP+TixYskJyfTrVs3evfuXSoxzp49GxcXFypVqsSRI0eYNm0aHh4eNhXjVYsXL2blypWEhIRY5tlKjBkZGaxYsQIXFxeOHTtGtWrV+Mc//mFTMYaHh7Ns2TJat27NoUOHGDVqFO3atSu1GKXw/vrXv1KxYkUA7O3tWblypc21L8WpuNrTiIgIVq9e
Tf369UlMTGTy5MlUqGBzz/vKI7+6L1y4kL1791rWGTNmDN26dQPKT91PnjzJggULaNmyJXFxcbi7uzN+/Pjb4rgXVHebOe7GyubMmWNefvllM3fuXGOMMa+++qrZvHmzMcaY77//3kyaNMnaIdxQVlaW+dvf/mays7ONMcbEx8ebxMREm4rz7NmzpkOHDpYYx4wZYzZs2FDqMX7zzTfm+++/NwMHDrTMKyim0NBQ8+yzzxpjjMnIyDAPPPCAuXDhQqnEOH/+fMvrpUuXmtdff93mYjTGmODgYDNnzhzTsWNHyzxbivH99983e/futUxHRETYXIyjRo0y27ZtM8YYs23bNjNixIhSjVEKLzAw8Lp5tta+FKfiaE9zcnJM//79zdmzZ40xVz5/v/jiixKuSdHlV/f8jr8x5avuYWFhZvv27Zbpfv36mV9++eW2OO4F1d1WjrtVh1UEBQXRvn176tevb5m3c+dOS89N+/bt2blzpzVDuKlffvkFYwyrVq1i6dKl/PDDD1SrVs2m4nRxccHR0ZGLFy8CkJqaStOmTUs9xr59++Lq6ppnXkEx/fDDD7Rt2xYAR0dHGjduzL59RXvManHFeLV3E8AYQ6VKlWwuxt9//53NmzczfPjwPPNtKcZNmzZx6tQpPv74YxYsWGDpfbelGGvWrMm5c+cAOHfuHK1atSrVGKXwfv31V5YtW8bChQvZsWMHYHvtS3EqjvY0NjaW9PR0y/9iaX92FVZ+dQf44IMPWL58OcuWLSMtLQ0oX3X38fHh/vvvt0zn5OTg4uJyWxz3guoOtnHcrZYcHzt2jKioKPr06ZNnfmJiouWfoHLlyly4cIGsrCxrhXFTZ86cITQ0lEGDBvHcc8+xb98+1q9fb1NxVq5cmZdeeol//OMfTJkyhTp16uDl5WVTMV5VUEznzp3L0/hVrlzZkrSUluTkZP773/8yatQoAJuJMScnh/nz5/Piiy9et8xWYgQ4ffo0dnZ2jBgxgo4dOzJx4kSbi3HixIkEBQUREBDA+vXr6du3r83FKPn729/+xujRoxk7dixLlixh3759Zap9KQ5Fre+161+dn5iYWOJxF4e+ffvy9NNPM2rUKFxdXXnjjTeAgv93y3rdt2/fjq+vL02aNLntjvu1dbeV4261ASnbt2/HycmJZcuWceDAATIzM/n444+pUaMGly5dokqVKly8eJGqVauW6rgYV1dXGjdujJubGwB33303e/futak4IyIiWL58OevXr6dChQrMnTuXRYsW2VSMVxUUU/Xq1bl06ZJlvYsXL1K9evVSizMlJYVZs2Yxe/Zs3N3dAWwmxv/9739UqFCBNWvWcOHCBS5fvsyyZcvo06ePzcQIVxohHx8f4Mr/zf79+8nOzrapGP39/Xnttddo164dkZGRPPPMM/z44482FaPk7+q55eDgwD333ENISEiZaV+KS1Hre3X9a+fXqFGjNEL/05o2bWp53blzZ5YvXw4U3E6X5boHBwcTEhLCtGnTgNvruP+x7rZy3K3Wc+zv78/48eMZPXo0d999Nz4+PowYMYKePXty8OBBAH7++Wd69uxprRAKpU2bNiQlJZGdnQ1c6Ulu2LChTcUZHx+Pu7u7JfH18PAgIyPDpmK8qqCYevXqRWhoKABZWVkcP36cDh06lEqM586dY9asWbz88ss0aNCArVu32lSMd911F6+//jqjR49m6NChVKxYkdGjR9OwYUObiRGgS5cuxMbGAld6kb28vHBwcLCpGH/77TfL5bar/zdgO8da8nf8+HG+/PJLy3RMTAxeXl5lon0pTkWtb4MGDXB2diYhIeG695Q1AQEBltcxMTF4e3sD5a/uO3bs4L///S/Tp08nISGBgwcP3jbHPb+628pxtzPGmD9dyg1s3bqV1atXk5mZybBhw/D19eXtt9+mbt26xMbG8uKLL5b63Sq2b99OcHAw1apV47fffuPVV18lPT3dZuLMzs7mn//8JxUrVsTNzY2jR48ybdo0nJycSjXGvXv3EhQUxO7duxk6dCgjR4684X778MMPSU5O5sKFC/To0aNEfk2eX4xDhw4lKyvL0mPs6urKkiVLbCpGZ2dnYmJi+Pzzz/n3v//N6NGjGTFiBJUqVbKZGC9cuEBgYCBeXl4cP36c4cOHW3r7bCXG3bt3880339C8eXOOHTtGnz59eOCBB0otRimc+Ph4Xn/9dVq2bMnFixfJyspi6tSpJCcn21T7UpyKqz2NiIhg1apV1K1blwsXLtj8XQsg/7ovWrSItLQ0atSowa+//sqECRNo1KgRUH7qHh4ejp+fH61btwau/J5o2LBh3HfffeX+uBdU9xMnTtjEcbd6ciwiIiIiUlboISAiIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLFIEa9as4b777mPKlCmlHYqIiIhYgZJjkRtYuHBhnkR48ODBDBw4sBQjEhGRa/2xnRb5s5Qci4iIiIjksjPGmNIOQkrfmjVrWLp0KW3atMHNzY2ff/6ZFi1aMH78eObPn09ERAQjRoxg2LBhAGRmZjJ//nwOHjyInZ0d3bp1Y9y4cdjZ2XH06FECAgJIT08nMzOTQYMGMXjwYADGjx/Pzp07ef755wkNDeXo0aN5ys0vrrVr1+Li4oKzszMvv/wyLi4uTJw4kbCwMObMmUNQUBCJiYksWLCADRs2sGfPHmrUqMH7779PxYoVAdi1axeLFy/GwcEBZ2dnXnvtNby9vQGIjo7mjTfeID09nezsbMaOHUuPHj3YsmULb7/9NpcvX6Zx48Z07doVf39/Fi5cSFRUFG5ubvzyyy/UrFnTsq2ZM2eyadMmhg8fTlRUFJGRkTz44IO88MILljp9+OGHbNu2jQoVKnDnnXcyefJknJycOH78OLNmzQIgKyuLxx9/nEGDBvH7778zZcoULl++TFZWFr169WL06NFWOxdExHpsta29dOkSc+bM4fjx4wA0atSISZMmUb169QLbyEOHDvHqq6+SkpLCk08+yXfffUdWVhYLFixg2bJlHDx4kJYtWxIQEEBGRgajRo1i7969vPDCC4SEhBAfH8+AAQMs7dn+/fsJDAzEGENmZibPPvss999/vyXGjz76iG+//ZaKFSvi7OzMxIkTiYmJua6drl69ep59/Md2GmD37t28//77ODk54erqyqxZs6hduzZpaWlMmTKFxMREsrOz8fHxYerUqeTk5DBr1ix+/fVXHBwc8Pb2Zvr06VSqVMlq54qUIiOSKzAw0PTo0cMkJyeby5cvmy5dupjp06ebnJwcEx4ebtq2bWs
yMzONMcYsXrzY+Pn5maysLJORkWEGDx5sgoKCjDHGhIaGmtDQUGOMMRkZGaZv377mxIkTlu306tXLzJgxwxhjTFhYWJ5yr3Xx4kXTsWNHc/nyZWOMMR9//LFZu3atMcaY2NhY06xZM7N161ZjjDH//Oc/Te/evc3p06dNTk6OefTRR82mTZuMMcacPHnStG3b1kRFRRljjAkKCjIPPvigyczMNJmZmebBBx+0lBsTE2PatWtnYmJiLPtk8uTJ1+0nX19fk5SUZLKzs83DDz9svv76a8vy4cOHm7/97W8mJyfHxMfHm5YtW5q4uDhjjDEbNmwwffv2NampqSYnJ8dMmDDBLFq0yBhjzIQJE8zmzZuNMcacPXvWjBo1yhhjTEBAgFm6dKkxxphLly6ZIUOGFPaQiogNsrW21hhjXnnlFTNlyhRjjDHZ2dnmueeeM8HBwTdtI4ODg02rVq3MwYMHjTHG+Pv7m4EDB1rq1rlzZ8syY4xp1qyZmTdvnjHGmPPnz5tu3bqZ3bt3G2OM2bFjh4mOjjbGGJOSkmJ8fX1NcnKyMcaYjRs3mv79+5vU1FRjjDEffvihCQwMtOzPm7XT/fv3t7TTVz8Tjh8/bowx5tNPPzVPP/205fVrr71mjDEmKyvLDBo0yBLb1TbZGGPGjh1rYmNj892XUvZpWIXk4ePjg5ubG05OTnh7e9O8eXPs7Oxo3rw5qampJCYmArB+/XoGDhyIg4MDjo6O9O3bl40bNwLg7e3NV199xZAhQxg5ciQJCQkcPnw4z3a6d+8OcF2513JwcAAgKCiItLQ0hg0bxsMPP5xnna5duwLQrFkzqlSpQt26dbGzs6Np06bExsYCsGnTJu666y4aNWoEwMMPP8yZM2c4ePAgYWFhnDp1ikcffRQALy8v2rRpY6lLQdq0aUPVqlWxt7enadOmnDp1Ks9yX19f7OzsqFWrFu7u7pw+fdqy3/r374+Liwt2dnY8/PDDbNiwAYCqVavy7bffcurUKTw8PFi4cCEA7u7u7N69m6NHj1KpUiU++uijG8YmIrbPltranJwcgoKCGDRoEAD29vZMmTKFO+64o1BtpKurK23btgWgadOm1KtXz1K3hg0bWtriq/r37w9cadt69OjB5s2bLe997733GDJkCP7+/iQlJXHixAkA1q1bR9++fXFxcQHgr3/9Kw8++OAN93FB7fSmTZto3bo1jRs3Bq58Jvz000+cPXsWd3d3Dhw4QGhoKA4ODnz66acAVKlShV9//ZUff/yRnJwc5s+fT926dW+4fSm7KpR2AGJbXF1dLa8rVKhgma5Q4cqpkpmZCUBcXBwrVqxg3bp1wJVLclWqVAFg7ty5JCcns3r1ahwcHPDz8yM9PT3PdipXrgxgucR1tdxrOTs78+mnn7J06VIWLFjAvffea7nM98dyHBwcrov92livfY+DgwNVqlQhLi7O8vpq/QCqV69OfHz8DffT1e0CODk5XRf/tcsrVqyYJ5avv/6akJAQAC5fvoy9/ZXvqNOmTeOjjz7i6aefplatWkyYMIEuXbowatQoXFxc+Mc//oGDgwNjxoyhX79+N4xPRGybLbW1586dIyMjI0872bBhQwBCQkJu2kYWVJer03/c5tX44UqC/OuvvwIwefJkmjVrxvz58wG47777SEtLs+yHa+Nzc3PDzc3turrkV/er9b92nx4/fhw/Pz/L8nr16pGYmEj//v3Jyspi9uzZJCUlMWLECJ588knatWvHG2+8wb/+9S+mTZvG4MGDee655264fSm7lBzLLfH09MTf39+SpOXk5JCcnAzAoUOHePLJJy09v/k1xoWRmZlJjRo1ePvtt0lJSWHKlCkEBAQQEBBQ5Fiv9j4AZGdnk5ycTJ06dXBwcCA5OZmsrCxL43/u3DlLL3Nx8/T0pGvXrjz77LOWeefOnQMgOTmZsWPH4u/vz4YNG/D392fPnj1cvHgRPz8//Pz82LNnD8899xytWrXCy8vLKjGKBgGNgwAAIABJREFUiO0oiba2evXqODk5ce7cOZo0aQJAfHw89vb21KlTp9jbyAsXLlC/fn0Azp8/j4eHh6U+I0eOtKx3bX08PT0tbSVAamoqcXFxlt7fovD09KR169YsW7YsT0yVK1fm3LlzPPTQQwwYMIDDhw/zzDPP0LhxY1q1akXHjh3p2bMnJ0+e5Nlnn6V27dr85S9/KfL2xfZpWIXckoEDB7Jp0yays7OBK5f+lixZAly57BYWFgbA2bNniYyMvKVtxMfH8+qrrwJXegnuvPNOy/aKon///oSHhxMTEwPAli1bqFu3Lu3ataNNmzZ4eXmxadMmAGJjYwkLC7NcQnR1dSUtLQ1jDOPGjbulelxr4MCBfPvtt1y+fBm40iszY8YMAKZOncrvv/+OnZ0dHTp0ICsrCzs7O8uPdODKpVhHR0eMfkcrclsoibbW3t6exx57zNI7nZOTw/Tp00lISLhpG3krtm7dClxJjHft2mUZZnFtfY4cOUJCQoLlPVfbzqs9yStXrmT37t1A0dvp/v3/j707DYjiStuHf0EDymKCIO6CSiIqERSNGPd11BjX/Ccug0bjjJoEMybRUYMKMeO4xO0BEwc0cTJqMmgQxGWikUR0VHBBUBCJCorg3kIQEJrlvB9s6hWlsaG3arh+n6SsrrrP4XDX3adOdY9CUlKStNxNqVTCz88P5eXl2LlzJ2JjYwE8Wa738ssvo7y8HD///DPCw8OlOJs1a4by8vJa9wHJmyIoKCjI1EGQ6e3btw/btm1Deno6bG1tERsbiyNHjiA1NRWenp5YsWIF0tPTkZSUhGHDhsHX1xdpaWkIDg5GdHQ0cnNzsXjxYlhbW6NTp04IDw/Hnj17kJaWhqKiIpw5cwbu7u4ICQlBUlISkpOT0bdvXwQEBFQ6bsOGDaWYrKyscPr0aWzduhWRkZG4f/8+AgICUFZWBn9/f9y9exeXLl2Cq6srVq1ahczMTBQVFeH69evYvXs3rly5AicnJ7z++uvw9PTEP/7xD0RGRuLq1av48ssv4eTkBEtLS/Tr1w9btmxBeHg4Dh06hCVLlsDb2xvAkxmV7du3Izo6Gn369EF2dnalfrp06VKlc+3duxexsbFSv4WGhuLs2bNITk7G66+/jjfeeAP5+flYs2YN9u/fj9TUVHz++eews7NDeXk5vvzyS0RHR2Pv3r1YuHAhOnfuDCsrK+mTOMLDwzF9+nQMGDDAVEOFiHQgx1wLAL6+vvjf//6HrVu3IiIiAsOHD8eQIUOqzZFXr17F0qVLkZ2djTt37qC8vBxhYWFVts3d3R1t2rTBpk2bMHz4cISEhOC7777DxIkTpbXOHh4e2LJlCw4dOoR79+7h1q1bOHfunJQ7i4qKsHbtWkRFRUGhUOCjjz6CpaVljfP066+/jk6dOuHvf/879u7di0OHDiEgIACtWrVCgwYNEBoaisjISOzcuRODBg3CxIkT0aBBA/zwww/YvXs3du7cCXd3d8yePVuatae6hR/lRkREREbh4eGBmJgYaVkFkRxxWQURERERkRqLYy
9jipahOgv8cJALp27YpZs2YBAGbOnInffvsN586d09vjVFF7AP0+RlIi5Vwl5byjD/lD6rlA6u/tsg8/ZaRUl7wYt7j5mAAAIABJREFUGyCdfuvXrx9mz55dbpm2+65OTqsYMmQILCwsyi2LiopCt27dAADdu3dHVFSULkKrsYraBACbNm3Ctm3bEBgYiLy8PB1EVjPOzs7KkwDw7FN/vXr19Po4qWoToL/HydDQUHmyKS4uRlpaGtq1a6e3x0lVewD9PUZSI+VcJeW8I/X8IfVcIPX3dmhoKLp37w47OzvlMqn0XUWxAdLoNwC4ceMGAgMDsX79ehw/fhyA9vuuThbHFZHL5cqEbWlpiSdPnqC4uFjHUalnyJAhmDhxIqZMmQILCwt8/fXXug6pRn7//Xf07dsX9vb2deY4Pd+munCcTpw4AS8vL7i7u6NLly56f5xebE9dOEZSJsX+lXLekXL+kHoukOJ7+9atW0hMTMS7775bbrkU+k5VbFLotzLTpk3D9OnTMWvWLPzwww84e/as1vvutSmObWxskJOTAwDIzs5Gw4YNYWSk37NK3njjDZibmwMA3n77bURHR+s4ouqLjo5GTEwMFi5cCKBuHKcX21QXjlO/fv2wbds2pKSkYPfu3Xp/nF5sT104RlImtf6Vct6Rev6Qei6Q4nv7999/h4mJCQIDA3H+/HnEx8fjxx9/lETfqYpNCv1WxtnZGQAgk8nQs2dPxMTEaL3vXpvi2M3NDRcvXgQAXLhwAW5ubjqOSH0BAQHKf9+7dw9t2rTRYTTVd/z4cZw8eRJ+fn549OgRLl68qPfHqaI26fNxunXrlvIyFgDY2dkhJSVFb4+Tqvbo8zHSB1LqXynnHSnnD6nnAim/t2fOnAlvb29Mnz4dPXr0gLOzMyZNmiSJvlMVmxT6DQBu376NX375pVwsrVu31nrfGQghhEa3KAGxsbEIDQ3FiRMnMHbsWEyePBn5+flYtWoVWrRogeTkZHzxxRd6cXd9mYratHHjRuTl5cHGxgY3btzAZ599ppxjJXVXrlzBhAkT0LlzZwBAbm4uxo8fj4EDB+rtcVLVpjt37ujtcUpKSsKKFSvg6OiI4uJi3L59G19++SWMjY318jipas+OHTv09hhJjZRzlZTzjtTzh9RzgT68tw8fPozdu3ejqKgI48ePR9++fSXRdxXFlpCQIIl+S0tLw9KlS+Ho6Ijs7GwUFxfD19cXWVlZWu27OlkcExERERHVxGszrYKIiIiI6FVYHBMRERERKbA4JiIiIiJSYHFMRERERKTA4piIiIiISIHFMRERERGRAotjIiIiIiIFFsdERERERAosjomIiIiIFFgcExEREREpsDgmIiIiIlJgcUxEREREpMDimIiIiIhIgcUxEREREZECi2MiIiIiIgUWx0RERERECiyOiYiIiIgUWBwTERERESmwOCYiIiIiUmBxTERERESkwOKYiIiIiEiBxTERERERkQKLYyIiIiIiBRbHREREREQKLI6JiIiIiBRYHBMRERERKbA4JgKQkpKCoUOH6joMIiIi0jEWxyQ5wcHBmDBhQq3u087ODnv37tXKtgcOHIiYmBitbJuISCp0kbuJtIHFMZFCgwYNdB0CERER6ZiBEELoOgiSpgULFiAkJAQ9e/bEjh07kJycDE9PT5w4cQIPHz7EtGnTUFhYiD179sDExATffvst7ty5g9LSUgwePBhTp05FUVERpkyZgtjYWPj7+yMyMhIxMTHYunUr7t69i6CgINSrVw9mZmaYP38+5HI5/P398fjxY3Tq1AkODg5YtGhRubg2bNiAn3/+Ge7u7sjIyEBaWhpsbGywfPlyWFtbAwBOnDiBDRs2wMTEBBYWFliyZAlsbW2Vrx0yZAgyMzNx8eJFuLi44MGDB4iOjsbRo0dhaGiIzz//HHFxcVi2bBlCQ0Mhl8uxdu1ahIWF4fTp07CxscGGDRtgampa6f58fX1x4MABtG/fHg0aNICPjw86d+6M0NBQZb/Z2tpiyZIlsLS0xOLFi3HgwAF4enri1q1bOH/+PMaNG4fZs2fX+vEnIv0k1dwNABs3bsQff/wBU1NT1KtXD0uXLoWtrS2OHj2KlStXonHjxujSpQvOnDmDrKwsHDt2DFeuXMGyZctgYGAAmUwGf39/2NvbV7o9IrUIokq4ubmJCxcuCCGE2L59u+jUqZO4fPmyEEKIwMBA5b99fX2Fj4+PEEKIvLw8MXz4cBESEqLcjoODg1i/fr0QQojw8HARGxsrXFxcREFBgRBCiB9//FEEBQUJIYQICgoSnp6elcbl4+MjBg8eLJ4+fSqEEOLLL78Uc+bMEUIIkZSUJLp27Spu374thBBi165dYuLEieVeO3z4cJGbmyuysrLExo0blTEmJycLIYRITk4WDg4O4vDhw0IIIf71r3+JQYMGifv374vS0lIxcuRIceDAgSrtb8CAASI6Olr587lz54SLi4uQy+VCCCGWL18uFi5cqPy9p6en+PTTT0VxcbG4deuW2L9/f6V9QUT0Iqnm7h07dojS0lLl+nPnzlX+LigoSDg7O4tbt24JIZ7lxqysLOHq6ipOnz4thBAiMjJSvPvuu6KkpOSV2yOqKU6roEr1798fx48fBwBcvnwZ77zzDqKiogAAf/75J5ycnFBaWoqIiAh88MEHAAAzMzMMHToUwcHB5bY1ePBgAMCIESPQpUsXAEBoaCjy8vIwfvx4DB8+vFqxubm5wdLSEgAwatQoHD58GCUlJThw4AA6d+6M9u3bAwCGDx+OM2fO4OHDh8rX9urVC/Xq1UP9+vUxa9Yslfvo3bs3AMDBwQENGjRAixYtYGBggDfeeAPJyckAUKX9PS8kJAQDBw5UjnKPGDECEREREM9dxHFzc4NMJoO9vT0+/PDDavULEZFUc3fz5s3xySefYPz48fjpp59w9erVcr9v166dclTYx8cHkZGRMDc3R69evQAA7u7uePz4MeLi4qq0PaKaMNJ1ACRt7u7uWLt2Lby8vGBubo6ePXti9+7d8PT0RP369WFgYAC5XI7CwkJlsQcA1tbWSEtLK7etskIWeJaEd+3ahc2bN2Pt2rVwd3fH3Llzy23jVRo2bKj8t5WVFYqKipCRkYHU1FTcvn273I0hLVu2hFwuR9OmTQEA9evXr9I+ymKWyWSwsLBQLjcyMkJRUREAVGl/z3tx/eLiYjRu3BgZGRnK9lc1PiKiikgxd9+9exeff/459uzZA2dnZ8TExMDX17fcOi/mvtTUVDx58qRcfrW2tkZmZmaVtkdUEyyOqVK9e/fGP//5TwQHB6NXr17o1asX/Pz8EBYWhv79+wN4lqhMTEyQnp6u/MSfnp5e6byvoqIi2NjYYNWqVXj69CkWLFiAgIAABAQEVDm2J0+eKP+dkZEBY2NjNGrUCM2bN0fnzp0RGBhYbt3nE7wmVXd/zZs3R6tWrfDVV18pl6Wnp1frgwERUWWkmLv//PNPWFhYwNnZGcCzgYFXad68OZo1a4adO3cql2VnZ8PExARHjhyp9vaIqoLTKqh
SZmZmcHFxwffff49+/fqhUaNGykKw7DKXoaEhPDw8lJfi8vPzcejQIYwZM0bldtPS0pQ3a9SvXx+dOnVCSUkJAMDCwgJ5eXkAgNmzZ6tMeCdPnkR2djaAZ5f43nvvPchkMgwbNgxxcXG4f/8+AEAul8PT0xOlpaUa6JGXvWp/FhYWyM/PR3R0NH766SeMHj0aUVFRyuI+MTERM2fO1EpsRPR6kmLubtOmDbKysnDnzh0Az25kfpUBAwYgMzMT8fHxAIDc3Fx88sknyM7OrtH2iKpCtnjx4sW6DoKk7enTp8jIyMDHH38MAHj8+DEKCgrw97//XbmOq6srTp48ia1btyI4OBhDhgzBuHHjYGBggMmTJyM5ORlxcXGws7NDq1atYGRkhNjYWGzduhUhISH4f+zdeWCM59oG8CuZrJJYEqEoitYWYiut2mI7qJ0uVHKq9mrqw2klaUrRfvatVBGctFR7tCKLpdVS0RxEbLFGqC2xRRZEZM/c3x8m7ydkZCKzxvX7y0xmud7nnfee2zPPO5OcnIzAwEC4uLjA3d0doaGhCAsLQ/369eHl5fVEpt27d6NevXrYvn071q1bB7VajdmzZ8PR0RGVKlVCkyZN8NVXXyE8PBy7du1CYGAgatWqheDgYISFheHChQtISkpChw4dAADvv/8+rl27hhMnTqBz586YPHkykpKScPbsWdSpUwfz5s1DQkICsrOzceXKFfzyyy+4cOECXF1d0bZtW63PBwBqtRqrV6/G0aNHMXLkSDRr1gyVK1fG//7v/2Lbtm3Yv38/Zs+ejSpVqmDBggXYt28f4uLikJeXh1atWhl+BxNRuWRutbtatWrIz8/HwoULER0dDTs7Oxw9ehRXrlyBs7MzlixZgoSEBERHR2PgwIEAADs7O7z22mtYsGABtm7dioiICHz44Ydo2rTpUx+vZ8+eRhtnKn/4VW5kkfz9/VGrVi1+xRkRERHpFZdVEBERERFp8IQ8sjjffPMNoqKiYG9vjxdeeIFfdUZERER6w2UVREREREQaXFZBRERERKRhkGUVycn3n+l+VapUwJ07mXpOUzbMpBtm0g0z6aa8ZnJ3N9yPuxTWXXMbO3PLA5hfJnPLA5hfJnPLA5hfJnPLA5hHpmepu2Y1c2xjozJ1hCcwk26YSTfMpBtmenbmltPc8gDml8nc8gDml8nc8gDml8nc8gDmmUkXPCFPz3Zc+v2J6/rW/4cJkhARmYfi6iLA2khE5onNsRHwjYGIiIjIMpjVsgoiIiIiIlNic0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg0+D3HRESkF8V9p3uFJHsTJCEienacOSYiIiIi0uDMsQnxl/OIiIiIzAubYzPEppmIiIjINNgcPyNtDSwREemGEwFEZI645piIiIiISIPNMRERERGRBpdVaGj7eG+k+1AjJ9GuMGOFJHtkPshRrudHkERERET6weaYiIhKxdDnXHAtMhGZEpdVEBERERFpsDkmIiIiItLgsooS/Hx6e5H1vURERERUfnHmmIiIiIhI47mbOeaPdxARERGRNs9dc0yGFxZ1qdjrB3Wqb+QkRERERKXDZRVERERERBqcOSYiomKZ2zI0fv8xERkDm+NygG8YRERERPrB5pie2aNri52c7PHADL/yjuufiYiIqDTYHJdj5XVG2ZANL5tpIiKi51u5bY7Nba2cIZ1LuFPK228u9voGqrbFXm+qxlBbo0pEzydtta6vif7vWlyN4n+kiSxfuW2OSX/YpBKRJdJWu8YOaWHkJERkSdgck8mZovlmw09Ufhj6eDanGsWZaSLDY3NMRmPsNzAnJ3uDPXYhvlERkS70VUNYi4gMz2Ka4+dpDTGVXz/uOlfst3rwjY3o/5X2PIqLBYeLvV7beRTajkMiIsCCmmP6/zcMW1sb5OXlmzgNAYafxeEsERkDJx/0p7SfkIVFXdLLV2Hq+5O5xzOx5tDzhM2xCWmbHWlcp4qRkzxU2tkX0k7bG5W2pR7mtga6tE05z9qnpzFVrdNW0zzR0aDPS0SWzeyaY0ufwSjtx4GGegx9Km3TzCbb8PTRvBYy9g+46Cu7oZvv4tawP3iQw6a/HIjLikZewZOfvrFGaaev/8A/S40CdKtTlvKJHT8RNH9WIiKmDkFEREREZA6sTR2AiIiIiMhcsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGib9EZB33nkH9vYPfzHM2toa33//Pe7evYvFixejdu3auHLlCqZOnYqqVasaJc+1a9cwcuRI1KhRAwCQkZGBRo0aoVatWoiJiVFuN2HCBHTo0MFgOZKTk7Fs2TKcO3cOISEhAPDUcVm3bh0yMjKQnp6ODh06oHv37kbJNGfOHDg6OqJChQo4d+4cPvvsM7i7u+PatWsYM2YM3N3dAQAeHh7w9/c3SqYVK1Zo3VemGqdx48YhKytLuU18fDyioqKQnJxs8HFKSEjAsmXL0LRpU9y6dQuVK1eGr6+vSV9P2jKZ8vWkLZOpX0+lceDAAfz+++9wc3ODlZUVfH19jfbcpa3l+h47fdXMuLg4bNq0CS+++CJSU1Ph5+cHG5tne5vUV33SVyZ91gJ9ZNLnMaevMVKr1ZgwYQI8PT2Rl5eHxMREzJkzB9nZ2SYZI2151q5da7IxAoDs7Gy8/fbb6NixI/z8/Ex+rBmEmNDy5cufuG769OmyY8cOERHZs2ePfPLJJ0bLk5aWJvv371cuf/3113L48OFicxrSr7/+Knv27JHBgwcr12kbl9jYWBkzZoyIiOTm5krPnj3l3r17Rsm0ZMkS5d9r1qyR2bNni4hIYmKihISE6D2DLpm07StTjlPhfhMRSUhIkOnTp4uIccbpxIkT8scffyiX+/TpI6dOnTLp60lbJlO+nrRlMvXrSVeZmZnSo0cPycnJERERX19fOXDggNGevzS13BBjp4+aqVarpW/fvnL79m0REZk7d678/PPPes1U2teTPjPpqxboK5O+jjl9jlFBQYGsXLlSuTxhwgQJDw832Rhpy2PKMSq8/7Rp02TevHkiYvpjzRBMuqzi/PnzCAoKwooVKxAZGQkA2LdvH1q1agUAaN26Nfbt22e0PFWqVMEbb7wBAMjNzcXp06fx6quvAgBWrVqF9evXIygoqMgsoCH07t0bTk5ORa7TNi579+5Fy5YtAQC2traoX78+Dh8u/ueb9Z1pypQpyr9FBBUqVFAu7927F+vWrcOyZcvw999/6z2PtkxA8fvKlOP05ptvKv/esGEDvL29lcuGHidPT0/06NFDuaxWq+Ho6GjS15O2TKZ8PWnLBJj29aSr2NhY1KxZE3Z2dgAe7tPCmmoMpanlhhg7fdTMxMREZGdnK59QlPX9Rx/1SZ+Z9FUL9JVJX8ecPsfI2toaEydOBADk5+cjKSkJ9erVM9kYactjyjEKCwtD69at8eKLLyrXmfpYMwSTzmGPHTsWnp6eKCgowIgRI+Dk5ITU1FSloD
g7O+PevXvIz883+nT7tm3b0LdvXwAPi1ytWrVQoUIFbNq0CV9++SXmzJlj1DzaxiUtLQ316///77E7OzsjLS3NqNnS09Px3//+FytWrAAAuLq6YtKkSXjllVeQkpKCd955B2FhYahYsaLBs2jbV+YwThkZGbh58yYaNmwIwPjj9Mcff6Bjx45o0KCB2byeHs1UyNSvp0czmfPr6VGP7s/CPKmpqUZ7/tLUcmONXWmf3xhjWNrXk6EylaUWGCJTWY45Q+SJiorCd999By8vLzRv3tzkY/R4HgcHB5OM0d9//41Lly5h6tSpiI+PV6439fgYgklnjj09PQEAKpUKr776Kg4dOgQ3Nzc8ePAAwMNmolKlSiZZh/Lbb78pM36vvPKKMov1+uuvIzo62uh5tI2Lq6urcn3h31xdXY2W6/79+5g1axbmzJmDypUrAwAqVKiAV155BQBQtWpVVK1aFefOnTNKHm37ytTjBABbtmzB0KFDlcvGHKfo6GgcOnQIn332GQDzeD09ngkw/evp8Uzm/Hp61KP7szCPm5ub0Z6/NLXcWGNX2uc3xhiW9vVkiExlrQX6zlTWY84QY9SpUyesX78e165dw6ZNm0w+Ro/nMdUY/fHHH7Czs0NQUBCOHj2KkydP4rvvvjP5+BiCyZrjixcv4pdfflEuX716FXXq1EGXLl1w/PhxAMCxY8fQpUsXo2eLjo5Gq1atYGtrCwCYP39+kZx169Y1eiZt49K1a1fExsYCePixy8WLF9G2bVujZEpLS8OsWbMwbdo01K5dG7t27QLw8GOXwv9V5uXl4datW6hVq5ZRMmnbV6YcJ+DhR4ZRUVHw8vJSrjPWOEVGRuK///0vAgMDkZycjOPHj5v89VRcJlO/norLZK6vp8e1bNkSN27cQG5uLoCH+/TR15ohlbaWG2vsSvv8tWvXhoODA5KTk5+4j76U9vWk70z6qAX6zKSPY06fef7+++8iy5FefPFFXLt2zWRjpC2Pqcboww8/hK+vL8aNG4c2bdrA09MTI0eONMtjraysRERM8cRJSUmYPXs2mjZtioyMDOTn5yMgIADp6elYtGgRatasicTERPzrX/8y2rdVFJo6dSo+//xzZTZj8eLFyMrKgpubG86fP49JkyYp634MISYmBmFhYYiKisLw4cMxatQoZGdnax2XdevWIT09Hffu3UPnzp0NctZ8cZmGDx+O/Px8ZYbPyckJq1evxsGDB7F582Y0adIEV69eRZs2bYrMmBoy08qVK7XuK1ONk4ODA3bv3o1bt24VWW9sjHE6ffo0fHx80KxZMwBAZmYmRowYgW7dupns9aQt08aNG032etKW6fLlyyZ9PZXG/v37sWvXLlSpUgW2trZG+7aKZ6nl+h47fdXMuLg4bNy4ETVr1sS9e/fKdAa9vuqTvjLpsxboI5M+jzl9jVFCQgIWLFiApk2bKs3c559/DltbW5OMkbY8GzZsMNkYAcCuXbuwadMm5OXlYcSIEejYsaNJjzVDMFlzTERERERkbvgjIEREREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTldLMmTPx6quvYuvWraaOQkRERHrG5phMauvWrfDx8TF1jKfy8fEp0gjPnDkTTZo0MWEiIiLdWUKdLYvHazRRWbE5JiIiIiLSsBIRMXUIMj1/f3+Ehobi1VdfxYYNG5CYmAhvb29ERUXh9u3bGDt2LHJzc/Hjjz/Czs4Oc+bMweXLl6FWq9GjRw+MGTMGeXl5GD16NGJiYjBjxgzs3bsXhw4dwrp163DlyhWEhITA0dERDg4OmDZtGlJTUzFjxgykpKSgSZMmaNiwIaZPn14kl1qtxqxZs3D+/HmoVCrUrVsXgYGBOHjwIBYuXIiqVavC09MT0dHRqFKlCr788kssXboUp06dQq9evTBlyhQAgIhg/fr1+P3336FSqfDSSy8hMDAQzs7OAIC//voL3377LVQqFRwcHDBjxgzUrVsXixcvxk8//YSqVavC3d0do0ePhpeXF3x8fNCmTRtcunQJ8fHx6NWrF6ZOnYrc3NwiYxAZGYnLly/Dz88PPXv2BAA8ePAAX331Fa5cuQIRwcCBAzF8+HAAwO7du7F27Vo4ODjA2toakyZNQqtWrXDs2DEsXLgQtra2EBGMGjUKXbt2NeIrhIjKylzrLAAkJCRg9uzZyM7ORl5eHjp27IiPP/4YgPb6uHnzZqxZswYtWrSAi4sLjh07hsaNG8PX1xdLlixBXFwcRo4ciREjRuDkyZOYPn067t+/j/79++Po0aO4d+8epk2bhk6dOgEANm/ejNDQUNjZ2cHKygrTp0/Hyy+/DOBh3Zw7dy4uXrwIAKhXrx4++eQTBAcHP1GjIyMjsX37dnh7ez9RowutW7cOv//+O2xsbNCkSRP4+fnBzs4OFy9exKxZswAA+fn5eOuttzBkyBCkpKTA398fOTk5yM/PR9euXTFu3DiDvl7IhIRIo0uXLnLs2DEREQkODpYmTZrIqVOnREQkKChI+XdAQID4+fmJiEhWVpb069dPQkNDlcdp2LChrFixQkREIiIiJCYmRtq1ayc5OTkiIvLdd99JSEiIiIiEhISIt7e31kyRkZEyevRo5fLEiRMlMTFRuW/Lli3l+vXrolarZeDAgTJ27FjJycmRlJQU8fDwkKSkJBERCQ0NlTfffFMyMzNFROSzzz6TgIAAERFJSEiQli1byqVLl0REJCwsTHr16iV5eXkiIuLt7a3kLeTt7S3jxo0TtVotSUlJ0rRpU7l161aRMQgKChIRkR07dsg//vEP5W+BgYHy6aefiojI/fv3pVu3bnL48GEREXn99dclOTlZRET++OMPWb58uYiIDB06VGJjY0VEJC4uThl/IrIs5lhn8/PzpU+fPrJ161YREUlPT5dOnTqJSMn1cfny5dK5c2dJT0+XnJwcad++vQQGBoparZbTp09Ly5YtldtGR0dLo0aNZO/evSIicvToUWnZsqWkpaWJiMhPP/2k5I+Ojpbhw4crGT///HPx9/cXEZGCggIZP368REdHi4j2Gj127Nhia3R4eLj07t1bMjMzRa1Wy6RJk2TlypUiIjJp0iTZsWOHiIjcvn1bef+ZP3++rFmzRkREHjx4IMOGDdM6nmT5uKyCFJ07d0ZkZCQA4NSpU+jZsyf27dsHADh79iw8PDygVquxbds2DB06FADg4OCAN99884n1Xj169AAA9O/fH82bNwcAhIWFISsrCyNGjEC/fv10ylSxYkWcP38e+/fvh1qtxpIlS1CzZk3l7/Xq1UPNmjVhZWWFl
19+GfXr14ednR3c3Nzg6uqKa9euAQDCw8PRp08fODo6AgCGDBmCiIgI5OfnY/v27WjevDnq1asHAOjXrx9u3LiB48ePPzVbhw4dYGVlhWrVqqFKlSq4fv16kb8XzoY0atRI+ZtarUZ4eDjeeustAICzszO6du2KiIgIAEClSpXw888/Iz09Hd26dVNmJipVqoTw8HCkpKSgcePG+OKLL3QaPyIyL+ZYZ2NjY5GQkID+/fsDAFxcXLB06VIA0Kk+enp6wsXFBXZ2dqhbty4aNWoEKysrNGrUCJmZmUhNTVVu6+TkBC8vLwBA69at4ebmpmz/yy+/jAkTJuC9997D4sWLcebMGQAP62ZYWBiGDBnjU8HRAAAgAElEQVQCALC2toa/v78yq6xNx44dlRpduXJlpQ6Hhoaib9++cHR0hJWVFfr164fw8HAAD2vtb7/9hmvXrsHd3R0rVqwAAFSuXBlRUVG4cOECKlSogH//+986jS1ZJjbHpPDy8sLevXuRmZmJChUqoFu3bti3bx/u3bsHFxcXWFlZIS0tDbm5uXB1dVXu5+rqiqSkpCKPVbhcAXhY2H/44QfExMSge/fumDFjBjIyMnTK1KpVK3z55ZdYu3YtunbtivXr10MeWQnk5OSk/NvGxuaJy3l5eQCAW7duPZE5Ly8PqampT/xNpVKhYsWKuHXr1lOzPbqNdnZ2ynM9/nd7e3vlb4Xjt3DhQvj4+MDHxweHDx9Gbm4uACA4OBhJSUno06cPJk+ejNu3bwMAFi9eDAcHBwwePBijR4/GlStXSh48IjI75lhnk5KSULFiRdjY2CjXtWnTBsCTtbO4+qitDhc+3qO1sVKlSkWeu3Llyrh9+zbu37+P8ePH45133sGPP/6IJUuWIDs7GwCKHY+XXnoJbm5uT92uR8fn0Tp869YtbNu2TanBa9euhbX1w3bos88+Q+PGjfH+++9j+PDhiI2NBQCMHj0a//jHPzBlyhQMHDhQ+Q8OlU9sjknxxhtv4OrVq9i6dSvat2+Pzp074+zZswgPD0fnzp0BPCzQdnZ2SEtLU+6XlpaG6tWra33cvLw8uLm5YdGiRdi1axfu3buH+fPn65Tp/v37aNeuHb777jts3LgRYWFhCAsLK/W21ahR44nMtra2qFq16hN/KygoQHp6Ol544YVSP09JCsdv+vTp2LhxIzZu3IgtW7YgMDAQwMM3nlmzZmHPnj1wc3NDQEAAACA3NxfTpk3D3r170bZtW0ycOFHv2YjI8Myxzr7wwgtIT09Hfn6+ct3FixeRnZ2t9/p47969Ipfv3LmDatWq4fLly8jIyFA+cXs0S3HjkZSUhOTk5GfKUKNGDbz99ttKDf7555+xadMmAEB6ejomTpyI3bt3491338WHH36ozH77+Phg+/bt8PPzw7Rp05CQkPBMz0/mj80xKRwcHNCuXTt8++236NSpE6pUqYJmzZohKCgI7du3B/Dw46xBgwYpH+9lZ2fj119/VT7uKk5SUpJyAoiLiwuaNGmCgoICAA9nHLKysgAAH3/8cZGCCAB//PEHNm/eDACoU6cOqlevDrVaXeptGzx4MH777TdlJiIsLAwDBgyASqVC3759cfr0aVy9ehUAsHPnTtSsWROtWrUqkvHKlSs6v9loUzh+hcsoAGDVqlVKwz9hwgQUFBTAwcEBnp6eyjhNmjQJWVlZsLGxQevWrZXriciymGOdbdGiBerUqYPt27cDAO7evYvJkyfrVB9LKzs7W5l1PXLkCNLS0tClSxfUrFkTNjY2OHnyJAAgKipKuc/j46FWqxEYGKg0x6Wt0YXvBzk5OQCAQ4cOKUvVAgICkJKSAisrK7Rt2xb5+fmwsrJSTjAEHi4jKTw5msonm5JvQs8TLy8v5OTkwMXFRbl8+PDhIh+b+fv7Y86cORg+fDgKCgrQr18/DBw4EAAwatQoAMDUqVMxZcoUtG/fHq6urqhUqRKGDx8Oa2tr2Nvb46uvvgIAvP7661i9ejWGDRuGZs2aFflYDwBatmyJefPm4c8//0RmZiYaNWqEgQMH4uDBgwgKCkJKSgqWL1+O6tWrIyoqCvb29mjcuDH279+P5ORkzJkzB4sXL0b//v1x+/ZtvP/++7C2tla+rQIAateujeXLl8PPz085G3v16tVKlqFDh2LRokUIDQ3FJ598ggULFiAuLg7JycmoV68ewsPDizzXnDlzlDFYv369cob0qFGj8O9//1sZv2HDhilnSn/00UcAgLZt22LEiBGwtbVFQUGBUrC7d++ODz74ALa2tsjOzi5zk05EpmNudValUmH16tWYPXs2tmzZArVajenTp8PW1vap9XHbtm0IDQ1FTk4OfvzxR6SlpRWpjevXr1dyBgUFAQCqV6+OuLg4rF+/Hnfv3sXXX3+NKlWqAAA+//xzBAYG4pVXXkHdunWVbX20bg4fPhwigv79+6Np06YASl+j+/fvj+TkZHh7e8PR0RHOzs6YPXs2gIdrqn19fWFnZ4eMjAwsWLAAjo6O6N27N7766iuoVCpkZGRg8uTJSkYqf/hVbkRERGRwhw4dQkBAAP78809TRyF6Ki6rICIiIiLSYHNMREREBnXy5EnMmTMHycnJmDRpkqnjED0Vl1UQEREREWlw5piIiIiISMMg31aRnHzfEA+rsypVKuDOnUyTZngac87HbM+G2Z6dOefTdzZ3dxe9PdbjnlZ3zXmM9YnbWX48D9sIPB/baeptfJa6Wy5njm1sVKaO8FTmnI/Zng2zPTtzzmfO2UqjvGxHSbid5cfzsI3A87GdlriN/J5j0rsdl34v9vq+9f9h5CREVJ6wthCRMbA5JiIis6KtCSYiMgY2x6TYcel3VEiyR+aDnCLXc1aGiIiInhflcs0xEREREdGz4MwxERHpBdcEE1F5wOaYiIhMgmuLicgccVkFEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg0eEIeEREZFE+8IyJLwuaYiIhKxdya3cI8j/+IEb9CjoieBZdVEBERERFpsDkmIiIiItLgsgoLYm6/PlXaj1b1lX9x5OZir/+X17ulehwiKt/MrWYSkWVgc0wmxzcwIiIiMhdsjssxfTWd5nbyDREREZGhsDkms/X48glbWxvk5eWbKA0RlRf8tIqInoYn5BERERERaXDmuBzQ14lxz5OwqEvFXj+oU30jJyEiIiJzwub4GenjYzl+tEdE5oz/kSai5xGbYyN4/A3m8V9xKun2ZDycUSayPOcS7gB48ryExnWqmCoSEVkwNsdUosI3nsdZwhuPtmaXiIiIqDhsjomIiCxEWNQlODnZ48Fjnz7y0y0i/WFzTOVG8ctRXjZ6DiKyTDwPhIgANsdkBrQt29DH4zRQle4xLhYc1vIXzsoQ0bMr7fkMXBJGZDpsjumZWfJaZEPjiX1E5Ychj2c2wUTmh80x6R2bZiIyZ/qqUY//iicA/Mvr3WfKVFb8DzmR/rA5JiIiegrty62IqDxic/wcep5mdp+3N7XiZo+cnOzRs3UtE6QhMq3Sns+gj/MfuEyCyPKxOdYz
/oAHPU1p3zj5kSg9j/R1kq450fYf9QaqtkZO8hCXYRBpx+aYFOcS7jzxC1P6fnxLVVJTW9z3jhLR86e0n1YZumnmTDZR6bE5LoElzARbctNZXpnbLJE2nD0iwDLqHBGRsbA5JiIisnDmNgNd1u9vLunTuOIen//ZJ3157ppjS54h4QwxGQvfZIieT/pahsHlHGTJnrvm2Jyw2bUcJa0jtM2yQV7Bs6/V1vb4iyOLv95U36VK9CxY60rvaTWnrPWGno25TRqYW57yxOyaY339tn1pZ4hL+/Vmpbn947c15ElvZN709dVyxf34gDa2WTZ4ENWqVHm0fRSr7XlL06w/fmxWSLJH5oOcUh/j+npjeNoMV3Ef7T5Pbzxsai2fvmqOvpZn6Jqn8D8AhjxPwxxmyR+tMebWZGtjyJzm0vBbiYgY9RmJiIiIiMyUtakDEBERERGZCzbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISMPsfgSkJAcOHMDvv/8ONzc3WFlZwdfXt8jfc3JyMH/+fFSvXh1XrlzBuHHjUK9ePQDA3LlzYWNjA7VajezsbEyfPh3W1vr7/0FJ2QBg586dWLJkCQIDA9G1a1fl+vDwcMTFxcHa2hp16tTBsGHD9JarLNlOnjyJ77//Hk2bNsXly5fh6emJd955xyyyFUpNTcWgQYMwfvx4eHt76zVbWfPFxsZi//79sLa2xqFDhzB37lzUqFHDLLKZ+ngICgpCSkoKqlatijNnzmDSpElo0KABANMfD9qyGeN4KIuy1EdLUpbXlqXQ5dgGgIiICHz66ac4duwYnJycjJyy7EraThHBxo0bAQDXr19Heno65s6da4qoZVLSdiYmJmLBggVo3rw54uLi0K9fP3Tv3t1EaZ9NcnIyli1bhnPnziEkJOSJv1tU/RELkpmZKT169JCcnBwREfH19ZUDBw4Uuc2aNWskKChIRETOnTsnw4cPFxGR2NhY6d+/v3K7/v37y5EjR4yaLSEhQQ4ePCje3t7y559/KtffvHlTBgwYIGq1WkREhgwZIpcvXzaLbLt375YTJ06IiEhubq68+uqrkpqaahbZREQKCgokMDBQJkyYIBs3btRbLn3ku3//vvj6+ha53YMHD8wimzkcD0uXLlVe8zt27JDx48eLiHkcD9qyGfp4KIuy1EdLUpb9Zyl02UYRkb///luWLFkiDRs2lIyMDGPHLDNdtjM0NFRCQ0OVy3FxcUbNqA+6bOeMGTMkODhYRETOnDkjPXv2NHbMMvv1119lz549Mnjw4GL/bkn1x6KWVcTGxqJmzZqws7MDALRu3RqRkZFFbhMZGYlWrR7+VG6jRo1w7tw5ZGRkoHLlysjMzER+fj7y8/NhZWWFF1980ajZateujddff/2J+0ZFRcHDwwNWVlYAgFatWuGvv/4yi2zdu3eHp6enclmlUsHW1tYssgHA2rVr8fbbb6NSpUp6y6SvfPv27UOFChUQHByMb775BmfOnEGFChXMIps5HA+TJ09WXvNqtVoZG3M4HrRlM/TxUBZlqY+WpCz7z1Loso1ZWVlYt24dPvroIxMk1A9dtnPbtm24e/cuNmzYgCVLlljk7Lgu21m1alWkpaUBANLS0uDh4WHsmGXWu3fvp+4fS6o/FrWsIjU1tcjAOzs7IzU1Vafb1K1bF++88w7+53/+B9bW1njjjTfg6upq1GzapKWlFbmvk5OTzvc1dLZHbdq0CRMmTICLi4tZZIuOjoaDgwNatGiBn376SW+Z9JXv+vXrOHHiBL766iuoVCr885//ROXKlbU2+sbMZk7HQ25uLkJDQ/HFF18AMK/j4fFsjzLE8VAWZamPzs7ORstZVvraf+ZMl21cunQpJk6cqDRclkiX7bxx4wYyMjLg6+uLy5cvY8yYMdi5cydUKpWx4z4zXbbzgw8+wEcffYS5c+fi5MmTmDhxorFjGpwl1R+Lmjl2c3PDgwcPlMsZGRlwc3PT6TZ79uzBoUOHsHLlSqxYsQLXrl3Dzz//bNRs2ri6uha574MHD3S+r6GzFdq2bRsyMzMxcuRIveUqa7Y9e/YgJycHQUFBOH/+PPbv31/sOidT5XN2dkbTpk1ha2sLa2trtGzZEocPHzaLbOZyPOTm5mLmzJmYMmUK6tSpA8B8jofishUy1PFQFmWpj5ZEH/vP3JW0jTdv3kR6ejp+/fVXBAUFAQCCg4Nx6tQpo2ctC132pbOzM1q0aAEAqFevHjIyMnDz5k2j5iwrXbbT398fb7/9NgICArBy5UpMmTIFd+/eNXZUg7Kk+mNRzXHLli1x48YN5ObmAgCOHTsGLy8v3L17V5ma9/LywvHjxwEA8fHxaNy4MZydnXHr1i24u7srj+Xu7q48jrGyadOpUyecOXMGIgIAOH78ODp37mwW2QDgl19+QWpqKiZOnIj4+HhcvnzZLLIFBgZi3LhxGDduHBo2bIgOHTpg6NChestW1nyvvfYarl+/rly+ceMGXnrpJbPIZg7HQ3Z2Nr744gt88MEHaNasGXbt2gXAPI4HbdkAwx4PZVGW+mhJyrr/LEFJ21ijRg3MmzdPqX/Aw5nH5s2bmzJ2qemyL9u3b4/ExEQAD5upgoKCIrXLEuiynTdv3lS2q2LFirC2toZarTZZZn2x1Pqjmjlz5kxTh9CVra0tGjRogODgYMTGxqJatWoYOnQoli9fjgsXLqBNmzbw8PDAb7/9hrNnz2Lfvn2YNm0aqlSpgpdffhl79uzBmTNnEBMTg5SUFPj6+uptvaAu2UQEq1atwqFDh/DgwQM4Ojqibt26cHZ2RoUKFRASEoIDBw6gQ4cO6Nixo15ylTXb7t27MXPmTGRkZCA0NBQRERFo06aN3tanliVboS1btmDv3r1IS0uDi4uLXhvQsuRzdXVFbm4ufvvtN8TExMDe3h4ffPCBshbSlNnM4XiYPHkyzpw5gyNHjiA0NBTR0dF49913zeJ40JbN0MeDobdLW320JGXZf5ZCl20EHi5BCg4OxqFDh6BSqVCvXj2zbTaKo8t2NmvWDBEREYiPj8f27dsxYcIEi/vmEV22s0GDBvjhhx9w9epVREREYMCAAWjXrp2po5dKTEyM8k1D2dnZaN68Ob799luLrD9WUjg9Q0RERET0nLOoZRVERERERIbE5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMZERbNu2DYGBgaaOQUREj2BtpuJYiYiYOgRRcbZu3YrQ0FBs3LjR1FHKrKCgANnZ2XBycgIA+Pv7o1atWvj4449NnIyIyov
yVDON5fHaXJJu3bph7ty5eO211wycjEyJM8dERqBSqXQuvkREZByszVQczhxTifz9/REaGopXX30VGzZsQGJiIry9vREVFYXbt29j7NixyM3NxY8//gg7OzvMmTMHly9fhlqtRo8ePTBmzBjk5eVh9OjRiImJwYwZM7B3714cOnQI69atw5UrVxASEgJHR0c4ODhg2rRpSE1NxYwZM5CSkoImTZqgYcOGmD59+hPZEhISMHv2bGRnZyMvLw8dO3ZUZmP/+usvfPvtt1CpVHBwcMCMGTNQt25dbN68GWvWrEGLFi3g4uKCU6dOoWrVqvjmm29gb28PANi/fz9WrFgBW1tbqNVqeHt7o0+fPrhw4QLmz5+vPN+QIUPw7rvvYs+ePfj888/h6OiITz75BG+++SZ8fHxw6dIlTJgwAVu2bMH9+/fx559/4vvvv0dQUBDs7e1Rq1YtDBgwAGvXrsWdO3fw3nvvYcqUKZg3bx5CQ0MxduxYjBkzxqj7m4jKhjWzbDVz7ty5aN++PZYsWYLjx4/DysoKHTp0wEcffQQrK6si23Py5ElMnz4d9+/fR//+/XH06FHcu3cP06ZNQ6dOnQAAKSkpmDVrFtLS0pCXl4fhw4dj8ODBiI+Px7Rp05TavGfPHixcuBBVq1ZFixYtcPjwYVhbW2PlypVwc3NDQEAAtm/fjvr166NixYrw8/ODo6MjZs2aBQDIz8/HW2+9hSFDhhjy5UXGIEQ66NKlixw7dkxERIKDg6VJkyZy6tQpEREJCgpS/h0QECB+fn4iIpKVlSX9+vWT0NBQ5XEaNmwoK1asEBGRiIgIiYmJkXbt2klOTo6IiHz33XcSEhIiIiIhISHi7e2tNVN+fr706dNHtm7dKiIi6enp0qlTJxERSUhIkJYtW8qlS5dERCQsLEx69eoleXl5IiKyfPly6dixo9y9e1cKCgqkb9++sm3bNuW+rVq1ksuXL4uIyIkTJ5QcsbGxEhsbKyIiubm50rt3b+V233//vXzwwQdKvj/++EM2b94sIiLR0dHStWtX5W9+fn6yfPly5fLp06eldevWkpWVJSIiKSkpEhAQoHXbici8sWaWrWZ+++234uPjI/n5+ZKbmyvvvvuuhIWFFbtd0dHR0qhRI9m7d6+IiBw9elRatmwpaWlpIiLy/vvvK/U2NTVVOnToIDExMcp9H63NISEh0qJFC0lISBARkTFjxsjq1auVv3ft2lWio6OVy5MmTZIdO3aIiMjt27dl9OjRWsefLAeXVZBOOnfujMjISADAqVOn0LNnT+zbtw8AcPbsWXh4eECtVmPbtm0YOnQoAMDBwQFvvvkmtm7dWuSxevToAQDo378/mjdvDgAICwtDVlYWRowYgX79+umUKTY2FgkJCejfvz8AwMXFBUuXLgUAbN++Hc2bN0e9evUAAP369cONGzdw/Phx5f4tWrRApUqVYG1tjVdeeQXXrl1T7tusWTO89NJLAABPT09MnjwZAFC3bl1s2bIFw4YNw6hRo5CcnIyzZ88qz3H48GEkJSUBAH777Tf06dNHp23x8PBAzZo1sWfPHgAPTxLp27evTvclIvPDmlm2mhkaGorBgwdDpVLB1tYWvXv3RkREhNZtc3JygpeXFwCgdevWcHNzw759+5CUlISDBw8qY+zq6govL68nxvhR9erVQ+3atQEAjRo1UrazOJUqVcJvv/2Ga9euwd3dHStWrNB6W7IcbI5JJ15eXti7dy8yMzNRoUIFdOvWDfv27cO9e/fg4uICKysrpKWlITc3F66ursr9XF1dlcJXyNnZWfm3g4MDfvjhB8TExKB79+6YMWMGMjIydMqUlJSEihUrwsbGRrmuTZs2AIBbt24VyaFSqVCxYkXcunWr2Bz29vbIy8sr9r6PPu68efOQmpqKTZs2YePGjWjSpAmys7OVbe3QoQPCw8Nx7949qFQquLi46LQtADBgwACEhYUBAA4ePIj27dvrfF8iMi+smWWrmbdu3UJwcDB8fHzg4+ODiIgIFBQUaN22SpUqFblcuXJl3L59W8lf0hg/Stt2Fuezzz5D48aN8f7772P48OGIjY3VeluyHGyOSSdvvPEGrl69iq1bt6J9+/bo3Lkzzp49i/DwcHTu3BnAw4JjZ2eHtLQ05X5paWmoXr261sfNy8uDm5sbFi1ahF27duHevXuYP3++TpleeOEFpKenIz8/X7nu4sWLyM7ORo0aNYrkKCgoQHp6Ol544YUSH/fx+wLA6dOnATxc3/bGG29ApVIp+R81aNAghIWFYefOnTrPGhcaMGAADh48iAMHDqB+/fqwtubhSWSpWDPLVjNr1KiBDz/8EBs3bsTGjRuxZcsWLFu2TGuGe/fuFbl8584dVKtWTclfmjEujfT0dEycOBG7d+/Gu+++iw8//BCZmZl6eWwyHb77kk4cHBzQrl07fPvtt+jUqROqVKmCZs2aISgoSJnhtLa2xqBBg5SPq7Kzs/Hrr78+9eSEpKQk5aQRFxcXNGnSRJkdcHJyQlZWFgDg448/LlLQgYcf8dWpUwfbt28HANy9exeTJ0+GSqVC3759cfr0aVy9ehUAsHPnTtSsWROtWrUqcVsfv+/Ro0exatUqAECdOnVw4sQJAMDt27cRHx9f5L7dunVDSkoKfv75Z3Ts2FHrcxRuW2ZmJv71r38BAKpXr4527dph2rRpGDhwYIk5ich8sWaWrWYOHjwY27dvV7YtNDQUq1ev1pohOztbWcZy5MgRpKWloUuXLqhevTo6duyojPGdO3cQGRmpLLMoLScnJ2RnZyM6Ohrff/89AgICkJKSAisrK7Rt2xb5+flPnDRIlkc1c+bMmaYOQZbh/v37uHPnDoYNGwbg4RnAOTk5eOutt5TbvPbaa/jvf/+LdevWYevWrejduzfee+89WFlZYdSoUUhMTMSJEyfw4osvonbt2rCxsUFMTAzWrVuH0NBQJCcnIzAwEC4uLnB3d0doaCjCwsJQv359ZT1ZIWtra3Tq1Alr167Fzz//jO3bt8PPzw916tRBpUqV4OHhgTlz5iA0NBR///03Fi5cCFdXV2zbtg3BwcG4dOkSHB0dcfbsWfzyyy+4cOECXF1d0bZtW+W+4eHhiI2NxcyZM+Hs7IwmTZpg8+bN2Lp1K+Lj45GdnY3Dhw+jQYMGqF27NlQqFa5du4Y6deqgS5cuAID4+HjMmDED169fx7lz59CnTx9UrFgR69atw44dOzBo0CA0atRI2a64uDhMmjTJ8DuUiAyKNfPZaibwsJGPj4/H8uXLERERgbt37yIgIAC2trZPjPP169cRHR2NatWqYfny5di1axe+/PJLNG7cGACU5viHH35AeHg4xo4di+7duz9RmytXrowlS5YgISEB2dnZyMjIQFBQEC5dugRra2u0atUKarUaq1evxtGjRzFy5EhUrlwZCxcuREREBMLDw+Hn54emTZsa4NVExsSvciMyI/v27cOFCxf49W1ERDo6dOgQAgIC8Oeff5o6CpUTXFZBZAYKT8SLiIhQziQnIiIi42NzTGQG9u7di0GDBq
FOnTp6O1GEiKi8O3nyJObMmYPk5GQuRyO94bIKIiIiIiINzhwTEREREWnYlHyT0ktOvv/M961SpQLu3LHM7wi05OyAZee35OyAZee35OyAcfO7u+v+ozCl9bzW3cdxW8wTt8U8PQ/b8ix11+xmjm1sVKaO8MwsOTtg2fktOTtg2fktOTtg+fn1oTyNAbfFPHFbzBO3Rctj6e2RnjM7Lv3+xHUVkuzRtXqXYm5NRETPg+LeGwCgb/1/GDkJET0rs5s5JiIiIiIyFTbHREREREQabI6JiIiIiDTMbs3xz6e3I/NBzhPXG3q9FteJEREREZHZNcf6wmaXiIiIiEqLyyqIiIiIiDTYHBMRERERaZTbZRXaaFtuQURERETEmWMiIiIiIg02x0REREREGs/dsorS4jIMIiIioucHm2MiInqulHbSg18BSvR8sZjmmN9bTERERESGZjHNsTZc9kBERERE+mLxzTGVXljUpWKvHzukhZGTEBEZjqEnTzg5Q1Q+8dsqiIiIiIg02BwTEREREWlwWQUREZGF4MnpRIbH5piIiCzaow1jhSR7ZD7IMdjjE1H5x+aYiIjIwPjdykSWg80xERERGYW2b0sa1Kl+uXpOsmxsjo2Aa8SI6HnGGtH/KzwAACAASURBVEhEloTNMRERkZnZcel3g6yfJqKSsTm2IJx9ISJD+Pn09mKbMFPVFtY6MkeFyzOcnOzx4JHjhcszyh82x3pWmpMuDP0GoG2dlTY/7jpX5IAvxAOfiIiInhdsji3IuYQ7xV6fd710TTARkSXiV6qVX6U9aa6423Mih/SFzbEJaWt2+/L4JiIiC/Z48/r4UgRjPKexHqe0t2cTb/7YHJshfR3gREREhqR9Nv9lo+YoD/iVc+aDzbERaJsh1uZiwWG93L6Bqm2pHkcbff2v2BIOfEvISFRecJmE/vBHRoj0h80xmZw+1po5OdmjZ+taes1FRPS8MbdvCuEnqWQKbI5J70y17qs0j8GZYCL940xw+cV9W37xffJJbI7LMUMvtyCi8s3cZhFJO23L9xrXqWLQx9emgUovT2tQ5XVWms1u2bE51rPSFhCyDCw2RGSJSvutSHwPMzxTNeXaliQa6rEBy32PZHNM5UZ5nQUgMjf8iN3yLY7cbNDHf/yTS9ssG+QV5Gu9PT/RLD1zarIN/djGbrLZHD+HtC238ERHgz4+i1/pmUuhICIiel6wOaYSlcdm19AnDZpb82opOYksXWmXJWhbE3wu4Q5sbW2Ql6d9tpXIkJ7nT2PNrjk+9XdKscXgaQWkOKW9vT7Y2prdcJZKXFb0Uz/2elxpv4+5tI9T2ua7uMfR9hilzV7ax1kcWbptqtQwEZnF/nqUYb9I3xJ+2UkfX/X3tNtrw5PR9MfQJ4vpi74aW0M+Z3lV2veB0tRwS57IASy7SbXUiRkrERFThyAiIiIiMgfWpg5ARERERGQu2BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkYbJvpj3wIED+P333+Hm5gYrKyv4+voW+XtOTg7mz5+P6tWr48qVKxg3bhzq1atnorRFlZQ9KCgIKSkpqFq1Ks6cOYNJkyahQYMGJkr7pJLyF4qIiMCnn36KY8eOwcnJycgpi1dSdhHBxo0bAQDXr19Heno65s6da4qoTygpe2JiIhYsWIDmzZsjLi4O/fr1Q/fu3U2U9knJyclYtmwZzp07h5CQkCf+bs7HbEnZzf2YLYuy1Nrw8HDExcXB2toaderUwbBhw0yxCYqybEu3bt1Qq1YtAEC1atWwePFio+d/lC51eOfOnViyZAkCAwPRtWtX5XpL2y+A9m2xtP3ytFphafvladtiaftl586d2LNnDxo3boxTp05h0KBB6NatG4Bn3C9iApmZmdKjRw/JyckRERFfX185cOBAkdusWbNGgoKCRETk3LlzMnz4cKPnLI4u2ZcuXSpqtVpERHbs2CHjx483ek5tdMkvIvL333/LkiVLpGHDhpKRkWHsmMXSJXtoaKiEhoYql+Pi4oyaURtdss+YMUOCg4NFROTMmTPSs2dPY8d8ql9//VX27NkjgwcPLvbv5nrMipSc3ZyP2bIoS629efOmDBgwQBmXIUOGyOXLl40X/jFlfd9Yvny58cKWQJdtSUhIkIMHD4q3t7f8+eefyvWWuF+0bYuI5e0XbbXCEvfL0+qepe2XkJAQuX79uogUff981v1ikmUVsbGxqFmzJuzs7AAArVu3RmRkZJHbREZGolWrVgCARo0a4dy5c8jIyDB21Cfokn3y5MmwsrICAKjValSoUMHYMbXSJX9WVhbWrVuHjz76yAQJtdMl+7Zt23D37l1s2LABS5YsMZsZb12yV61aFWlpaQCAtLQ0eHh4GDvmU/Xu3fup42muxyxQcnZzPmbLoiy1NioqCh4eHsq4tGrVCn/99ZdR8z+qrO8bhw8fxtq1a7Fs2TIcO3bMqNkfp8u21K5dG6+//voT97XE/aJtWwDL2y/aaoUl7pen1T1L2y9DhgxBzZo1AQBXr15VZsCfdb+YZFlFampqkTcqZ2dnpKam6nQbZ2dno+Usji7ZC+Xm5iI0NBRffPGFseKVSJf8S5cuxcSJE5UXornQJfuNGzeQkZEBX19fXL58GWPGjMHOnTuhUqmMHbcIXbJ/8MEH+OijjzB37lycPHkSEydONHbMMjHXY7Y0zPGYLYuy1Nq0tLQi1zs5OWmtdcZQ1veNTz75BJ6ensjKysLgwYOxZs0a1K1b12j5dcmpC0vcL09jqfvl8VphyfuluLpnifslOzsbK1asQExMDBYtWgTg2feLSWaO3dzc8ODBA+VyRkYG3NzcSn0bU9A1V25uLmbOnIkpU6agTp06xoz4VCXlv3nzJtLT0/Hrr78iKCgIABAcHIxTp04ZPevjdBl7Z2dntGjRAgBQr149ZGRk4ObNm0bNWRxdsvv7++Ptt99GQEAAVq5ciSlTpuDu3bvGjvrMzPWY1ZW5HrNlUZZa6+rqWuT6Bw8emHR/lvV9w9PTEwDg6OiIJk2amHQ2rCzHiiXul6exxP1SXK2w1P2ire5Z4n5xcHDAp59+ikWLFuGf//wn8vLynnm/mKQ5btmyJW7cuIHc3FwAwLFjx+Dl5YW7d+8qH4F5eXnh+PHjAID4+Hg0btzYLGagdMmenZ2NL774Ah988AGaNWuGXbt2mTJyESXlr1GjBubNm4dx48Zh3LhxAB7OaDZv3tyUsQHoNvbt27dHYmIigIcHUEFBAdzd3U2WuZAu2W/evKlkrVixIqytraFWq02WWReWcMxqYynHbFmUpdZ26tQJZ86cgYgAAI4fP47OnTubZkNQtm05ePBgkY9Sr169itq1axt/IzR02RZtLHG/aGOJ+0VbrbDE/aJtWyxxv6xfv14Z+xdeeAF37txBTk7OM+8XKym8h5Ht378fu3btQpUqVWBrawtfX18sWLAAlStXxrhx45CdnY358+fD3d0dCQkJGD9+vNmc+V5Sdl9fX1y4cAHVqlUDAGRmZhZ7hryplJQfePhRx
H/+8x98/fXXmDhxIoYNG4bq1aubOHnJ2e/fv4+FCxeiZs2aSEhIQK9evdClSxdTxwZQcvYjR45gw4YNaNq0Ka5duwYPDw8MHz7c1LEVMTExCAsLQ1RUFIYPH45Ro0Zh+fLlFnHMlpTd3I/ZsihLrQ0PD8fp06ehUqnw0ksvmfzs+2fdlvj4eHzzzTfw8PDA7du3Ub16dYwfP96st0VEsGrVKmzZsgVt2rTBgAED0KlTJwCWt1+0bYsl7pen1QpL2y/atsUS98uqVauQlJSEmjVr4uLFi2jdujXeffddAM+2X0zWHBMRERERmRv+CAgRERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHRERE5UhmZibGjRuHYcOGYeDAgbh+/brJsnzzzTfo0KEDVqxYUeJtCwoK4OPjg0aNGuHatWsAgCNHjmDUqFGGjqmza9eu4c033zR1DDIwNsdkdrZu3QofHx9Tx9CbRws9EVFZ6FIfd+zYgfz8fPznP//B5MmTYW2t37d6f39/nZpdAPD19UWnTp10uq1KpcLGjRuLXNemTRt8/fXXpc5oKC+++CL+85//mDoGGZiNqQMQERGR/iQlJaFatWoAgK5du5o4TdlYWVnBxcXF1DGKqFixoqkjkIGxOSat/P39ERoaildffRUbNmxAYmIivL29ERUVhdu3b2Ps2LHIzc3Fjz/+CDs7O8yZMweXL1+GWq1Gjx49MGbMGOTl5WH06NGIiYnBjBkzsHfvXhw6dAjr1q3DlStXEBISAkdHRzg4OGDatGlITU1FUFAQUlJS4OPjg4YNG2L69OlFcqnVasyaNQvnz5+HSqVC3bp1ERgYiIkTJ+LgwYPw8vLCmjVrEBYWhkWLFqF169Zo2LAhfvrpJ/Tq1Qvp6ek4deoUevTogV69emH16tU4f/48/P390aNHD5w8eRLTp0/H/fv38d5772H37t3Iz8/HsmXLEBQUhOPHj6Np06aYP3++kiksLEwZh+rVq2PWrFlwdnbGmDFjAABTp06Fvb095s2bB39//yfG44UXXkBCQgJatmyJf//734iNjcWMGTNQqVIlbN261aj7nYhKZq71cfPmzdi6dStycnLg4+ODjz/+GCtWrCj2OdLT0xEcHAyVSgW1Wo2pU6eiTZs2AKDUvCNHjsDGxgZubm745JNP8OeffyIqKgr29vaIiYnBgAED8Pbbb2PmzJk4d+4cbG1t4e7ujtmzZ8PZ2VmnsQwJCUFwcDCqVq2Kvn37KtenpaVhwoQJOHHiBOLj4/Vam2fOnInt27fD29sbly5dQnx8PHr16oWpU6cCAI4dO4aFCxfC1tYWIoJRo0aha9eueP/99xEdHY09e/bgxRdfRF5eHpYsWYLjx48DAFq1aoWpU6fC1tYWvr6+2LdvHz7++GPExsbiwoULGDlyJEaMGFGm1x4ZgRA9RZcuXeTYsWMiIhIcHCxNmjSRU6dOiYhIUFCQ8u+AgADx8/MTEZGsrCzp16+fhIaGKo/TsGFDWbFihYiIRERESExMjLRr105ycnJEROS7776TkJAQEREJCQkRb29vrZkiIyNl9OjRyuWJEydKYmKi5ObmSrt27eTo0aPK38aPHy9qtVpERPz8/GTw4MGSk5Mjqamp4uHhIStXrhQRkV27dkmvXr2U+0VHR4uHh4ccP35cREQ+/PBDGTx4sKSnp0tOTo68/vrryt+OHDki7dq1k9TUVBERmTdvnnz22WdFtj0xMbHINjw+HqdPn5b+/ftLRESEcpvJkydLRkaG1nEgItMyx/ooIrJ8+XLl+bQ9x+nTpyUsLEzu3LkjIiKJiYnSpUsX5farVq2SkSNHSn5+voiIzJo1S8ng5+cny5cvL/L43333XZHnX7p0qXK5uNsXOn/+vHh6ekpCQoKIiGzatKlIzUxMTJSGDRsqt9dnbfb29paxY8eKWq2WpKQkadq0qdy6dUtERIYOHSqxsbEi/9fenUdFVf5/AH8PA4IsihC4IiKlFGpoIq4cNP3WqdRRS1tsUTtqhu16TnFKTlaYx6WTZIl6qNCfmcoaCqYpbgl6ktEQxULDUEkBl2GHeX5/CPcIMjAMs9xh3q//mLlz5/3c633ux+c+944QIjc3t9H2vDdfdHS0ePXVV0Vtba2ora0Vc+fOFdHR0dKy48ePF8uWLRNCCKFWq0VQUJCoqalpdluQfHDOMbUoNDQUBw8eBACcOXMGkyZNQkZGBgDg7NmzCAwMhFarRUpKCmbMmAEAcHJywlNPPXXfiOfEiRMBAJMnT8bgwYMB3P1ffUVFBV566SU888wzemXq0qUL8vLycPToUWi1WqxZswa9evWCg4MDnn76aSQmJkr5Bg4cCIVCIX12xIgR6NSpEzw8PODh4YGAgAAAzc8LdnFxQVBQEADgoYceQu/eveHm5oZOnTqhX79+uHz5MgAgISEBEyZMgIeHh9S+lJQUCCFabMe92yMwMBBTp06VspeUlMDR0REuLi56bRMiMj859o8tadrnBAQE4MMPP8QLL7yADz/8EFevXkVxcTGAu3Obp06dCqVSCQBYsGABgoODda7byckJL774ImbPno3U1FTk5OTolSk9PR1BQUHw8fEBAL1udjNm3zx27FgoFAp4e3vD3d1dunmxa9euSEpKwo0bNxAQEIBly5Y1myUpKQkqlQpKpRJKpRJTp069b982zLkeOHAgysvLpW1M8sXimFoUFhaGAwcOoLy8HM7OzpgwYQIyMjJw69YtuLm5QaFQoKSkBNXV1VIHBAAeHh4oKipqtK57L7E5OTlhy5YtyMrKwuOPP45PPvkEGo1Gr0xDhw7F8uXLsXHjRowfPx6bN2+WOjuVSoW0tDRUV1cjKSkJU6ZMafTZe4tNe3t76W+lUomamhq9lm34u2H5a9eu4fjx43j55Zfx8ssvY/ny5XjggQdQWlraYjuaXnKcPHkyMjMz8d9//yElJYV3RBPJnBz7x5Y07XPeeOMNDB8+HNu2bZNuhKuoqABwt1/r1q2btGz37t2lArapzMxMrFixAitXrsSWLVswf/58VFZW6pXpv//+a/Q97u7urX7GmH3zvdvE0dFR+uzq1avh5OSEadOmYd68ebh06VKzWZpup5b2raOjIwDcd64h+WFxTC0aPXo0/vnnH8THx2PUqFEIDQ3F2bNnkZSUhNDQUAB3O4NOnTqhpKRE+lxJSQm6d++uc701NTXw9PTEqlWrkJ6ejlu3bjWaJ9aSO3fuYMSIEfj+++8RFxeHxMREacR1yJAh8PT0xK+//opLly7B39+/Ha3XT8+ePREWFoa4uDjExcVh
27Zt2LlzZ6OToT68vb0REhKClJQUHD16FGPGjDFRYiIyBjn2j/oqLi5GYWGhNKrZtGDr2bNnoyKytLRU51N3Tp8+DT8/P/Tp0wfA3fnK+vL29m60bVobVGiL9vTN1dXVWLp0KQ4cOIDg4GAsWrRI53fcm7m1fUvWgcUxtcjJyQkjRozA+vXrMW7cOHTr1g2DBg1CTEwMRo0aBQCws7ODSqWSLiVVVlZiz549mD59us71FhUVSTeSuLm54eGHH0ZdXR2Au6MCDaMXixcvvq+j/fXXX7F9+3YAQN++fdG9e3dotVrp/alTpyIqKspsxeW0adOk0SIAyM/PxxtvvCG97+zsjMrKSiQlJSEtLa3FdalUKsTGxsLPz0+6nElE8iTH/lFf7u7u6NKlC9RqNQDg8OHDjd6fNm0akpKSpO9dvXo1zp071yhDeXk53n//ffj6+qKgoEAqEo8cOaJ3jieeeALZ2dnSVIhffvnFoPY0p7W+uSVvvfUWKioqYG9vj2HDhknbobnvSE5ORl1dHbRaLZKTk1vct2QdlJGRkZGWDkHydufOHZSWluL5558HANy4cQNVVVV49tlnpWVCQkJw5MgRbNq0CfHx8XjyySfx4osvQqFQYO7cubh8+TLUajX69OkDHx8f2NvbIysrC5s2bUJCQgKuX7+OiIgIuLm5wcvLCwkJCUhMTET//v0RFhbWKI+joyO2bduGHTt2YOvWrfD398eCBQukYrJ3797YuHEjvvjiCzg7OwMAYmNjkZiYiAsXLqBXr16Ii4vDyZMn8eeffyIkJARLlixBUVERTp06hcDAQHz88ccoLCzEtWvXoNVqERMTg/z8fHTu3BkZGRnYt28fcnNz4e/vjxEjRsDd3R2ff/65NOr76aefSpfaNBoNNmzYgLy8PMybNw+LFy++b3s08PX1xXfffYePPvoIXl5eptytRGQEcusft2/fji1btiA/Px+HDh3CtGnTmv0OOzs79O/fH1999RUOHToEIQROnjwJtVqNSZMmISQkBPn5+Vi3bh3i4+Px0EMPSU9Z6NKlCzZt2oTU1FSoVCo8+eSTKCgowFdffYWsrCx07twZJ06cwM2bN3Hq1CmkpqbiwoULcHZ2RmBgYKO8np6e8Pb2xvLly5Geno5+/frhyJEjUKvVGD16NN577z0UFRUhKysLgwcPNlrfvHLlSmRkZCA3NxeBgYHYsGGDdE4IDg6GQqHA2rVrkZSUhEOHDuGTTz6Bj48PXn31Vfz7779Qq9UIDQ3FmDFj8PfffyM6Ohq7du1CYGAgwsPDoVQqsXTpUqjVavz5558YO3YsIiIikJ+fL21jJycn0/7jJIMpRGt3DRFZmdu3b+Ojjz5CdHS0paO0mVarxdy5c/H9999bOgoREZFN4nOOqcPIyMjA4MGDkZaWZnU3s6nVanTp0gUFBQWca0xERGRBLI6pwygoKEBUVBT69etndaPG169fxwcffIAePXpg/fr1lo5DRERkszitgoiIiIioHp9WQURERERUzyTTKq5fv2OK1RpVt27OKC0tt3QMk7OVdgK201ZbaSfQ8drq5eVmsnWbo9+V8/5gNsMwm2GYzTCWyGZIv2uzI8f29rbxDFlbaSdgO221lXYCttVWayDn/cFshmE2wzCbYeSc7V68IU+GUvP3Nvv60/3/Z+YkRETywb6RiMyBxTEREZkUi1oisiYsjo2suZMATwBERERE1oHFsQXpGk0hIrIFHFEmIjmy2RvyiIiIiIiaYnFMRERERFSPxTERERERUT3OObYinJ9HREREZFosjs2AN94REZlOav5eOBc5orysqtHrHDggIkNwWgURERERUT0Wx0RERERE9TitwkCcKkFEZBrsX4nIkjhyTERERERUjyPHHQCfYkFERERkHCyOW8HLe0RE1okDB0RkCBbHHZiuxxvpwhMGEbUHBxOIqCNgcdyBnSsohYODPWpqahu9HtC3m4USEREREckbb8gjIiIiIqrH4piIiIiIqB6nVZAk8XB+s6+rxvU3cxIiIiIiy2BxbEXOFZQ2+zrnEBMREREZB4tjIiKSlbYOBPDmYyIyJhbHJPm77oSOdzitgoiIiGwDi2MiIrIpbf1xEP6YCJFt4dMqiIiIiIjqceRYhnTNt7PU+jlqQkRERLaCxXE9/uyp6fFRcUQdg7X0l229sc+U7WL/R2Q9WBybgakfwWbqkWYiIjlgX0dE5sDimIiImtV0JNW5yBHlZVUWSmM8xiqynzbCoC9HlInkh8WxBXEUhIjIeukqbJtycXE0cRIiMiYWxwbir9Xppu8Jg4hsA/tLIrImLI6pVbpObDWF8iqC/y/9HMqaXPLlpUkiIiJqiw5bHPPxY0RE5tXWqWLWPrWsuV8V9VcG671sS8sTkeV02OJYl4ai2VQ3llh7Z98WxurseUMKkXWxpX6urZrrFx0qrOdUy/6YqAMUx9byvE1boqto1sVYxXRbbnrhCYCodU2LYAcHe9TU1FooTcekq79cfbD5198Pm9Xs67r6NIfefzX7ek3hgwDu9ptNp6MR2TrZFcdyK1o4QkKGMNZNiSzWyRx4E631aOu+0nUO81caI43pya0mINsgu+JY96ijcQ6Eho6i6QgI75qWH0ucsNkRky1q69Uesj2rD25v9vW2XvnT1Zc21/fyEXhkKbIrjnUx9fQJjhDLjzHmNOvq0HXRtW5rGVkzVnHf1vY2t35r/4+GteenjslY93o0rMehwh41dYZPlWlrHt3n8gfb9L3NHZ+6ppBYy4347HPksw0UQghh1m8kIiIiIpIpO0sHICIiIiKSCxbHRERERET1WBwTEREREdVjcUxEREREVI/FMRERERFRPRbHRERERET1WBwTEREREdWzmh8BMcSxY8ewd+9eeHp6QqFQIDw8vNH7MTExuHHjBh544AHk5OTgrbfegr+/v4XStk9rbd29ezf279+PgIAAnDlzBiqVChMmTLBQWsO11s4GycnJWLJkCf744w+4uLiYOaVxtNbW+Ph4/PTTT3B0vPsrUjNmzIBKpbJE1HZprZ1CCMTFxQEACgsLcfv2bURFRVkiaofV2j6oqqrCl19+ie7du+PSpUuYP38+/Pz8AADZ2dk4evQo7OzskJmZiaioKPTs2VMW2aKiomBvbw+tVovKykp8/PHHsLMz3piQPv3R7t27sWbNGkRERGD8+PHS60lJScjNzYWdnR369u2L559/3mi52pPt9OnT+OGHH/DII4/g4sWLGDJkCGbOnCmLbA2Ki4uhUqmwYMECzJ49WzbZLH0stJTN0sdCS/WWqY8Fg4gOqry8XEycOFFUVVUJIYQIDw8Xx44da7TM2rVrhVarFUIIkZqaKhYsWGD2nMagT1t37dolCgsLhRBC5OTkiEmTJpk9Z3vp004hhPjrr7/EmjVrxIABA4RGozF3TKPQd59evnzZEvGMRp92JiQkiISEBOnv3Nxcs2bs6PTZBxs2bBAxMTFCCCHOnTsnXnjhBSGEEHfu3BHh4eHScgUFBaKsrEwW2bKzs8XkyZOl5SZPnixOnjxp1mwFBQX
i999/F7Nnzxa//fab9PrVq1fFlClTpPPP9OnTxcWLF2WRbd++fUKtVgshhKiurhbDhw8XxcXFssgmhBB1dXUiIiJCLFy4UMTFxRktV3uzyeFY0JVNDseCrnrL1MeCoTrstIrs7Gz06tULnTp1AgAMGzYMBw8ebLTMO++8A4VCAQDQarVwdnY2d0yj0Ket06dPR69evQAA//zzj1WOkOvTzoqKCmzatAlvvvmmBRIajz5tBYCtW7di8+bNiI6Oxs2bN82csv30aWdKSgpu3ryJH3/8EWvWrLHaKwFypc8+OHjwIIYOHQoAGDhwIM6dOweNRoOMjAw4OzsjNjYW0dHRyMnJMWo/2p5s7u7uKC8vR21tLWpra6FQKNCnTx+zZvPx8cHIkSPv++zhw4cRGBgonX+GDh2KQ4cOySLb448/jiFDhkh/K5VKODg4yCIbAGzcuBHPPfccunbtarRMxsgmh2NBVzY5HAu66i1THwuG6rDTKoqLixudRF1dXVFcXNzsstXV1UhISMCyZcvMFc+o9G1rZWUl1q1bh6ysLKxatcqcEY1Cn3auXbsWixYtkg5Sa6VPW4ODgxEWFgYPDw9kZGTg7bffxg8//GDuqO2iTzuvXLkCjUaD8PBwXLx4Ea+//jp2794NpVJp7rgdkj77QNcyhYWFUKvV+Oyzz6BUKvHKK6/A3d1dZ2Fjzmy+vr6YOXMm3n77bdjZ2WH06NHw8PAwSi59s+lSUlLS6LMuLi56f9bU2e61detWLFy4EG5ubrLIdvz4cTg5OeHRRx/Ftm3bjJbJGNnkcCzoIqdjoWm9ZepjwVAdduTY09MTZWVl0t8ajQaenp73LVddXY3IyEi8++676Nu3rzkjGo2+bXVycsKSJUuwatUqvPLKK6ipqTFnzHZrrZ1Xoy34GgAAA5hJREFUr17F7du3sWfPHsTExAAAYmNjcebMGbNnbS999qmPj4/UwY0cORInTpxAXV2dWXO2lz7tdHV1xaOPPgoA8PPzg0ajwdWrV82asyPTZx/oWsbV1RWPPPIIHBwcYGdnh6CgIJw4cUIW2fbv34/MzEx88803WLduHf7991/8/PPPZs2mi4eHR6PPlpWV6f1ZU2drkJKSgvLycrz22mtGy9XebPv370dVVRViYmKQl5eHo0ePYteuXbLIJodjQRe5HAvN1VumPhYM1WGL46CgIFy5cgXV1dUAgD/++ANhYWG4efMmNBoNgLsjqcuWLcOcOXMwaNAgpKenWzKywfRp6+bNmyGEAAD06NEDpaWlqKqqslhmQ7TWzp49e2LFihWYP38+5s+fDwCYM2cOBg8ebMnYBtFnn65evRq1tbUAgEuXLqFPnz5WN5qqTztHjRqFy5cvA7jb6dbV1cHLy8timTsaffZBWFgYTp06BQA4f/48AgIC4OrqipCQEBQWFkrrunLlCvr16yeLbNeuXWv078TLy0taj7my6TJu3Djk5ORIffKpU6cQGhoqi2wAsGPHDhQXF2PRokU4f/48Ll68KItsERERUv8+YMAAjBkzBjNmzJBFNjkcC7rI4VjQVW+Z+lgwlDIyMjLS0iFMwcHBAf7+/oiNjUV2dja8vb0xY8YMfP3117hw4QIee+wxvPPOO8jJycHJkyeRkJCA48ePY9asWZaO3mb6tDUzMxOpqanIy8tDfHw8Zs2ahWHDhlk6epvo007g7mWa2NhYZGZmQqlUws/PD66urhZO3zb6tPXChQuIj49HXl4e9u7diw8++AA9evSwdPQ20aedgwYNQnJyMs6fP49ffvkFCxcutMo583Klzz4IDAxEWloazp49i4yMDCxduhTdunWDh4cHqqurkZaWhqysLDg6OmLOnDnS/EFLZnvwwQexf/9+5OTkICsrCzdu3EB4eLjR5s/qk00IgW+//RaZmZkoKytD586d4evrC1dXVzg7O2PXrl04duwYxowZg7FjxxolV3uz7du3D5GRkdBoNEhISEBycjIee+wxo81RbU+2Bjt37sSBAwdQUlICNzc3oxWh7ckmh2NBVzY5HAu66i1THwuGUoiGcp2IiIiIyMZ12GkVRERERERtxeKYiIiIiKgei2MiIiIionosjomIiIiI6rE4JiIiIiKqx+KYiIiIiKgei2MiIiIionr/D/0jt4KC39abAAAAAElFTkSuQmCC\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4oAAAJECAYAAABQGNqIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydeViU5frHPzPDNsOwgwuCgqiguGHu21HcLTUtS3+m5hJpp1XLNFHTcE87HrOOWKZldtTUTE+dNDyZ4q6o6VHADRJxgQGGGfZhfn9A4MgyqDwqp+dzXVwXL3PP97253+d9nvd+n01hNpvNSCQSiUQikUgkEolEUozyUTsgkUgkEolEIpFIJJLHC5koSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEAptH7YBEIpFIJBKJRCKRSMrn9u3b/O1vf+PChQts3bq1zOe5ubksXryY2rVrc/XqVcLCwvD393/g88oeRYlEIpFIJBKJRCJ5TDlx4gS9evXCbDaX+/n69eupW7cuL7/8Mi+++CIzZ86slvPKRFEikUgkEolEIpFIHlP69++Po6NjhZ//8ssvhISEABAYGMiFCxcwGAwPfF6ZKEokEolEIpFIJBJJDSU1NdUikdRqtaSmpj6wrpyjKKmQ/JTL1a75a/CMatf8g+7nFgrTFsG61rOFaQ8N/l2IrmZAkBBdhX+AEF2AeW+cEqK79Po+Ibr6JU8J0QXoszReiO6edxoL0RVJ0LyDQnRnaVoJ0RVJN41OiO4EY44Q3QY2LkJ0159YJkQXxLV9oto907X/CtEFmPv0RiG6Y+3Sheh6+huF6IK4NvXrD8X4/B+VuFhsTNguTLs6EfFsDGDr2fCBvu/h4YHRWHp9DAYDHh4eD+qW7FGUSCQSiUQikUgkkppEenp6yfDSHj16EBMTA0BsbCxBQUFotdoHPofsUZQ8ECmpOv4e+SWxFy+z6fO/37eOW/cW1BrYnryUDDDDlWXfWnxea0gnvPq3JfNsAs6tA7ixZR8pu08+Mn9Fant3DcZ/QDuyU/VgNnPyI8u3bK1eeQq1lwvZtzPwbOHP8Q+/JeNSslVd25AnsOvSHXN6Gmazmeyv11t8bt+nPw5PDoa8PAByfvqB3KjdVnWVvkGoGoVAdiZmMxQc2WXxuV3v0ShcvUrtPX3I2Tgfs976kIjD8UlEnb2Ku6MahQIm9Wlj8XmSLpPlu44S7OtJ7PVUBrQOoEdwA6u6AV2a07x/OwzFMY5asa2MTYsnO9Bv2gh2zf2SC3tjrGoCuLm5smD+DK5cSaRRI3/CZy3i1q0UCxsvLw8+X/MR0QePUsvLE1s7W954M7zCCep/ICrOTq5OTJ7xEtcTk/Hxr8fqRZ+TlpJWxq6enzevzpqEyWQiPGyutVAILReitF1cnZkx5y0Sr17DL6A+Sz74Oym3Lb/TMiSYCZNGc+638wQ08uPUybN882XZFejuRNQ9LVJb07k1Tn06Y9JlYDabSf24/F4g50E98F42jdjWwzBnWe9BFFXeysPRRcvI6aO5mXiTOv512bRkA/qUjPvSupOa1u5Vp8+Hz8QRdfQ33F20KIBJw/tZfJ50S8enW34iwKc2l67dZPSTfyHQz9uqrqg6WVQ5hprXpoqsh+6meZeWtBvQEX1KUdy3rdh8XzqPJYWmR3Lao0ePsmPHDm7fvs0nn3zC+PHjiYyMxNXVlbCwMMaMGcPixYv55JNPSExMZP78+dVyXpko1kCOHz/O/PnzmT59Oh06dGDFihU0b96cXr16PXRfTp45R2i3jlyIv/+ueKXajqAlEzncfSrmvAJafD4Ft27NSdt/tsRG5WDHxYiN5Calom3uR4s1b95Xg1kd/orUVjnY0XXReL4NfZfCvAJ6R76Od5dgrkefK7Gx0ThweO7XADQc1IEO4SPZPW555cL29mhfn0Ja2IuQn4/TrHnYtm5D/inLGGYunEfhzRtVd9jGFrteo8j5ai6YCrB78mWUvkEU/n6hxMSUeB7Tz18VHdg5YNf3xSolA9l5BczfFs3Wqc9gZ6Ni6pdRHIm/TofGpQ8d6345Q2u/2ozu3pwLSSm8s+E/VhNFWwc7hs4fz0d9p2HKK2DUp28S0DmYSwdLY+zm44VRl0lG8r2N74/4YDpRew/w7bc7eerJPixZPJsXx71uYWNjY8OO7//N52uLHlhOHN9Dp45PcPDQ8YqFBcZ50vQJHD9wgr0799GlTydenT2JD14vO5wtOKQph/Yeof1f2loPhEB/RWq/O+sNDuw7zK7vfqJ3v78QPm8qb05+z8Kmdm0v1q7ewOmTZ7GxsSEmbh//3hVFmq78YW/C7mmB2goHe+rMfZUrAydhzi+g3sqZaDq1IuvQaQs7uwBf7BrVt+rnnQgpbxXw/LQX+O3AaY786yBterVl1MwX+fStFfet9wc1rd2rLp+zc/OI+Oxbti2bhp2tDVOWrePIb3F0aNGkxGbp+u8Y9Jd29GrfgvjEZN5b+TVblr5dqa6oOllkOa5pbarIeuhu7BzsGL9gEtP6vE5BXgFv/mMawV1acC76t3vWeiwxFz6S07Zv35727dtb/G3atGklvzs4ODBnzpxqP68celoDadu2LYGBgSXHr7/++iNJEgH69uyGRqN5IA2Xtk3IuXYbc14BAOlHY/HsHWJhk7xpH7lJRRWhxr8Oxrhrj8xfkdq1n2iM4VoKhcWxuHksnvq9WlvYnPiw9K2zQqkk35hrVde2aTCmmzchPx+A/HNnsWvfqYydw+ChqJ99HvWosSicnKzqKusGYNbrwFTkb+H1S6j8W1jYmOJKEyCb4C4UnIu2qgtwJuEWdd202NmoAGjtV4v9FxItbNy1atKK50HpjDk087E+Hr9+m8akJaVgKo5xwvE4gkIty1vatdtcPnTvc3QGDujF4cMnAIg+eIyBA0LL2CQn3yxJEh0dNWgdNSQkJlWqKzLOnXp15OyJov/1zLGzdA7tUK7d7u1RFOQXVElTpL8itUP7dufEsaI5rceOxBDat3sZmz3//oXTJ0sf5gsKCiqNi6h7WqS2OiSI/Ou3MBf/X1kn/4u2h+UDisLBHveJz5JSQQ9NRYgobxUREvoE8SdjAYg9foGQ0CceSO8Palq7B9Xj85m4q9T1csPOtqiPoXWgP7/GnLewSbiRQl1PVwDq1XInLjGZNH3lKy+KqpNFluOa1qaKrIfupvETgaQk3aag+Fxxxy8QEnr/L3wkj5Y/fY/i1q1bWb58OePGjSM2Npa0tDSGDRvGgQMHSEhIYPXq1Wi1WuLj41mzZg1NmjTh8uXLTJ48GV9fXzZu3MjFixfx8PDg+vXrzJ07F6PRyJQpU1CpVAQGBnLq1CkGDRrEc889Z3HuvXv3s
nDhQnr27ElhYSF79uzhhx9+4K233qJt27ZcuXKFQYMG0blzZwAiIiLIz8/H19eXGzeK3lBdv36diIgImjZtSlhYWMnbhEWLFrFp0yZWr17N3r170el0LFq0iICAAH7//Xeefvpp2rZ9PG5cO09nTIbSoR4mQza2nmUXK1A62OL/9nDcujTj3OSVD9PFh4ba05l8Q3bJcZ4hGw9P53JtlbYqGg/vRvTMdVZ1Fa5umLOzSo7NWUYUrpYLkOSfOUXe0UOYMzKwbdcBp5lz0U+fUrmuxglzfum1M+dlo1RX9GZWgapBMAUxUVb9BdAZstHY25YcO9rboTNYvjUd3b05U778mQ93Hubs7ymE3dXwlYfW05ncOxbZyDFk4e3hVyWfrFGrlgeZmUUPRXp9Ju7ubqhUKkymskNVnntuMJPCxvDhsk9JSqp8eI/IOLt5uJJlKCobWZlGnN2cUamUmEz3/9ZUpL8itT083TFmFsXCkGnE1c2lwusHMPalkXz80Wcl17w8RN3TIrVV7q4UGkt1Cw1ZqNwt62Svt8aQ+sk3cI/JnIjyVhHOHi7kFP8f2YYstK5OKFVKCgWc616pie2eTm/A0cG+5Firtud8hmXZDwn050x8As0a+nL2YtGiasbsXNycK54rJapOFlmOa1qbKrIeuhtnDxdy7jhXliELP48HW6jlsaLw0dcfD5M/faL4zDPP8N133xEcHMzEiRN55ZVXMBqNLFiwgIiICKKjo+nXrx/h4eG8++67tGnThiNHjrBo0SJWrVpFnTp1GDFiBEqlkoiICA4cOECPHj0ICwtj+fLlTJ06FZ1Ox9ixY8skiqGhoezevZsGDRowatQohgwZglKp5MUXX6Rz586kp6czYcIEOnfuzC+//MLVq1f57LPPAIiKKqoYvL296d27N0lJSdjb2zN06FC2by8ad/7888+zevVqAE6ePElGRgajR48mNzeX9HQxq4PdD3kpelRah5JjlVZNfjnzSApz8rkUsRG1X23abJvNwfavYy54NGPFRZGdosdWqy45ttOqyUnRl7FT2qrounAcxxdvJjPhllVdc3oaCnXp22SFxhHzXWXgzuEx+adicJ67AJTKSitFc1YmCtvSa6ewU2POzizXVhXQCtOVM1Z9/QN3rZqs3PySY2NuHu53lBOA2Zt/ZWi7QAaEBKAzZDN4ybf8a/pzuGjs75YrwZCix96xVMdBq8GYWjbGVeWliS/w9JD+GIxZ3LqVipOTlowMPc7OTuh0aRUmGZs3f8+WLTv5efdmrl27zo//3lvhOao7zkNeeIru/buSnZVNWmo6Gq0Gg96IxskRfZr+gR/aRZaL6tYeNXY4/Z4KJcuYRWqKDkcnDXp9JlonR9LTMiq8fkOeGYhGo2blsshK9UXd0yK1Tbp0lI6lukqtBpOutE62qeOJykWL04BuJX9zHzcU477j5Jwtu7Ku6PJ2J6H/15d2/TqQk5WDPjUDB0c1Wfos1FoNhvTMxyJJhJrZ7rk7azHmlPYyGbJzcXexTADfHjOYL3ft46t/7cPZUY2rk4baHpWvUlvddfIfVHc5vpOa1qaKrIfuRp+agcMd59JoNehTH3xusOTRIIeeFuPr6wuAs7Mz9esXvb1xcXEpWWo2NjaW6OhoIiMjOXLkSMkQDrVazdKlS4mMjOTixYvodKXLjPv5+QHg7u5usWTt3QQEFG0N0KJFC8xmM0eOHGHVqlVs3ryZtLSiSf7x8fElenf6W1V69OhBu3btmDBhAuHh4djYPD7vCDKOx+Hg44XCrsgn1/aBpPwcg42rI6riyqb+5NItA3KTddi6O6N0sHsk/ork5ol4tD6eKItjUbtdYxKjTmHv6lhSyascbOm6aAK/Rf5Iym9X8RvYzqpu/vlzqGrXBtuiHjrb4ObkHT2EwskJRXFZ1ox7CZRFwzxV9XwovHHD6puzwuRLKJzdQVXkr9I7ANOV38BeA3aWSZ2qWScK/nuoyrFo2aAWyWkG8oofik5dvUW3oPpkZOViyClaHOBGuhFP5yL/ndX2KBVQaGVRmMST8bjV80RVHOMGbZtwYW8MahdH7O9o3KrKms828OSgF3h+RBg//BhFx45Fw9u6dG7HDz8WJX8KhQJf36K5ld27daRd26KeT7PZTEJiEv7+lc+Pqe4479iwi6kvTCc8bC6Hog7T/IlmALRs15yDe4+U+Fzbu9a9hEKYvyK1v16/hTHDJzPpxans3f0rT7QrujbtOoSwd/evQFEsvOvVKfnOiNHD8PRyZ+WySAKbNsY/oOJ5saLuaZHa2TEXsPWuhaJ4iKGmTTMMvxxF6aJF6aim4EYKydM/Qhe5BV3kFgB0X2yv8OFadHm7k70bd7N47AesmLyUmL0naNymaJpGYNsgYvaeeGD96qImtnstm/iRfDuNvOLet1OxV+ge0pQMQxaG4gVgbun0jB3Ug9FP/oVWTfzo1DIQWyvPG9VdJ/9BdZfjO6lpbarIeuhu4k/E4lnPC5viczVpG0TM3krm4NcwzOZCIT+PK49PtvCYExQURJ8+fQgKCiIvL489e/YARfMDd+zYgbe3d8kStX+gUCiqpH2n3ZYtW7h16xYLFy4kPz+ff/7znwA0atSIw4cPl9j9/nv5++Q5OjqW+HH9+vWSv8fFxTFo0CAmTpzI119/zfr16wkPD6+Sf5VxLOYMO3+KIiVVx+p13zB25DAc7CvuzSmPwuw8Yqd9RuD8ceSl6jH8N5G0/WdpNGsU+ekGElbuQGlvS+CiCeQkpeDYuB5xs9ZhumNow8P0V6S2KSeP6Blf0HneGHJS9ejO/8716HO0nzmC3HQjp1ftpOfKV3AP9MGp/osA2KrtufrDscqFc3MxrPwIx8mvY85Ip+DyJfJPnUQzYRLmTD3ZmzdSmKZD+/oUTDeSsfFrSOaSKqyYVZBP3t6N2PZ4HrIMFKYkUfj7BWy7DsOcY6Tg+E8AKLx8MKfdgvyqz3lQ29nw3tAuLN5xCDdHBxrXdaNDY28++tdRXDT2jO/ZincGdeDrA+c4nXCTJF0mr/Vvi5ujQ6W6+Tl5fBe+lkHvj8WYqufGhUQuHTxH/+kjyc4wsO/TnQD0fPVpXOt50vKpjpgKTMT/av3NbfisRSxc8B5NGjekYcMGTHt3HgAtWzZj3RcrCGnTm5ycXKZOncypU2dxcnJEoVCwbv2myoUFxvkfiz7nlffC8G3oS70G3nw87x8ANGrWkFkrZjCm90QAuvbtTJfenagf4Mv/TX6ejZ9W4rNAf0VqL/5gBe+9/xb+AQ1o4O9LxOyiffWaBjfhb/9YSN+uw+gzoCfhH7zNuTMX6DswFDd3V2a/u4ArlxLK1RR2TwvUNufkcmPOKmrNmoRJpycn9gpZh07j9c54TBmZJQ/VKjdnXEcMBMBj4rOkb/qRgpuVL6ohpLxVwKYlGxg5Ywx1G3pTu34dvp6/7p41yqOmtXvV5bPa3o6ZE59h0RfbcXfW0qR+XTq0aMJHG3birNUw4elenI67woGYCzRr6EOGMYsZ44dZ1RVVJ4ssxzWtTRVZD91NXk4ea2euZuz7
E9Hr9CSev/q/s5DNnxCF2dqa7P/jREdHM2vWLIYOHUpoaCjh4eE0bdqUl156iTlz5uDi4sKcOXPIyMhg7dq1+Pj4kJyczODBg2nbti3Lli0jPj6eNm3asH//flxdXZkzZw7Lly/n/PnzzJs3j/j4eBYuXEhERAT9+pUuJX3mzBnmzJlD06ZNefnll2nQoAGXLl1i1qxZtGrVCldXVz777DMiIiLo27cv8+bNIy8vj7p163LgwAEaNmzIa6+9xoIFC8jIyGD27Nn4+/vz2muvERwcTL169Zg/fz5z5syhTp06bNmyhYCAABISEnj++edp2bJlpbERsamoqE2HQdzGw6JY13q2MO2hweW/SHhQRG0OrPAPEKILMO+NU0J0l17fJ0RXv+Qp60b3SZ+l1t+U3w973mls3egxI2jeQSG6szSthOiKpJtGZ93oPphgrNo2A/dKA5vKhzLeL+tPLBOiC+LaPlHtnunavS/mVVXmPn1vC8dUlbF2YqbUePqL22ReVJv69YdifP6PSlwsNiZst270GJB3TUzSa+fTwrrRI+BPnyhKKkYmimKRiWIpMlEsRSaKDweZKJYiE8UiZKJYikwUS5GJYikyUYS8309bN7oP7Hwfz7ZDzlGUSCQSiUQikUgkEokFco6iRCKRSCQSiUQikVij8H9rtX1ryB5FiUQikUgkEolEIpFYIHsUJRUiYk6FyHmEIuc/iuA/DnnixM/d2/YpVaXhKVH7bwpctl6tEiLbwStQiO6hheL2OP3cpfIVYe8XkT6LoqtTIzHCgl42X7YRuHx6lrsQ2QY2YuYzNVDc/5YJlXGp86tCdAG6/PSKEF1RPidlOAnRBYTVyaJ8TjolMBaC2tTLgmLcADH3Xo3iMd7KQgQyUZRIJBKJRCKRSCQSa1jZC/N/DTn0VCKRSCQSiUQikUgkFsgeRUmVcevegloD25OXkgFmuLLsW4vPaw3phFf/tmSeTcC5dQA3tuwjZffJ+zpXSqqOv0d+SezFy2z6/O+Pnc8PMxYAji5aRk4fzc3Em9Txr8umJRvQp2Tck4Z312D8B7QjO1UPZjMnP7JcirrVK0+h9nIh+3YGni38Of7ht2RcSraqKzIWorQDujSnef92GIpjEbViWxmbFk92oN+0Eeya+yUX9sZUyV8nVycmz3iJ64nJ+PjXY/Wiz0lLSStjV8/Pm1dnTcJkMhEeNrdK2qJioencGqc+nTHpMjCbzaR+XP7S9c6DeuC9bBqxrYdhzrK+7UFNLBfl0bxLS9oN6Ig+pSg+21ZsvmcNUfceiCvLIn2+m+qIMYiLhah75PCZOKKO/oa7ixYFMGl4P4vPk27p+HTLTwT41ObStZuMfvIvBPp5PzJ/oebVyTWxHqppMRat/bhh/pMNPZU9ig+Rbdu2odfr7+u7er2ebdvK3ngPC6XajqAlE4mbvZ4rH36Ltll93Lo1t7BROdhxMWIjiau+5+qK7TSeO+a+z3fyzDlCu3XkQXb5FOXzw44FwPPTXuC3A6fZ+ek2Tvx0hFEzX7yn76sc7Oi6aDyH5m7g5PJtuDf1xbtLsIWNjcaBw3O/5vQnu7jyw1E6hI+0qisyFqK0bR3sGDp/PLs++Iqov22lTlB9AjpbxsLNxwujLpOM5NQq+foHk6ZP4PiBE2xY9Q37f4rm1dmTyrULDmnKob1HqqwrKhYKB3vqzH2VWwsiSVn5NQ6B/mg6ld3LyS7AF7tG9R+5v6K178bOwY7xCybx1by1bP3bJuo39SO4y71tiizq3gNxZVmkz3dTHTEGcbEQdY9k5+YR8dm3vDN2CJOH9yMuMZkjv8VZ2Cxd/x092zVn3JBQxg7qQfgq6/sPivIXal6dXBProZoWY9HakkePTBQfItu3b3+gRHH79ke3GalL2ybkXLuNOa8AgPSjsXj2DrGwSd60j9ykokpA418HY9y1+z5f357d0Gg09+8w4nx+2LEACAl9gviTsQDEHr9ASOgT9/T92k80xnAthcJin28ei6d+r9YWNic+LH1rqVAqyTfmWtUVGQtR2vXbNCYtKQVTsW7C8TiCQi11067d5vKhe99wulOvjpw9UfS9M8fO0jm0Q7l2u7dHUZBfUGVdUbFQhwSRf/0W5mJfsk7+F22P9hY2Cgd73Cc+S0oFvRIP01/R2nfT+IlAUpJuU1B8rrjjFwgJbXtPGqLuPRBXlkX6fDfVEWMQFwtR98iZuKvU9XLDzrZoYFfrQH9+jTlvYZNwI4W6nq4A1KvlTlxiMml6wyPxF2penVwT66GaFmPR2o8lhYVifh5T/lRDT7du3cry5csZN24csbGxpKWlMWzYMA4cOEBCQgKrV69Gq9USHx/PmjVraNKkCZcvX2by5Mn4+vqyceNGLl68iIeHB9evX2fu3LkYjUamTJmCSqUiMDCQU6dOMWjQIJ577jmLcx84cICkpCTWr19Pw4YNGTlyJBs3buTKlSu4ubmRmZnJtGnT2Lp1K/Pnz2fevHnY2Njw1VdfMX36dH7++WeSkpJYuXIl3bp14+eff+b06dN89dVX/PDDD8yePZvjx48TExPD+++/T3BwMM7OzuzYsYOffvqJXbt2lTmXQqGocuzsPJ0xGUqHppgM2dh6upSxUzrY4v/2cNy6NOPc5JX3f7GqAVE+P4pYOHu4kGPMBiDbkIXW1QmlSkmhqWqVi9rTmXxDdslxniEbD0/ncm2VtioaD+9G9Mx1VnVFxkKUttbTmVxjqW6OIQtvD78q+WQNNw9XsgxZAGRlGnF2c0alUmKq4nWqCFGxULm7UmgsLReFhixU7pa6Xm+NIfWTb+AeEtuaWC7Kw9nDhZw77pssQxZ+Hg3vSUPUvQfiyrJIn++mOmIM4mIh6h7R6Q04OtiXHGvV9pzPsEwCQwL9OROfQLOGvpy9+DsAxuxc3Jy1D91fqHl1ck2sh2pajEVrP5b8yYae/qkSxWeeeYbvvvuO4OBgJk6cyCuvvILRaGTBggVEREQQHR1Nv379CA8P591336VNmzYcOXKERYsWsWrVKurUqcOIESNQKpVERERw4MABevToQVhYGMuXL2fq1KnodDrGjh1bJlHs2rUr9erVY+zYsfj4+HDp0qWSJE+hUDB9+nSioqJ49tlnsbGx4fvvvycgIICVK1fi4eGBu7s7MTExvPbaawB4enpy+vRpAAYOHMiHH34IQEhICL179yY7O5tp06bx9NNPk5ycXO65evfuXeXY5aXoUWlLl9ZXadXklzNHrjAnn0sRG1H71abNttkcbP865oJHszmpKJ8fVixC/68v7fp1ICcrB31qBg6OarL0Wai1GgzpmVVOEgGyU/TYakuXtbbTqslJKdu7rbRV0XXhOI4v3kxmwi2ruiJjIUrbkKLH3rFU10GrwZh6fz39AENeeIru/buSnZVNWmo6Gq0Gg96IxskRfZr+gZNEEBcLky4dpWNpuVBqNZh0pbo2dTxRuWhxGtCt5G/u44Zi3HecnLPxD91f0dp3o0/NwOGO+0aj1aBPvbe5waLuPaj+svwwfL6b6ogxiIuFqHvE3VmLMae0F9aQnYu
7i2UC+PaYwXy5ax9f/Wsfzo5qXJ001PYomzQ8DH+h5tTJov0VqV3TYixaW/Lo+VMOPfX1LdpjztnZmfr1i8bou7i4YDQW7fkUGxtLdHQ0kZGRHDlypGQIpFqtZunSpURGRnLx4kV0Ol2Jpp+fHwDu7u4lOpURFxeHUqlkzZo1REZGYmNjg8FQ9Dbx6aefxmQyoVAo8PDwuK//MSAgAICgoCAuX75c4bmqSsbxOBx8vFDYFb1bcG0fSMrPMdi4OqIqbuTrT36qxD43WYetuzNKB7v78r86EOXzw4rF3o27WTz2A1ZMXkrM3hM0blO0b19g2yBi9t7bvoM3T8Sj9fFEWexz7XaNSYw6hb2rY8kDocrBlq6LJvBb5I+k/HYVv4HtrOqKjIUo7cST8bjV80RVrNugbRMu7I1B7eKIvfbe94jasWEXU1+YTnjYXA5FHab5E80AaNmuOQeL5yEqFApqe9e6Z+0/EBWL7JgL2HrXQlE8/E3TphmGX46idNGidFRTcCOF5OkfoYvcgi5yCwC6L7ZbfaCsieWiPOJPxOJZz1d6y0oAACAASURBVAub4nM1aRtEzN7j96Qh6t6D6i/LD8Pnu6mOGIO4WIi6R1o28SP5dhp5xb16p2Kv0D2kKRmGLAzFC8vc0ukZO6gHo5/8C62a+NGpZSC2NpW/3xflL9ScOlm0vyK1a1qMRWs/lhSaxPw8pvypehSrSlBQEH369CEoKIi8vDz27NkDwOuvv86OHTvw9vYuk2hVZRinUqnEbDYTGxtLo0aNsLe3JywsDIBz585hU9wAxMTE0LNnT7Zv386pU6do3bo1KpUKc/HKLufPn6dOnTolPuTm5lokrXf706RJkwrPVVUKs/OInfYZgfPHkZeqx/DfRNL2n6XRrFHkpxtIWLkDpb0tgYsmkJOUgmPjesTNWofpjiFF98KxmDPs/CmKlFQdq9d9w9iRw3Cwt7f+xYfg88OOBcCmJRsYOWMMdRt6U7t+Hb6ev+6evm/KySN6xhd0njeGnFQ9uvO/cz36HO1njiA33cjpVTvpufIV3AN9cKr/IgC2anuu/nCsUl2RsRClnZ+Tx3fhaxn0/liMqXpuXEjk0sFz9J8+kuwMA/s+3QlAz1efxrWeJy2f6oipwET8r2es+vyPRZ/zynth+Db0pV4Dbz6e9w8AGjVryKwVMxjTeyIAXft2pkvvTtQP8OX/Jj/Pxk83PZJYmHNyuTFnFbVmTcKk05MTe4WsQ6fxemc8pozMkgdJlZszriMGAuAx8VnSN/1Iwc2KFyWoieWiPPJy8lg7czVj35+IXqcn8fxVzkX/dk8aou49EFeWRfp8N9URY5GxEHWPqO3tmDnxGRZ9sR13Zy1N6telQ4smfLRhJ85aDROe7sXpuCsciLlAs4Y+ZBizmDF+mNU4iPIXal6dXBProZoWY9HakkePwmx+kHUlaxbR0dHMmjWLoUOHEhoaSnh4OE2bNuWll15izpw5uLi4MGfOHDIyMli7di0+Pj4kJyczePBg2rZty7Jly4iPj6dNmzbs378fV1dX5syZw/Llyzl//jzz5s0jPj6ehQsXEhERQb9+lktdf/bZZyQmJpKbm8vixYvZvHkzly5dwtHRkfT0dKZOncrBgwf59NNPWbZsGTt27OC7774jPDycHj168PLLL9OoUSMaNWrE8OHDmTFjBl5eXvj6+rJixQr++te/0rFjx5L/JSwsjBYtilaPK+9cjo6OlcYrqvbz1X4Nup9bWO2af/Br8Axh2iL43CFPmHZPU+XX9n5pmJ8vRFckP6tVQnR/zb8hRHdegZcQXYB6LplCdJMynIToikTU/Sfq3rtsI25eTMMCMYOL/qOyPrrmfmigENNLMdYuXYgugN/mV4ToXn3uEyG6Iu9pUXVy7+zHt1fmYSMqxiJZePXeFld6VOSe/48QXfumPYXoPih/qkRRcm/IRFEsMlF8OMhEsRSZKJYiE8VSZKJYhEwUS5GJYs1GJori+LMlinLoqUQikUgkEolEIpFY4zHeykIEMlGUSCQSiUQikUgkEmvI7TEkkiJEDBMVOTxU5LBWEVxuPVuY9ohhYoZQqRrWE6Kr7DFIiC7A5af+KUR36e1YIbqdljQWogvQZ+ltIbp7ZvgK0RXJuHkHhehOsG0mRBdsBelCN43OutF9sO6OvdWqFUFPLgEHPxYjjLi2r/s5MT77XRO3ObqoOrlV6yQhuvaB4obh2vTqLkT38l/FLBQjcgi85PFEJooSiUQikUgkEolEYo0/2dDTP+U+ihKJRCKRSCQSiUQiqRjZoyh5IFJSdfw98ktiL15m0+d/v28dt+4tqDWwPXkpGWCGK8u+tfi81pBOePVvS+bZBJxbB3Bjyz5Sdp98ZP6K1PbuGoz/gHZkp+rBbObkR9stPm/1ylOovVzIvp2BZwt/jn/4LRmXkq3qqpq0wqZVZ8yGDDCbyfv3N2VsbLsXDQFVetRGoXYkZ+MKq7pK3yBUjUIgOxOzGQqO7LL43K73aBSupSt5Kj19yNk4H7O+8j27AA6fiSPq6G+4u2hRAJOGW245k3RLx6dbfiLApzaXrt1k9JN/IdDP26quqBi7ubmyYP4MrlxJpFEjf8JnLeLWrRQLGy8vDz5f8xHRB49Sy8sTWztb3ngzHGsLUIuKs5OrE5NnvMT1xGR8/OuxetHnpKWklbGr5+fNq7MmYTKZCA+bay0UQsuFKG0XV2dmzHmLxKvX8Auoz5IP/k7KbcvvtAwJZsKk0Zz77TwBjfw4dfIs33y5tVJdkfWbqLKs6dwapz6dMekyMJvNpH5c/oqEzoN64L1sGrGth2HOsj7UVFR5Kw9HFy0jp4/mZuJN6vjXZdOSDehTMu5L605qWrtXnT7XtDrZNuQJ7Lp0x5yehtlsJvvr9Raf2/fpj8OTgyGvaBXknJ9+IDdqt1VdENemHo5PIursVdwd1SgUMKlPG4vPk3SZLN91lGBfT2KvpzKgdQA9ghtY1RUVY4CALs1p3r8dhmLtqBXbyti0eLID/aaNYNfcL7mwN6ZKuo8jZvOfa3Vd2aP4CFmxYgVRUVHVorVt2zb0en21aN0LJ8+cI7RbRx5kkxWl2o6gJROJm72eKx9+i7ZZfdy6NbewUTnYcTFiI4mrvufqiu00njvmkfkrUlvlYEfXReM5NHcDJ5dvw72pL95dgi1sbDQOHJ77Nac/2cWVH47SIXykdWFbexye/yu529eQ9+NGlN5+qJq0stRt1xNztpH8X3eSu/0z8n7ZYV3Xxha7XqPI/3UL+Yd3ofSsh9I3yMLElHie3G+XF/18/wmma3FVSgayc/OI+Oxb3hk7hMnD+xGXmMyR3+IsbJau/46e7ZozbkgoYwf1IHyV9eW1hcUYiPhgOlF7D7Bk6Sq+//4nliwuOw/VxsaGHd//m8VLPmbqO+/TpUt7OnV8onJhgXGeNH0Cxw+cYMOqb9j/UzSvzp5Url1wSFMO7T1iVU+0vyK13531Bgf2HeaTFZ+z+197CZ83tYxN7dperF29gciP1zPz7fm89/4U3NxdK9QUWb+JKssKB3
vqzH2VWwsiSVn5NQ6B/mg6tSpjZxfgi12j+lXy9Q+ElLcKeH7aC/x24DQ7P93GiZ+OMGrmiw+k9wc1rd2rLp9rXJ1sb4/29SkYV39M1oZ12DQMwLZ1mzJmmQvnkTHtTTKmvVnlJFFUm5qdV8D8bdG8M6gjk/u2IT45jSPx1y1s1v1yhtZ+tRnfsxXjerRk2a6jVnVFtnu2DnYMnT+eXR98RdTftlInqD4BnS213Xy8MOoyyUiuQh3/uGMuFPPzmCITxUfI66+/Tq9evapFa/v27Y8kUezbsxsajeaBNFzaNiHn2m3MeQUApB+NxbN3iIVN8qZ95CYVVTAa/zoY4649Mn9Fatd+ojGGaykUFsfi5rF46vdqbWFz4sPSt84KpZJ8Y65VXZV/EIW621BQpGu6ch6b4HYWNrZte6Bw1GLbfRB2T43BnJttVVdZNwCzXgemIt3C65dQ+bewsDHFHS/53Sa4CwXnoq3qApyJu0pdLzfsbIsGPrQO9OfXmPMWNgk3UqjrWfSQXq+WO3GJyaTpDZXqiooxwMABvTh8+AQA0QePMXBAaBmb5OSbfL626OHJ0VGD1lFDQmLlizCIjHOnXh05e6Jo4Yozx87SObRDuXa7t0dRkF9QJU2R/orUDu3bnRPHTgFw7EgMoX3LLjSx59+/cPrk2ZLjgoKCSuMisn4TVZbVIUHkX7+Fufj/yjr5X7Q92lvYKBzscZ/4LCkV9DRWhIjyVhEhoU8Qf7Jo0anY4xcICbXyQqaK1LR2D6rH55pWJ9s2DcZ08yYU7/+bf+4sdu07lbFzGDwU9bPPox41FoVT1RauEdWmnkm4RV03LXY2Rfsgtvarxf4LiRY27lo1acULRemMOTTz8bCqK7Ldq9+mMWlJKZiKtROOxxEUalmW067d5vIhcQskScRRY4eebt26leXLlzNu3DhiY2NJS0tj2LBhHDhwgISEBFavXo1WqyU+Pp41a9bQpEkTLl++zOTJk/H19WXjxo1cvHgRDw8Prl+/zty5czEajUyZMgWVSkVgYCCnTp1i0KBBPPfccxbnjoqKYv78+Tz55JPY29tz9uxZXnvtNYKDg8s9n7OzM1OmTAEgKCiI/fv3M378eHbv3k3Tpk157bXXePPNN0lKSqJz587ExMTQu3dvdDod58+fp1mzZrzxxhsAbNy4kStXruDm5kZmZibTpk0jOjqapKQk1q9fT8OGDRk5cmS5dv/5z39YuHAhPXv2pLCwkD179rBv376Hfu3uxs7TGZOhdMiSyZCNradLGTulgy3+bw/HrUszzk1e+TBdfGioPZ3JN5Q2JnmGbDw8ncu1VdqqaDy8G9Ez11nVVWhdMOdmlf4hJwuF1jLGCrdaKBw05P37nyi8vNFMnodx/uRK33QpNE6Y80uvnTkvG6W6oh4GBaoGwRTEVK0XXac34OhgX3KsVdtzPsPygSMk0J8z8Qk0a+jL2Yu/A2DMzsXNWVuhrqgYA9Sq5UFmZpGPen0m7u5uqFQqTKayQ1Wee24wk8LG8OGyT0lKqnx4j8g4u3m4kmUoKhtZmUac3ZxRqZSYTPf/hlOkvyK1PTzdMWYWxcKQacTVzaXC6wcw9qWRfPzRZyXXvDxE1m+iyrLK3ZVCY6luoSELlbulz15vjSH1k2/gHpM5EeWtIpw9XMgp/j+yDVloXZ1QqpQUCjjXvVIT272aVicrXN0wZ5e2e+YsIwpXy1Wl88+cIu/oIcwZGdi264DTzLnop0+xri2oTdUZstHYl65w7Ghvh85g2Qs3untzpnz5Mx/uPMzZ31MIuyvhKw+R7Z7W05ncO1Y4zjFk4e3hV6Xv1kj+ZIvZ1NhE8ZlnnuG7774jODiYiRMn8sorr2A0GlmwYAERERFER0fTr18/wsPDeffdd2nTpg1Hjhxh0aJFrFq1ijp16jBixAiUSiUREREcOHCAHj16EBYWxvLly5k6dSo6nY6xY8eWSRR79erFunXr6NSpE507d+b06dPMnj2brVu3Vni+sLAwli5dyjvvvMOLL75IYWEhhYWFJCUV9SS8/fbbjB49mjfeeAODwUC3bt04ePAgarWa0NBQ3njjDS5dusRXX33FDz/8gEKhYPr06URFRdG7d2/q1avH2LFj8fHxqdRu9+7dNGjQgFGjRjFkyJBHcenKkJeiR6V1KDlWadXklzOPpDAnn0sRG1H71abNttkcbP865oL/rbHi2Sl6bLXqkmM7rZqclLI9xUpbFV0XjuP44s1kJtyyqms2ZKCwv+NtsoOmaF7FneRkYbpaNIzIfPs6OGhQuHli1lWsb87KRGFbeu0UdmrM2Znl2qoCWmG6UvUlu92dtRhzSt9oGrJzcXexfNh4e8xgvty1j6/+tQ9nRzWuThpqe5R92LqT6o7xSxNf4Okh/TEYs7h1KxUnJy0ZGXqcnZ3Q6dIqTDI2b/6eLVt28vPuzVy7dp0f/723wnNUd5yHvPAU3ft3JTsrm7TUdDRaDQa9EY2TI/o0/QM/tIssF9WtPWrscPo9FUqWMYvUFB2OThr0+ky0To6kp2VUeP2GPDMQjUbNymWRleqLrN9E1RcmXTpKx1JdpVaDSVfqs00dT1QuWpwGdCv5m/u4oRj3HSfnbHwZPdHl7U5C/68v7fp1ICcrB31qBg6OarL0Wai1GgzpmY9Fkgg1s92rKXXyH5jT01CoS9s9hcYRc7rl1lGFN2+U/J5/KgbnuQtAqbSaDIhqU921arJy80uOjbl5uN9RTgBmb/6Voe0CGRASgM6QzeAl3/Kv6c/horG/W64EUTEGMKTosXcs9dFBq8GY+vBHuEnEUOOHnvr6Fu3f5ezsTP36RW+VXVxcMBqNAMTGxhIdHU1kZCRHjhwpGXqhVqtZunQpkZGRXLx4EZ2udB8pPz8/ANzd3Ut0Kjt3/fr1uXjxYqXnAwgICADAy8uL2rVrl9Hz8fFBqVTi7OyMh4cHjo6OKJVKlMqiyxQXF4dSqWTNmjVERkZiY2ODwVD2TbY1uz/8aNGiRZnvPgoyjsfh4OOFwq7ovYVr+0BSfo7BxtURVXHFVn/yUyX2uck6bN2dUTrYPRJ/RXLzRDxaH0+UxbGo3a4xiVGnsHd1LKnkVQ62dF00gd8ifyTlt6v4DWxXmSQApisXULp7gU2Rrsq/KQXnjoFGCw5FugVxp1F6FpdLBzUolZj1ZReZuJPC5EsonN1BVaSr9A7AdOU3sNeAnWXjpmrWiYL/HqpyLFo28SP5dhp5xT0Wp2Kv0D2kKRmGLAzFi2bc0ukZO6gHo5/8C62a+NGpZSC2NpW//6ruGK/5bANPDnqB50eE8cOPUXQsnm/YpXM7fvixKPlTKBT4+hYt6NC9W0fatS16A2w2m0lITMLfv/J5XtUd5x0bdjH1hemEh83lUNRhmj9RtPdfy3bNOVg8L0yhUFDbu1alOg/LX5HaX6/fwpjhk5n04lT27v6VJ9oVXZt2HULYu/tXoCgW3vXqlHxnxOhheHq5s3JZJIFNG+MfUPFCEiLrN1H1RXbMBWy9a6EoHmKoadMMwy9HUbpoUTqqKbiRQvL0j9BFb
GBggGEW5v1jEJxABtE75O35fj5+Ps7MNb/nmov7vu7nfu6NnMxrWMt/R9rhFDr26XxLGiL9bWixEFXeADQdI7BkXcVucfhXfPRXdL27ONnI1O74jX2anJs84a8OUXmoaadW5GXmYCuP54XDaUT0cc71eb9d49z+W99IRVg7Iih3NrR2D8TFQlQdEZkrGlqbKipXiNaWqH+ETD39/PPPWbFiBWPGjCE1NZW8vDyGDBnCvn37uHDhAmvWrEGn05Gens7atWsJDw/n3LlzTJo0iZCQEDZt2sSZM2do1KgRWVlZLFiwgKKiIqZNm4ZCoaB169YkJSUxYMAAhg0bVuX6mzdvJiMjA19fX5KSkli2bBkAS5cuJTg4mKysLHr06EHfvn1ZtmwZO3bsYMSIEZw6dQqtVsuiRYsA2L17N/v27SMoKIikpCRmzZoFwKJFi+jYsSNpaWm88MILNGnShNdffx2z2czSpUu5fPkycXFxzJw5Ez8/v2p/43WMRiNxcXGcP3+enj17cvXqVZRKJTExMQC8++67WK1WysrKUCqVTJkyhT179hAfH8/69eu5du0ab731Fp06dUKpVHL8+HFiYmJo164dW7ZsITMzk1WrVlVo//zzzwQFBXHq1Cneecf1NKHr6I0laN0rd3n0cFehNzo/dXu+V1umrd/N21/+wqlLOYz/3U3yzVD5e2EzVk5BsBlLUPp7V7GTq5WEvf4Mvt3bkDxplWufC4x4qN0rXus07pw2OCfTjq3DOJF+gTbNQzh15hIARSWl+Hrpbq4rMBYynTf20uLKN0zFyHTOsZD5Nkam1mL+9t/IAgLRToqlKH5SjU+kRPksKsYikfn4Yi+pjLG9uAiZTysnG8uJJMwH92M3GFBGd8Vz7gIKZk2rUVfj74XFWHmzbDaW0Mi/+l0f5UoFrZ7pSeLcdbXzWeuJ3VJZR+zmEuSamz2ZlaFoFoX12J5aaf8er0bemG74HcXGYkIbNb8lDZH+NrhYCCpvAAo/H8qKKv0rMxaj8HPOFwGvjST3vX9B+U1crXwWlId0/l6UFlX+7UzGYgIbhdbar5oQ1o4Iyp0Nrd0DcbEQVUdE5oqG1qaKyhWite9KpKmnt8/QoUNp3rw5UVFRLFu2DJVKRVFREQsXLiQyMpLERMcTm5iYGIYPH87YsWMZNGgQixcvBuCee+4hJiaGl156CY1Gw759+/D29mb8+PEYDAamT5/OO++8w4YNG6pc++zZs2zYsIFZs2YxYcIEBg0ahN1uZ82aNTRr1ozx48czZ84cYmNjMRgMvPHGG+Tm5jJ8+HDeeecdTpw4QV5eHgaDgdjYWObMmcO4ceMYOXIkdrsdpVLJSy+9xLhx4xg9ejSrV6/Gz8+PN998E4PBQGBgIPfeey+PPPIIXbt2velvvI5Op2Pw4MHIZDJeeuklFixYwPnz5/nhhx/Yu3cvJ06c4JVXXuG1114jKSmJffv28fDDDxMU5Ng2vmPHjvTt2xedTsecOXMYPXo027ZtA2DYsGEEBQXx8ssv06FDB7Zt20b79u0ZN24co0aNuqW/qZ9OQ3GppeJ1UakZP53ayebNLT8xOLo1rw94gBUjH2bGxv9iKHY9YmLOKUBxg5ZCp8GSY6hiV2aycDZuE8mTVtHpP28ic6t5RNXPS0eRqfL6xpJS/LydE+nrIweSX1jMhh0/kp2jx8dTS5NGVRtrJ12BsbAbDcjcbxgxU2sda4RuxFSM7bxjKor9Whaotch8a95YWpTPomIsEnt+HjJNZYxlWg/s+flONmVXLmM3OOJuSTqG8r72IK85XZbkFKDUaSpeq3QaTDkFVezkSgU9Fo3h8JItFF6o3REN9uJCZMrKv5dMpcFeUlitraJFe2wZJ2qlWx0FuQbUN/wOrU5LQW7V+lgTIv1tcLEQVN4AbPp85B6V/sl1Wmz6Sv/c7vFH4a3D8/Ge+I1/BgC/MYNRt21VRcvJZ0F5yJhTgLtH5d9OrdNSlFu1jtQFYe2IoNzZ0No9EBcLUXVEZK5oaG2qqFwhWlui/hE69fT6yJmXlxdNmzqe4nh7e1NUVARAamoqiYmJJCQkcODAgYopZRqNhmXLlpGQkMCZM2fQ6/UVmqGhoQD4+flV6NxIWloawcHBFa8fe+wxPD09SU1NrfBHpVLh7e3NhQsXAPD398ezfBrDdd0LFy7g7e2NSqUCoGvXrjRt2hQ3Nzd27NjB6tWr+frrr8nLy6vwq0mTJhw4cICtW7fy9NNP1/gbbxYrgGbNmpGenu7k8/X3U1KqP3/IVVwAZs+ezZEjRxgyZAh79+7FfgsLve5r1pjsPCNmq2NdZNL5q/SMaIqhuBSjybG4/HJ+Ef5ejt/npXFHLoOyWlzDcDgNdXAAMpVjgNunS2tydh/DzccDRflNWtNJT1bYl2brUfp5IVeravY5PJTsa3mYy59gJaVm0KtjJAZjMcbyRdRX9QWMGtCb55/4C+3DQ+l2X2uUbjUPtIuMhS0jBblfAJT7oAiLxJp8CLQ6UDtiYU07jty/ieMLag3I5dgL8urFZ1ExFonldDKKJk2g/BxEZVRbzAf3I/P0RFZeP7VjxoHccUOmCAqm7PJll1tiXzmSji7YH3l5OW4S3YqLe5Jw9/Go6EAq1Ep6LH6RkwnfkHPyPKH9o2uSrKAs+ywyLz9QOLTlgS2wZZwEdy2onG9OFG26Yf11fy2jUZX0I6n4BwXgVv47wjtHcOz7wy6+9cf529BiIaq8AZQcS0EZ2BhZ+TQ1bac2GH84iNxbh9xDg/VyDtmzVqJP2Io+YSsA+o+/wHQqvUZdUXno4tF0fIP8UZTHs1nncFK+P4bG2wP3GzrkdUFYOyIodza0dk9kLETVEZG5oqG1qaJyhWjtuxF7WZmQf3cr9brraUREBP369SMiIgKz2cyuXbsAmDp1Ktu3bycwMBCj0XnIXSaT1agZHh5OZmblgcc7d+4kOjqaiIgILl50zB83m80YDIaKzlV1ms2aNcNgMGA2m1GpVBw4cAB/f382b96Ml5cXkyZNIiMjgxMnKp9A/e1vf+Ojjz6iWbNm+Pv71/gbf8+lS5cq/n/+/Hl69OiBUqnk4MGDTu/36dOn2u9X9xsUCkVFZ/D06dNkZ2cTFxeHxWLh+eefp2/fvkRFRVX5XnVoVG7MGdydJdv34+uhptW9vnRtFcjKHQfx1rrzwkPteWNAVzbuS+b4hStk6gt5+bHO+HqoXWqXlZhJnfEBrePHYM4twPjrRfL2nqLlvBFY8o1cWLUdubuS1otfxJSZg0erINLmrcNmrHlNjMZdxdyxQ1n88Rf4eekIb3ovXduFs/LTL/HSaXnxqYc5npbBvmMptGkejKGomNkvDKnXWGApxbTlPdyHTsBuNFCWdR5b2nHcB47BXlyIefdnmHd/hvugMaj6PYPM/15Mn64Eq6VGWVE+i4rxzTh07ARf7txDTq6eNev+xajnhqB2d3f9xRspLcW4aiUek6ZiN+RjPXcWS9JRtC9OxF5YQMmWTZTl6dFNnYbtcjZuoc0pXBrvUtZmMpM4+2MejB2JKbcA/elL
ZCUm02XucErzizj+7pc8tGoyfq2D8Ww6GgClxp3zXx9y7bPVgvn7TSh7PwvFRspyMim7lIKyxxDspiKsh3cCIAsIxp53FSy1W/tYHWaTmY/mrmHUW2Mp0Bdw8fR5khNP3pqISH8bWiwElTcAu6mUy/PfpfG8idj0BZhSMyjef5yAN17AZiisuClT+HrhM7w/AI3GPk3+5m+wXqlhww5BechiMrMt5iMGvDWKotwCLqdc5OzPyTw26zlKDEZ+XP0lAA9NeQqfIH/ue/IBbFYb6T+5HukR1o4Iyp0Nrd0TGQthdURgrmhobaqwXCFY+67kTzb1VGa/lWGlWpKYmMi8efMYPHgwffr0ISYmhsjISMaNG8f8+fPx9vZm/vz5GAwGPvroI4KDg8nOzmbgwIF07tyZ5cuXk56eTqdOndi7dy8+Pj7Mnz+fFStWcPr0aWJjY0lPT2fRokXExcXx6KPOWwdv3ryZM2fO4OvrS1lZGVOmTMFoNLJ48WICAwPJzs7mL3/5C3379mXr1q0sXbqUuLg4vL29mTNnDoMGDeKVV15h9+7d/PTTTwQFBZGfn8+0adM4fvw4K1asIDo6GrPZzM6dO4mPj6dbt26UlZXx6KOPsnz5cu677z7AMRW2ut94IwcOHGD16tU8+OCDXLx4EaVSyZtvvolMJuOf//wnpaWl2O121Go1U6ZM4YcffiA2Npb+/fszdOjQipjOmzeP//mf/6mIUWRkJBMmTKBly5a0bNmSS5cuIZPJUKvVZGVlMW/evIoR0+oo2b70ThcNfh5/5I5rXqf7zueF6Nov3PrGCrXBuucnIboAbg/3EqIra9ZGiK4iWIwugGHEGCG6XySHuDaqAyNe9xCiCzB2Ze2mut4qH7zWWIiuSETF4h8RetdGdSAnQ1y5CHpMzOSihf9b/Rrd26VvSc07fteVBxPuF6IL4tq+htbuARS9t0OIrvbxCNdGdUAW1kKILohrU88Pe0+Irkgi0r6ubxdqRVG8692H64LH3PVCdG8XIR1FiVvjwIEDfPHFF1XWL9Y3UkfRgdRRrETqKFYidRQrkTqKlUgdxUqkjmIlUkexEqmjeIO21FGsoMF0FOP+JkTXI+ZTIbq3i3Q8Rj1jNBrZvn07qampHD58a2tfJCQkJCQkJCQkJCQkRFCvaxQlHLueLly4sL7dkJCQkJCQkJCQkJCoCWmNooSEg9mhf61vF26J5taGNUA+fEi+a6O7jNLU6rcWv5vx3vixEF1RU1qPJ90jRBdgt6bmrfXriqipgOeUStdGdxk9tWKmnmYaXB8wXleCvBtWvRYVC5Hl7ZybmF0NG1q7B+LqiH9Y9Tu+3y4ip32LoulkMUsjLr53ybVRHWkwU09jRwjR9XhzoxDd20UaUZSQkJCQkJCQkJCQkHDFXXyUhQikjqKEhISEhISEhISEhIQr/mRTT6WOokStadG9LW0fi8aYWwB2O3ve+U8Vm3ZPdOXRGcP5asF6Ur4/Vu/agT2iCHs8mpJy3aMrv3D6vP3kJ9EEeFNyzYB/uzAOv/0ZhrPZ9aYLoAhvj1v7B7EbDWC3Y/72X1VslL0GACBv1ASZxgPTpnfqTVfZ8X5U3Xthz8/DbrdTsvETp8/d+z2G+omBYHYcQmza+TWle75zqSta+0ZycvX8I2E9qWfOsfnDf9zy9/8If317taNx/y6Ycwxgh4zlnzl93nhQNwIe60zhqQt4dWjB5a0/kvPdUZe6ouqeKH+h4dVr7YMd8Oz3IDa9AbvdTu4/N1Vr5zWgN4HLZ5DaYQj28sO1XSEqzqJ8boixEFUuGlq7J1JbVLkQmZMbWh2Rh0SgaNkRSgqx28F64Cunz1V9n0fmE1Bp7x+MaVM89gLXZx2KrNcS9YvUUfx/QEFBAbt372bIkLofZO4KpVrF4PgXWPnIDGxmKyNWv0qLB6M4+3NyhY1vcABF+kIM2bd2gKoobYVaRY/FL/BZn5mUma30TZhKYPcoshIrdd20an5Z4JgX3nxAV7rGPMd3Y1bUi64jGO6on32JokWTwWpF/cJsFOHtsaUdr9SOfgh7SRHWQ98DIA8MrT9dd3d0U6eRN340WCx4zotF2aETliTnm6/CRbGUXbnsWu+P0v4dR08k06fnA6Skn6u7iEB/5RoVEUvH8kuv6djNVtp9OA3fnm3J23uqwkahVnEmbhOlmbno2obSbu2rLm+CRdU9Uf5e/15DqtcytTv3LJhCRv+J2C1WglbNRdutPcX7jzvZqVqEoGrZ1OXvvxFRcRblc0OMhahy0dDaPZHawsqFwJzc4OqImxLVwyMwbVgANiuqJyYgD4mg7FJKhYnt4mlsuzeUX0CN6pHRteokiqzXdyX2P9fU04a3ClqiCgUFBXzxxReuDW+Dpp1akZeZg81sBeDC4TQi+nR0ssn77Rrn9t/62UuitJvc3wrjbzmUleteOZRO04c7ONkcebvyibNMLsdSVFpvugCKsAjK9NfA6tC2ZZzGLSrayUbZuTcyDx3KXgNQPTkSe2lJvekqI6OwXbkCFgsAluRTqLp0q2KnHjgYzdPPohkxCpln7TaiEKn9ex55qCdarbZO3/0j/PXuHI7pt2vYy8tc/sFU/Ps615HszT9Smulo1LVh91CU9ptLXVF1T5S/0PDqtaZjBJasq9gtDt3io7+i693FyUamdsdv7NPk3OQp/M0QFWdRPjfEWIgqFw2t3ROpLapciMzJDa2OyO9tgb1ADzaHblnWWRRh7ZxsbGmVR7S5RXXHmpxYrz5L3B002BHFzz//nBUrVjBmzBhSU1PJy8tjyJAh7Nu3jwsXLrBmzRp0Oh3p6emsXbuW8PBwzp07x6RJkwgJCWHTpk2cOXOGRo0akZWVxYIFCygqKmLatGkoFApat25NUlISAwYMYNiwYU7X1uv1LF68mBYtWnDp0iWeeuopdu3axXfffcfSpUtp2rQpr732GgMGDOC3335jx44d/PWvf+Xw4cNERkbi6enJyZMn0Wq1LFq0iGPHjvHWW2/RuXNnrFYrKSkpvPjiixw4cIBTp04RExNDu3btMBqNxMfHExoayuXLl+nTpw89e/Zky5YtZGZmsmrVKnr27MmuXbvYsWMHzzzzDCdOnADg/PnzdOrUiUWLFvHFF1+wefNm3n77bYKDg2sVb52/F6VFldMETMZiAhuF3pG/pShtjb8XFmNlZ8dsLKGRf/UHQMuVClo905PEuevqTRdApvPGXlpc+YapGJnO29nGtzEytRbzt/9GFhCIdlIsRfGTanzKJUzXxxd7SaWuvbgImU8rJxvLiSTMB/djNxhQRnfFc+4CCmZNqykMwrVFINJflb8XNmNlHbEZS1D6e1exk6uVhL3+DL7d25A8aZVLXVF1T5S/0PDqtcLPh7KiSt0yYzEKP+dYBLw2ktz3/gXlN1q1RVScRfncEGMhqlw0tHZPpLaociEyJze0OiLTemK3VJY3u7kEueZmo3syFM2isB7bU68+37X8ydYoNtgRxaFDh9K8eXOioqJYtmwZKpWKoqIiFi5cSGRkJImJjichMTExDB8+nLFjxzJ
o0CAWL14MwD333ENMTAwvvfQSGo2Gffv24e3tzfjx4zEYDEyfPp133nmHDRs2VLn20aNHMRgMPP/880yfPp1GjRoxc+ZMVCoVTZs2pXHjxoSHh/Pcc8/xxhtvoNfrGTFiBO+//z7//ve/eeSRR3jnnXdITk4mLy+Pjh070rdvXzw9PVmwYAGPP/44u3btYt68eYwdO5Zt27YBsGbNGpo1a8aECROYOXMmb775JlarlWHDhhEUFMTLL79Mhw4deOONN8jNzeVvf/sb7733Hq+++irjxo3D3d0dALlczuuvv17rTiKAMacAdw91xWu1TktRbkGd/35/hHZJTgFKnabitUqnwZRTVVeuVNBj0RgOL9lC4YWr9aYLYDcakLnfMLKl1jrWFN6IqRjb+TSH/bUsUGuR+frXj25+HjJNpa5M64E93/nYj7Irl7EbHNeyJB1DeV97kLtOPSK1RSDSX3NOAQpdZR1R6DRYcgxV7MpMFs7GbSJ50io6/edNZG41H4chqu6J8hcaXr226fORe1TqynVabPrKWLjd44/CW4fn4z3xG/8MAH5jBqNu26qK1u8RFWdRPjfEWIgqFw2t3ROpLapciMzJDa2O2IsLkSkry5tMpcFeUv2xOIoW7bFlnKhR74/w+W7FXlYm5N/dSoPtKF4nJMRxVoyXlxdNmzqejnh7e1NU5DhPJzU1lcTERBISEjhw4EDF9DKNRsOyZctISEjgzJkz6PWV5/qEhoYC4OfnV6FzI7179yY6OpoXX3yRmJgY3NzckMvlPPvss2zatIkff/yRXr16Vdj7+/vj4eGBXC7Hw8OjWj+Bivdv/C1eXl5Ov+XcuXMkJCSwfv16wsPDMRiqNoTXr+nt7Y1CoSAyMpIBAwbw008/UVhYyJEjR+jcufMtxfni0XR8g/xRqByD0M06h5Py/TE03h6439Bw1AVR2leOpKML9kdertskuhUX9yTh7uNR0dgp1Ep6LH6RkwnfkHPyPKH9o2uSFKoLYMtIQe4XAG4ObUVYJNbkQ6DVgdqhbU07jty/ieMLag3I5dgL8upF13I6GUWTJlB+/pgyqi3mg/uReXoiK69r2jHjQO64GVMEBVN2+XKttpcWqS0Ckf4aDqehDg5AVl7mfLq0Jmf3Mdx8PFCUl7mmk56ssC/N1qP080KuVtWoK6ruifIXGl69LjmWgjKwMTKlQ1fbqQ3GHw4i99Yh99BgvZxD9qyV6BO2ok/YCoD+4y8wnUp3qS0qzqJ8boixEFUuGlq7J1JbVLkQmZMbWh0pyz6LzMsPFA5deWALbBknwV0LKrWTraJNN6y/7ncZA9E+S9wdNNipp7UlIiKCfv36ERERgdlsZteuXQBMnTqV7du3ExgYiNFodPqOTCarUTMtLY0BAwYwduxYNm7cyCeffEJMTAxPP/00gwcP5tq1a8TFxQn5Lf7+/owcORKAbdu24ePjg9lsxm53DIWfPn2ayMjIKr/B3d2dAQMGMHfuXPr06XPL17aYzGyL+YgBb42iKLeAyykXOftzMo/Neo4Sg5EfV38JwENTnsInyJ/7nnwAm9VG+k+un0qJ0raZzCTO/pgHY0diyi1Af/oSWYnJdJk7nNL8Io6/+yUPrZqMX+tgPJuOBkCpcef814fqRdcRjFJM4ChkGAAAIABJREFUW97DfegE7EYDZVnnsaUdx33gGOzFhZh3f4Z592e4DxqDqt8zyPzvxfTpSrBa6ke3tBTjqpV4TJqK3ZCP9dxZLElH0b44EXthASVbNlGWp0c3dRq2y9m4hTancGm86ziI1v4dh46d4Mude8jJ1bNm3b8Y9dwQ1OUj8LVGoL9lJWZSZ3xA6/gxmHMLMP56kby9p2g5bwSWfCMXVm1H7q6k9eIXMWXm4NEqiLR567AZa15nKqruifIXGl69tptKuTz/XRrPm4hNX4ApNYPi/ccJeOMFbIbCihsnha8XPsP7A9Bo7NPkb/4G65WaN5IQFWdRPjfEWIgqFw2t3ROpLaxcCMzJDa6OWC2Yv9+EsvezUGykLCeTskspKHsMwW4qwnp4JwCygGDseVfBUrt1q0J9vlv5k009ldmv9zAaGImJicybN4/BgwfTp08fYmJiiIyMZNy4ccyfPx9vb2/mz5+PwWDgo48+Ijg4mOzsbAYOHEjnzp1Zvnw56enpdOrUib179+Lj48P8+fNZsWIFp0+fJjY2lvT0dBYtWkRcXByPPvpoxbUPHz7M1q1badGiBRcuXODZZ5/lvvvuA2DBggWEhoYyatQoALZu3crSpUtZuHAhAHPmzGH27NkEBgYyZ84cBgwYwFNPPVXh87Rp01ixYgUGg4EFCxawdu1aTp8+zYIFC2jevDnLli2jSZMmFBYWEhISwl//+lesVisTJkygZcuWtGzZEoClS5cyefJkxowZU+H3lStXGDZsGLt27UKlcv3kfnboX+/Y3+uPoLm1YQ2QDx+S79roLqM0tfqpKncz3hs/FqJrGDHGtVEdOJ50jxBdgN0a11M760LfEpsQ3XPlIwENiZ5avWujOpBpqNtGTbUhyLth1WtRsRBZ3s65iZnp0NDaPRBXR/zDqs4AuxPkZHgI0RVJ08khQnQvvndJiC5ARNrXwrTvJMaZYk4Y0C2pejzO3UCD7SjebZjNZlQqFcuWLWPSpEnodLr6dskJs9lMXl4en3/+OZMnT67Vd6SOolikjuIfg9RRrETqKIpH6iiKR+ooVtLQ2j2QOop/BFJHURzGNwYL0dUtE3t6QV35fz/19I8iISEBs9nMvffee9d1EktKSpg4cSLNmzfn5Zdfrm93JCQkJCQkJCQkJBoef7JzFKWO4h1iypQp9e3CTdFoNHzyySf17YaEhISEhISEhISERANB6ihKSEhISEhISEhISEi44k+2mY3UUZS4Kcuyfrzjml0DWt9xzessu5YqTFsEw+lY3y7cMqLWz4lcGzRY0FpCUWsfP7x/uhBdgIesYtbafKh2vVtpXbhgvSZEF+BDD7VrozoQ9JigNWPfiltHKHL9owi6zfYRovvojK+E6IK4tm9jibg6IoqjUX717cItIbJ+iGr7mi8SswfCgwm9hehKuObnn3/mu+++o1GjRshksiozGe12e8XZ75mZmRQUFLBo0aLbvq7UUZSQkJCQkJCQkJCQkHCBvR5GFEtKSpg/fz47duxApVLx8ssvs3//frp161Zhs337dry8vHjqqacASElJuSPXbnjbZUlISEhISEhISEhISPwJSEpKIjAwsOJou06dOvHDDz842Xz55Zfk5+ezfv16VqxYgYfHnZlBJI0oStwSvr4+LIyfTUbGRVq2DCNm3mKuXs1xsgkIaMSHa1eS+PNBGgf4o1QpeeXVGGo6icXTx5NJs8eRdTGb4LAg1iz+kLycvCp2QaGBTJk3EZvNRsz4BfXmr0htRXh73No/iN1oALsd87f/qmKj7DUAAHmjJsg0Hpg2veMyFqJ0fXu1o3H/LphzDGCHjOWfOX3eeFA3Ah7rTOGpC3h1aMHlrT+S891Rl7oAgT2iCHs8mpLcArDbObrSefvo9pOfRB
PgTck1A/7twjj89mcYzma71FV2vB9V917Y8/Ow2+2UbHTe7Mm932OonxgIZjMApp1fU7rnu1r5/HtycvX8I2E9qWfOsfnDf9RJozradr+P6McfoCDHgN1u5z/vbKmTjqgYi/RZVL7QPtgBz34PYtM7/Mv956Zq7bwG9CZw+QxSOwzBXmxyqSuq7on0WVS9Fpkv5CERKFp2hJJC7HawHnCeSqrq+zwyn4BKe/9gTJvisRfUfOh3Q2v3ALx9vJg9/zUunv+N0BZNWfr3f5Bzzfl33tcxihcnPk/yydO0aBlK0tFT/Gv95/WiKyoni8z1osqyqJwssu79kp7JnlPn8fPQIJPBxH6dnD7P1Bey4quDRIX4k5qVy+MdWtA7qlmttO866mFEMTc316njp9PpyM11rndZWVkYjUamTJlCRkYGY8eO5euvv0ahuL1jse76juL+/fv56quv8PLyonXr1hVDqnXlwIEDeHl5ERkZeVObtLQ04uLieOqppxgy5OYHa27cuJEPP/yQ77//HoAhQ4awdevW2/6j3CqzZ8/m+eefp02bNsKvFff3Wez5fh+fffYlTz7Rj6VL3mT0mKlONm5ubmz/32/58CPHDcuRw7vo9sD9/Lz/8E11J856kcP7jvD9lz/SvV83prw5kb9PrTq3OqpjJPu/P0CXv3SuV3+FaSvdUT/7EkWLJoPVivqF2SjC22NLO16pGf0Q9pIirIcc5U4eGOo6EIJ05RoVEUvH8kuv6djNVtp9OA3fnm3J23uqwkahVnEmbhOlmbno2obSbu2rtWp8FGoVPRa/wGd9ZlJmttI3YSqB3aPISkyu9Fmr5pcFGwFoPqArXWOe47sxK2oWdndHN3UaeeNHg8WC57xYlB06YUly9qlwUSxlVy679NMVR08k06fnA6Skn7ttreuo1CpeWDiRGf2mYjVbefX9GUR1b0dy4slb0hEWY4E+g5h8IVO7c8+CKWT0n4jdYiVo1Vy03dpTvP+4k52qRQiqlk1r76yoOi3QZ1H1WmS+wE2J6uERmDYsAJsV1RMTkIdEUHapcvqV7eJpbLs3lAdFjeqR0S47idDw2j2AmfNeYd+Pv/DVtp30ffQvxMRO59VJc5xsmjQJ4KM1n3L86Cnc3Nw4lvYj3361hzz9zde3CdEVlZMF5npRZVlUThZZ90rMVuL/k8jn04eiclMwff0eDqRn0bVVYIXNuh9O0CG0Cc/3aktKZg5vfPrfBtxR/OOPx2jUqBFFRZXnhBqNRho1auRko9PpaN++PQBhYWEYjUays7MJDg6+rWvf9VNPd+zYwZNPPsnMmTN54oknblvv4MGDnD59ukab8PBwoqOjXWqNGDHC6fXnn3/+h3cSARYuXPiHdBIB+j/+ML/8cgSAxJ8P0f/xPlVssrOvVDSWHh5adB5aLlzMrFG328MPcOrIrwCcOHSKB/t0rdbuuy/2YLVY691fUdqKsAjK9NfA6viNtozTuEU5l0Vl597IPHQoew1A9eRI7KWuNxIRpevdORzTb9ewmx26+QdT8e/rvElP9uYfKc103Ixpw+6hKO03l7oATe5vhfG3HMrKta8cSqfpwx2cbI68XflEVCaXYykqdamrjIzCduUKWCwAWJJPoerSrYqdeuBgNE8/i2bEKGSedd/M4JGHeqLVauv8/epodX9rcjKvYS2PTdrhFDr2qf1N5HVExVikzyAmX2g6RmDJuoq93L746K/oendxspGp3fEb+zQ5Nxm1qw5RdU+kz6Lqtch8Ib+3BfYCPdgc2mVZZ1GEtXOysaVVdtrcorpjTU6slXZDa/cA+jzSiyOHkgA4dOAYfR7pVcVm17c/cPxoZUfBarW6vI4IXVE5WWSuF1WWReVkkXXvxIWr3OurQ+XmuP/tENqYvSkXnWz8dBryihwzGfRFJtoEN6qiI3FzOnToQFZWFubyke+jR4/Su3dv8vPzMRqNAHTr1o1Lly4Bjo6kzWYjICDgppq1xeWI4ueff86KFSsYM2YMqamp5OXlMWTIEPbt28eFCxdYs2YNOp2O9PR01q5dS3h4OOfOnWPSpEmEhISwadMmzpw5Q6NGjcjKymLBggUUFRUxbdo0FAoFrVu3JikpiQEDBjBs2DCnax86dIiTJ09itVrJycnhypUrvPvuu7zyyiucPHkSk8nEuHHj+OSTT2jTpg0pKSlMnz6dwMBAjEYjS5YsISQkhJycHLy9venfvz8HDx7E09OTzMxMxo8fz+rVq7FYLCiVSkpLS5k5c2aN8bh06RLx8fG0adOGJk2aVLy/Z88e4uPjWb9+PdeuXeOtt96ic+fOWK1WUlJSePHFFzlw4ACnTp0iJiaGdu3aYTQaiY+PJzQ0lMuXL9OnTx969uzJsmXL2LFjByNGjODUqVNotVoWLVqEyWQiNjaW4OBg9Ho9nTt3JjQ0lPj4eAYPHsyQIUM4e/YsH330EaGhoZw7d46xY8fi7+9fq3jXhsaNG1FY6CiUBQWF+Pn5olAosNlsVWyHDRvIxPEjeXv5ajIza54W4dvIh2JjMQDFhUV4+XqhUMix2W7vyY0of0Vpy3Te2EuLK98wFSPTeTvb+DZGptZi/vbfyAIC0U6KpSh+Uo2HwIrSVfl7YTNWTmOzGUtQ+ntXsZOrlYS9/gy+3duQPGnVTfVuROPvhcVYecNsNpbQyN+rWlu5UkGrZ3qSOHedS12Zjy/2kspY2IuLkPm0crKxnEjCfHA/doMBZXRXPOcuoGDWtFr5/Ufg1cgb0w2xKTYWE9qo+S3riIpxddwpn0FMvlD4+VBWVOlfmbEYhZ9zWQ54bSS57/0LbuGmXVTdE+mzqHotMl/ItJ7YLZXadnMJcs3NRlFlKJpFYT22p1baDa3dA2jk70dRoUPbWFiEj6/3TX0GGDXuOf658oOK3/lH6orKySJzvaiyLConi6x7emMJWvfK3Vs93FXojc4j9c/3asu09bt5+8tfOHUph/G/6/w2KOph6qlGo+Gtt94iLi4OX19fWrduTbdu3Vi6dCk+Pj6MHz+ecePGsWzZMt5//30uXrzIkiVLcHd3v+1ru+woDh06lG3bthEVFcXYsWOZPHkyRUVFLFy4kLi4OBITE3n00UeJiYlh5syZdOrUiQMHDrB48WLeffdd7rnnHoYPH45cLicuLo59+/bRu3dvxo8fz4oVK5g+fTp6vZ5Ro0ZV6bhER0cTGRnJ4MGD6drV8aRt06ZN9OjRg9GjR3Py5ElUKhXTpk0jJCSE7777jg0bNjBz5kzWrFlD06ZNGTduHACfffYZYWFhdOnShaCgoIoppW3btqVv374ATJw4kfT0dFq1ck4kN7Js2TIGDhxI//79KzrKAA8//DDr1q0DoGPHjvTt2xebzcarr77KunXr2LVrF8uWLWPXrl1s27aNdu3asWbNGpo1a8aECRMwmUw8/vjj7Nq1izfeeIP169czfPhwPD09eeKJJ8jLy+Py5cukpKQwY8YMtFotqampRERE0KVL5RPkOXPmVHREjx8/zty5c/n3v/9dq3jfjHFj/8ZTgx7DWFTM1au5eHrqMBgK8
PLyRK/Pu2kDsWXL/7J165fs/m4Lv/2WxTfffu/0+aC/PUmvx3pQUlxCXm4+Wp0WY0ERWk8PCvIK6txYivJXtDaA3WhA5n7D6JNa61jXdCOmYmzn0xz217JArUXm649df/WmMRGla84pQKGrPGZAodNgyTFUsSszWTgbtwlNaBM6/edNfu4yFbu1+lhdpySnAKVOU/FapdNgyimoYidXKuixaAyHl2yh8MLNfb2OPT8PmaYyFjKtB/Z85ylRN05DsiQdw2vBQpDL62XKSXUU5BpQ3xAbrU5LQW7VuLtCVIxF+CwqX1zHps9H7lHpn1ynxaav9M/tHn8U3jo8H+9Z8Z7fmMEU/XgY06n0m+qKqnsifRZVr0XmC3txITJlpbZMpcFeUv2RIooW7bFlnKhRr6G1ewAjRj3Do0/2obiomNwcPR6eWgoKCtF5epCfZ7ipz4OG9ker1bBqecIfqnsdUTlZZK4XVZZF5WSRdc9Pp6G41FLxuqjUjJ/O+fihN7f8xODo1jzesQV6YwkDl37GjlnD8Nbefkfmz0L37t3p3r2703szZsyo+L+npyexsbF3/Lq1nnoaEhICgJeXF02bOp7SeXt7V8yZTU1NJTExkYSEBA4cOFAx1Uqj0bBs2TISEhI4c+YMer2+QjM0NBQAPz8/p7m3rmjRogUA7dq1Q61Ws3HjRtasWcPevXvJy8ur8KdZs8r5z08//XS1WhaLhaVLl5KQkMDVq1ed/KuOM2fOVOhej8nNuB6nG2Pm5eXlFLNz586RkJDA+vXrCQ8Px2BwVFx/f388y6dAXI9PZGQkzz33HFOnTmXSpEnI5VX/fKmpqRV+NW3a1Gl73LrGe+0Hn/LEgL/x7PDxfP3NHh544H4Auj8YzdffOBpBmUxGSIhjPnqvng8Q3dnxtMhut3PhYiZhYVWf7G7/9Cum/20WMeMXsH/PL7S93zF99r7otvz8/YEK3SaBjWvtq0h/RWsD2DJSkPsFgJvjGY4iLBJr8iHQ6kDtaDysaceR+5ePZqs1IJdjL6i6AcIfoWs4nIY6OACZyqHr06U1ObuP4ebjgaK8sWs66ckK+9JsPUo/L+RqVY26AFeOpKML9kdert0kuhUX9yTh7uNR0ZAq1Ep6LH6RkwnfkHPyPKH9XU8Zt5xORtGkCZSfX6WMaov54H5knp7IyvOWdsw4kDum0SiCgim7fPmu6SQCpB9JxT8oALfy2IR3juDY9zWvqa0OUTEW4bOofHGdkmMpKAMbI1M6/NN2aoPxh4PIvXXIPTRYL+eQPWsl+oSt6BO2AqD/+IsaO1wgru6J9FlUvRaZL8qyzyLz8gOFQ1se2AJbxklw14LK+aZV0aYb1l/316jX0No9gI2fbGXkM5OYOHo633/3E/dHO/yJ7tqR77/7qUI7MKjyPNzhzw/BP8CPVcsTaB3ZirAWVdeNidK9jqicLDLXiyrLonKyyLp3X7PGZOcZMZd3KJPOX6VnRFMMxaUYTY6pkpfzi/D3csTcS+OOXAZlLjYMvGsps4v5d5dyxzaziYiIoF+/fkRERGA2m9m1axcAU6dOZfv27RXTQW9EJpPV6Vo3fm/p0qX069ePp556in379vHVV19V+HPxomOOtN1uZ/PmzRUjm3a7nStXrqBUKpkxYwZHjhxBpVKRmur6wPaWLVty/vx5oqKiKuYC15WIiAj8/f0ZOXIkANu2bcPHx6fKb7zOpUuXaN++Pc888ww//PADq1at4v3336+iefHiRXx8fLhw4QIREREVn9U13jcSM28xixbOIbxVc5o3b8aMmY6nF/fd14Z1H79Dx059MZlKmT59EklJp/D09EAmk7Huk8016r6/+EMmzxlPSPMQgpoF8s9Yx+9q2aY5896Zzci+YwHo8ciDdO/bjaYtQvjrpGfZtLpmXVH+CtO2lGLa8h7uQydgNxooyzqPLe047gPHYC8uxLz7M8y7P8N90BhU/Z5B5n8vpk9XgtVyc02BumUlZlJnfEDr+DGYcwsw/nqRvL2naDlvBJZ8IxdWbUfurqT14hcxZebg0SqItHnrsBldr8Gymcwkzv6YB2NHYsotQH/6ElmJyXSZO5zS/CKOv/slD62ajF/rYDybjgZAqXHn/NeHahYuLcW4aiUek6ZiN+RjPXcWS9JRtC9OxF5YQMmWTZTl6dFNnYbtcjZuoc0pXBrv0t+bcejYCb7cuYecXD1r1v2LUc8NQX2b00HMJjMfzV3DqLfGUqAv4OLp83XaFEZYjAX6DGLyhd1UyuX579J43kRs+gJMqRkU7z9OwBsvYDMUVnS0FL5e+AzvD0CjsU+Tv/kbrFdq2BBFVJ0W6LOoei0yX2C1YP5+E8rez0KxkbKcTMoupaDsMQS7qQjr4Z0AyAKCseddBUvt19o2tHYPYMnf32HOW68R1qIZzcJCiHtzOQCRUeH8z/uLeKTHEPo9/hAxf3+d5BMpPNK/D75+Prw5cyEZZy/8sbqicrLAXC+qLIvKySLrnkblxpzB3VmyfT++Hmpa3etL11aBrNxxEG+tOy881J43BnRl475kjl+4Qqa+kJcf64yvh9ql9t2Iqx3x/78hs7v4xYmJicybN4/BgwfTp08fYmJiiIyMZNy4ccyfPx9vb2/mz5+PwWDgo48+Ijg4mOzsbAYOHEjnzp1Zvnw56enpdOrUib179+Lj48P8+fNZsWIFp0+fJjY2lvT0dBYtWkRcXByPPvpoxbUPHz5MfHw8kZGRjBgxgosXL/Lmm28yZswYxo0bh1KpZOfOnXz66ad07dqV7Oxsfv31V2JjYwkLC2PJkiUEBgZSUFBAr1696NatGwcPHmTdunVoNBpmzJjB8uXLsVgstG3blv/93/8lKiqKUaNGER8fj7e3NzExMU5rES9evMjf//53IiIi8PT0ZO3atcybNw8vLy9iY2Pp378/Q4cOrYjNtGnTWLFiBQaDgQULFrB27VpOnz7NggULaN68OcuWLaNJkyYUFhYSEhLCX//6V7Zu3crSpUuJi4vD29ubOXPmMGjQIAYMGMCqVauIjIwkOzubv/zlLwQGBlbYxcTEYDQa+eCDD2jWrBkZGRmMHz+ekJAQ3nrrLZfx/j1uqqA7UMSc6RrQ+o5rXufANdcd/buJvIkdXRvdZRzcfGfO5fk955RK10Z1ZHDU7T3QuRneGz8Wojvq/ulCdAEeson5+/1XUfsZCrfCBeutT6etLR8KukkJekzMHnGZ34ob0c401H2zpvqg22wfIbpeM75ybVRHRLV9l0quCdEVydFov/p24ZY4nnSPa6M6Iqrta25x/aCpLjyYcL8QXQDNoBmuje4CCibc/L75dvBas1OI7u3isqMo8edF6iiKReooViJ1FCuROoqVSB3FSqSOYiVSR7ESqaMoHqmjWInUUYSCcY8I0fVaW7ezmkVz1x+PISEhISEhISEhISEhIfHHcsfWKEpISEhISEhISEhISPy/5S7eeEYE0tRTiZtS/D8T7rjm/kX5ro3qiKjpSKIQGYv2HS67NqoD7q3FTFFTNL/z05yvIyrOH6rNQnQ/ObJciC7Aug5v
CtEdnXTnt+QWjahY9NTWvHN2XRE5PVRUvvgiueadwe82RrwuZmo2wMa3xUzPFuWzLKyFEF2A2FeShOj2Lan5GIe6EuRd/TErdwL/MDHlYnmqmDb1gr0WG0vVkU0XvhCmfScxjOkrRNf7491CdG8XaeqphISEhISEhISEhISEhBPS1FMJCQkJCQkJCQkJCQlX/MmmnkodRYlaIw+JQNGyI5QUYreD9YDzDnGqvs8j8wmotPcPxrQpHntBDWeMlePbqx2N+3fBnGMAO2Qs/8zp88aDuhHwWGcKT13Aq0MLLm/9kZzvjtabzw0xFsqO96Pq3gt7fh52u52SjZ84fe7e7zHUTwwEs2NKpWnn15Tucb0LlyK8PW7tH8RuNIDdjvnbf1W9dq8BAMgbNUGm8cC06R2XuiAuzqJiXB1tu99H9OMPUJBjwG638593ttRJ5/fk5Or5R8J6Us+cY/OH/6izTmCPKMIej6YktwDsdo6udJ7+037yk2gCvCm5ZsC/XRiH3/4Mw9nsevVZlK6oWGgf7IBnvwex6R1lIPefm6q18xrQm8DlM0jtMAR7salWPje0fCGyvInSFpWHGpq/AL+kZ7Ln1Hn8PDTIZDCxXyenzzP1haz46iBRIf6kZuXyeIcW9I5q5lK3Rfe2tH0sGmN5LPa8858qNu2e6MqjM4bz1YL1pHx/zKUmiM31ouq1qLonKsbVIardk/jjkTqKt8kPP/xAbGws69evJzg4mNmzZ/P888/Tpk2b+nbtzuKmRPXwCEwbFoDNiuqJCchDIii7lFJhYrt4GtvuDY4XKjWqR0bXquGRa1RELB3LL72mYzdbaffhNHx7tiVv76kKG4VaxZm4TZRm5qJrG0q7ta+6TuaifG6IsXB3Rzd1GnnjR4PFgue8WJQdOmFJcv5e4aJYyq7cwnolpTvqZ1+iaNFksFpRvzAbRXh7bGnHK0zcoh/CXlKE9dD3jt8YGFo7bUFxFhbjalCpVbywcCIz+k3Farby6vsziOrers4Hzd/I0RPJ9On5ACnp5+qsoVCr6LH4BT7rM5Mys5W+CVMJ7B5FVmJyhY2bVs0vCzYC0HxAV7rGPMd3Y1bUm8+idEXFQqZ2554FU8joPxG7xUrQqrlou7WneP9xJztVixBULZveks8NLV+ILG/CtAXloYbmL0CJ2Ur8fxL5fPpQVG4Kpq/fw4H0LLq2CqywWffDCTqENuH5Xm1JyczhjU//67KjqFSrGBz/AisfmYHNbGXE6ldp8WAUZ3+ujIVvcABF+kIM2a79vI7IXC+sXguqe6JiXB0i2727AnEnFd2VSGsUb5PevXsTFFS5aHjhwoX//zqJgPzeFtgL9GCzAlCWdRZFWDsnG1va4Yr/u0V1x5qcWCtt787hmH67ht3s0M4/mIp/X+czBrM3/0hppiN5acPuoSjtt3rzuSHGQhkZhe3KFSg/W8mSfApVl25V7NQDB6N5+lk0I0Yh83S9eYYiLIIy/TWwOvy1ZZzGLSra+dqdeyPz0KHsNQDVkyOxl9ZuMbyoOIuKcXW0ur81OZnXsJZfK+1wCh37dK6T1u955KGeaLXa29Jocn8rjL/lUFbu35VD6TR9uIOTzZG3K5/Ay+RyLEWldb7enfBZlK6oWGg6RmDJuord4tAtPvorut5dnGxkanf8xj5Nzk1GJG5GQ8sXIsubKG1Reaih+Qtw4sJV7vXVoXJTANAhtDF7Uy462fjpNOQVOUbN9EUm2gQ3cqnbtFMr8jJzsJXH4sLhNCL6OJfjvN+ucW7/r7Xy8zoic72oei2q7omKcXWIbPck/nikEcVyPv/8c1asWMFzzz3H5cuXSUlJYeXKlSxatIiOHTuSlpbGCy+8QGRkJGazmTlz5uDv70/jxo0xGo0ApKSkEB8fz+DBg+nZsyfz588nMjKSl19+mRU6ud3QAAAgAElEQVQrVnDs2DE2bNjAuXPnWLNmDS1atCA9PZ3JkycTFhbm5M+mTZvIyMjA19eXwsJCZsyYwX//+18WLVrEQw89RFlZGbt27WLq1KlV/N68eTNLlizBx8eHgoICwsLCePbZZ/nggw949913eeWVVzh58iQmk4l33323VvGRaT2xWyqnTNjNJcg1N3tKJkPRLArrsT210lb5e2EzVmrbjCUo/b2r2MnVSsJefwbf7m1InrSq3nxukLHw8cVeUlzpc3ERMp9WTjaWE0mYD+7HbjCgjO6K59wFFMyaVrOuzht7aaUupmJkOmd/Zb6Nkam1mL/9N7KAQLSTYimKnwT2mh/LiYqzqBhXh1cjb0zGyo5xsbGY0EbN66QlAo2/F5Yb/DMbS2jk71WtrVypoNUzPUmcu+4P8u6PRVQsFH4+lBVV6pYZi1H4OZe3gNdGkvvev6D8prO2NLR8IbK8idIWlYcamr8AemMJWvfKA+I93FXojc6jT8/3asu09bt5+8tfOHUph/G/6/xWh87fi9KiSp9NxmICG4XWyqeaEJnrRdVrUXVPVIyr425v924Xu7RG8c/J0KFD2bZtG+3atWPKlCmcPHkSpVLJSy+9RFRUFMnJyaxevZp//OMfbN26FQ8PD2bNmkVZWRnr168HICIigi5dHE+UAgIC6Nu3L5mZmQAMGzaMY8cc871/+ukn3N3dGT16NFeuXMHd3d3Jl7Nnz7Jhwwa+/vprZDIZs2bNYs+ePfTt25fvvvuOZs2aMWLECAYNGkS7du2q+L1161asViuTJ08G4Mknn6Rz586MHTuWTZs20aNHD0aPHs3Jk7WfBmAvLkSmVFe8lqk02Euq3zJa0aI9towTtdY25xSg0FVqK3QaLDmGKnZlJgtn4zahCW1Cp/+8yc9dpmK33nw7bFE+N8hY5Och01SOuMi0HtjznY+NuHEaiyXpGF4LFoJcDmU379DZjQZk7jeM5Ki1jrWKN2IqxnY+zWF/LQvUWmS+/tj1V2+qC+LiLCrG1VGQa0Ct01S81uq0FORWvVZ9UZJTgPIG/1Q6Daacgip2cqWCHovGcHjJFgov1Px3a6iIioVNn4/co1JXrtNi01eWAbd7/FF46/B8vGfFe35jBlP042FMp9Jr1G5o+UJkeROlLSoPNTR/wTFaWFxqqXhdVGrG74byB/Dmlp8YHN2axzu2QG8sYeDSz9gxaxjeWvffy1VgzCnA3aNSR63TUpRbNRa3ishcL6pei6p7omJcHXd7u3fb/Mk6itLU09/RvLnjqUe7du1wc3Njx44drF69mq+//pq8vDwA0tPTCQ0NBUAulztNPa0Nw4YNw8/PjxEjRrBq1Src3Jz762lpacjlctauXUtCQgJubm4Vo5YALVq0qPCxOr9TU1MJCak8vyo4OJi0tLQav++KsuyzyLz8QOHwVR7YAlvGSXDXgsq5oVC06Yb11/211jYcTkMdHIBM5dD26dKanN3HcPPxQFGebJpOerLCvjRbj9LPC7laVS8+N8RYWE4no2jSBJSOp8HKqLaYD+5H5umJrHzKnnbMOJA7phQpgoIpu3y5xoYHwJaRgtwvAMrLsCIsEmvyIdDqQO3w15p2HLl/E8cX1BqQy7E
X5LmMhag4i4pxdaQfScU/KAC38muFd47g2PeHXXzrj+PKkXR0wf7Iy/1rEt2Ki3uScPfxqLiJVaiV9Fj8IicTviHn5HlC+0fXJNlgERWLkmMpKAMbI1M6dLWd2mD84SBybx1yDw3Wyzlkz1qJPmEr+oStAOg//sJlJxEaXr4QWd5EaYvKQw3NX4D7mjUmO8+IubwTlXT+Kj0jmmIoLsVocmyscjm/CH8vRxnx0rgjl0GZi+O6Lx5NxzfIH0V5LJp1Difl+2NovD1wv6HDcauIzPWi6rWouicqxtVxt7d7EreGNKL4O2QyWcX/ExIS8PLyYtKkSWRkZHDihOPJW8uWLSs6XmVlZRWjhr/Hw8OjooOXnV25U9nx48cZP348r776KkuWLGH79u2MGTOm4vPw8HDc3d0ZP348AMnJyU6dyRt9rO69iIgIUlIqF67/9ttvhIeH1/h9l1gtmL/fhLL3s1BspCwnk7JLKSh7DMFuKsJ6eKdDOyAYe95VsNR+HVNZiZnUGR/QOn4M5twCjL9eJG/vKVrOG4El38iFVduRuytpvfhFTJk5eLQKIm3eOmxGF2vdRPncEGNRWopx1Uo8Jk3FbsjHeu4slqSjaF+ciL2wgJItmyjL06ObOg3b5WzcQptTuDTetcOWUkxb3sN96ATsRgNlWeexpR3HfeAY7MWFmHd/hnn3Z7gPGoOq3zPI/O/F9OlKsFpcawuKs7AYV4PZZOajuWsY9dZYCvQFXDx9/o4t6D907ARf7txDTq6eNev+xajnhqB2v/lT++qwmcwkzv6YB2NHYsotQH/6ElmJyXSZO5zS/CKOv/slD62ajF/rYDybjgZAqXHn/NeH6s1nUbqiYmE3lXJ5/rs0njcRm74AU2oGxfuPE/DGC9gMhRU3kQpfL3yG9weg0dinyd/8DdYrNW8q0dDyhcjyJkxbUB5qaP4CaFRuzBncnSXb9+ProabVvb50bRXIyh0H8da688JD7XljQFc27kvm+IUrZOoLefmxzvh6qGvUtZjMbIv5iAFvjaIot4DLKRc5+3Myj816jhKDkR9XfwnAQ1OewifIn/uefACb1Ub6TzWPhorM9cLqtaC6JyrG1SGy3bsr+JNtZiOz21086vmTkJiYyLx583j00UcZN24cfn5+HD58mBUrVhAdHY3ZbGbnzp3Ex8fTqVMn5syZg6+vL97e3nzzzTf07t2bp556iri4OLy9vYmJiUGr1fLaa6/RpUsXVCoV69evZ/78+ZSUlPDzzz8THBzMuXPneOmll5xGAAG2bNnC2bNn8fDwID8/n+nTp3P27NmKdY8TJkygWbNm1fpts9lYvHgxXl5eGAyG/2PvzMOiKts//pkZZhiGYRXUF1nFBUTc0VQ0F3JfsjLtl+ZuatpqprmUvppLaqmvlmSLVpZpapttapmpqbiiKSAouCDKOgwMDAzz+wMCJ9AB5RGmzue6uC7mzH2+z33u8yzznPMsNG7cmGHDhvH9998zb948xowZw4QJE1AqlbeJRjG5bz9d7XE+tDjTutFd0nGWqzBtEYiMRctWVVi5tArYN7U+af5uUDSs2lv5qiAqzu+rjUJ0Nx5bIUQX4KNW84Tojj65QIiuSETFoosmXYju1SwxZQ/E1Rc7zvpYN6pFPDndUZj2p8tzhOiK8lkWEChEF2DBcyeF6EYYqjY1oLI0cKl46G514BEgJl+siBHTpiaaq/7AtLJsTtxh3agWkDmsuxBd1y2/CNG9V6Q3iiV07tyZvXv3Whxr164dmzeXrVb1yiuvlP6/YkXZj7lp06aV/v/XfMW/2LBhQ+n/o0ePLv2/T58+d/Tn8ccfL3esRYsW7NhhWZAq8luhUDB79uxy5/ft25e+ffveMV0JCQkJCQkJCQkJifJIi9lISEhISEhISEhISEhIWPIvG3oqLWYjISEhISEhISEhISEhYYH0RlFCQkJCQkJCQkJCQsIK/7ahp9JiNhK3pXODHtWu+b6V1c/uhXG3bCZrC3RV1hemLWrCuZ+sepfR/guRE+RF+dyw0PYGZIhadEbUwjAJduLG+IxSiVnkaH+uuxBdW8TWFvaZZ3dTiC6Ia/tsrd2TsG387FyEadvKYjbpQx4Uouu+Y58Q3XtFeqMoISEhISEhISEhISFhjX/ZHEWpoyghISEhISEhISEhIWEFs9RRlJC4PU6uTkyeNYFrScl4BzRg/ZL3yUjNKGfXwN+LqXMnYTKZmDNxvlVdTadWOD3UCVN6FmazmbT/ba7QznlgN7xWzCCm1SOYc60PuRHlr0jtwM7Nad4nDH2aDsxm9qzaXs4mtH8Hes8YzrfzN3F+74lK+VsRzTu3IKzvA+hSi+O+fdUXVdawNX9F+uwVHkJA3zAMJbrH37IcStNyygAcPF0w3MzCIzSAqOXbyIpPrnHtW0lNS2d15CZiLiSw5f3VVT7/fvgr6v6JqodsMV+I0hUVYwC3rqHU7dceY2oWmOHiim0W39cd3BHPPu3IPpOIc6tArm/dR+pPx63q2lq7J9JnW9O1RZ9tMRYV4eii5YmZI0lJSqF+wH/YsuwTdKlZd6UlUXPY3iSbauKjjz6663OvXLnC7t27q88ZG2LSzHFE/X6MT9Z+xv4fDzB13qQK7UJaB3No7+FKacrU9tSfP5Ubb0SSuuZT1E0D0HRsWc5OFeiDqpFvjfsrUlupVjFk0Vi+/e/H7Hn7S+oH+RLYKcTCxs3bk5z0bLKS06rk799RqVWMfWMSHy/4gC/f3oJvsD8hnUOrpGFr/or0WaFWEb5kLIfmf8LxldtxD/bBq7Olrp1GzR/zP+XUum+5uOsIHeY8UePaf+f46bP06PIA9zJ7XaS/ou6fqHrIFvOFKF2Rdb3cQUXQsvHEztvIxeXb0Dbzxa1L83LXdWHhZpLWfs2lVTtoPP+pSmnbWrsnymdb1LVFn20xFhUxbMYIon8/xTfvbOfYj4d5cvboe9KrNRQJ+qul/Gs7ips2bbrrc69evfqv7Sh27PkAZ479CcDpo2fo1KNDhXY/7dhDYUFhpTQdWgdRcO0G5hL73ON/ou3W3sJGprbHffxjpN7miev99Fektm+bxmRcTcVkLLZPjIolqEdrC5uMKzdJOPRnlXytiMZtm5J69SaFJWnFRp2ndY92VdKwNX9F+lyvbWP0V1IpKtFNORqHb89WFjbHlpe94ZDJ5RTk5Ne49t/p1b0LGo3mrs79C5H+irp/ouohW8wXonRF1vUu7ZqQd+Um5hKfM4/E4BFhmS+St+wj/2rxwwNNQH1yYq9UStvW2j1RPtuiri36bIuxqIjWPdoSdzwGgJio87Tu0fae9CRqhn/l0NNdu3ah0+lYs2YNDRs2pH///qxatQqTyYRcLsfR0ZEJEyawfv161q5dy4YNGzh79iwHDx7k1VdfZceOHZw7d441a9bQr18/3nvvPQCWLFnCli1bWL9+PXv37mXv3r0sXryY7t27U1RUxM8//8y+ffsqTOvvbN68mYsXL+Lm5kZ2djYzZszgl19+Kaf37LPPsnLlSp544gmuX7/O+fPn2bJlC0uXLsXV1R
WdTkdAQADDhg1jw4YNrF27lueee47o6Gjy8vJYu3ZtlWLnVseVXH0uALnZOTi7OaNQyDGZ7v5xiMLdlaKcslUvi/S5KNwtV9byfOEp0tZ9BlWsuET4K1Jb6+FM/i2r2OXpc/Gq43+vrlaIcx0X8vRlcc/V5+Jfp2GVNGzNXxDns4OHMwW3+GfUG6jj4VyhrVypoPHQLhyY/VGNa4tApL+i7p+oesgW84UoXZF1vcrDGZO+LF+Y9AaUHuVXaJSrlQRMH4pb52acnbymUtq21u6J8tkWdW3RZ1uMRUU413EhrySPG/S5aF2dkCvkFAlI634izVH8F9CvXz+WL1/OtGnTANi/fz+nTp3igw8+AGDkyJGEh4fz9NNPU1hYyNdff41SqWTNmjWo1WqGDBkCUHr+kCFD2LGjeP7GsGHDWL9+PQA9evTgp59+ws/PjyeffJLBgwffNq3g4OBS/+Lj4/n444/ZtWsXMpmMmTNnsmfPHiIiIsrphYaGsnPnTkJDQ5k6dSrR0dFs3bqVwsJCpkyZAsCAAQNo164d48ePZ/PmzYSHhzN69Giio6MrFa/BIwbQtU84hlwDGWmZaLQa9LocNE6O6DJ091zBmNIzkTuWbWEg12owpZeNY7er74HCRYtT3y6lx9zHDCFnXxR5Z+Luq7+iY6FP1WF/yzLqaq2GnDTdPWneDl1aFmptWdw1Wg26tKrNH7A1f0Gcz4ZUHcpb/FNpHchLLa8rVyoIXzyGqKVfkJ14o8a1RSDSX1H3r7rrob+wxXwhSldUjAGMqToU2rJ8odA6UFDBfKiivALiF27Gwb8ebbbP42D7ZzEXmsrZ2Vq7J9JnW9O1RZ9tMRYV0eP/ehHWuwN5uXnFbbajA7m6XBy0GvSZ2TbfSfw38q8denorMTExGAwGIiMjiYyMpH79+qSnF+//NHnyZKKiomjUqBFq9d3tgxQYGAhAaGjoHdP6i9jYWORyOe+99x6RkZHY2dmh1+sr1PuLhg0bWqTh4+NT+p23tzexsbF3PP9OfPXJt7w0YiZzJs7n0J4/aN62GQAtwppzsGQMu0wmo55X3coF5G8YTpxH6VUXmbL4uYWmTTP0vx5B7qJF7uhA4fVUkme+RXrkVtIjtwKQ/uGO2zaWIv0VHYuk43G4NfBAoSqOhV+7JpzfewIHF0fstdW7H2DcsRg8GnhiV5JWk3ZBnNgb9Y/2V6TPKcfi0Hp7IC/RrRfWmKQ9J7F3dSz90a1QKwlfMo7oyO9Jjb6Ef7+wGtcWgUh/Rd2/6q6H/sIW84UoXVExBsiKikXt7YmsxGfX9k1J3X0CO1dHFCU++04eUGqfn5yO0t0ZuVpVoZ6ttXsifbY1XVv02RZjURF7N//E0lH/ZdXkNzmx9xiN2zQFoGm7IE7sPXbP+rWCf9kcRcXrr7/+ek07URN8/PHHjBw5knPnzuHo6EhCQgJz586lbdu2ODk54e/vj5OTEz/99BMtWrRg48aNdO3aFRcXF65fv87Zs2fp0qULSUlJyGQyDh48SL9+/bh27Rrbt29n1KhRAOzevZvg4GC8vb0ByM3NvW1afyGXy9m7dy/Lli2jbdu21KtXj3r16uHh4VFOD2DHjh1ERETg7Fw8NCglJYX4+Hi6desGwLvvvsuIESNwd3dn48aNjB49ulIx+mDlxnLHoqPO8vCIgTRqFkhou+asWxhJXm4ejUMCWRj5Ojs2fQ1AeK9O9BjQDb9AXxwcHYiOOgvAYFUFL7ELTeTHX8Z93CM4tAyi8GY6uu278Xh2BPZN/DGUjKdXuDnjPmYIjh1bQqEJ46WrFkN3vq5geM69+nsn7lXbT6Etp1lUaOLGhat0mdAfn1aNyL6RybFtvxHxwmPUb+pNYlRxh7/71Idp2LEZaq0DRoOR9MQUC50srA9VMhWauHrhCv0nDKZR6yZkpqTz29Zf7niOq0xpU/6K9NmtSGbx2VxoIjPuGi2e7kfd1oHk3sgk9ovfaPvSo7gH+ZByNJae70zDs0UA9TsE0eTxrjTo1Izzm61fQ3Vpt5rU3WpaR0+c5psf9xATl0Befj7Ng5tgZ3fnwScn37VMp7r8zZCXX1Gnuu5fK8XfVo+spnooqcCys2oL+UKUrp/SYClcTTHOzrev0Ofc2Kv4TR6Ic9vGGFMySf78Vxq+/DjaYB+yjsTgFh5CvYc7ow32xWt4N65+sgddVNlD1F/kuRXGozrakXJtn8B2r7p8/ifo2qLPtTkWrvLKvzCJPXaenk/2xjfYjyZtg9i8eBP5ubefy/zoC8MrrV2T5Gz6CMxU+5/jU2Pu63VUFpnZfC9r29kuCxcuxM7ODpPJxOzZs1m3bh0GgwGFQkF+fj7Tp09nx44dfPbZZ6xfv56lS5cSExPDnDlzaNKkCc899xz+/v706NGD8PBwpk2bRkhICA0aNGDRokW89tpr+Pr68tprrxEcHMzTTz+Nn58fQIVpKRQKC/+++OIL4uPjcXR0JDMzk5deeon4+PhyegcOHGDu3Ln07t2bCRMm4O7ujslkYsmSJTg7O5OVlUXjxo0ZNmwY33//PfPmzWPMmDFMmDABpVJZUWhK6dygR7XH/X3Hu3srWxnG5VRu2fDaQldlfWHaiWaDdaO7wE9WvW8J/0KUvyDO54aFtjcgY/TJBUJ0P2o1T4hugp24x6yjVJlCdPfnugvRtUW6aNKtG90FV7OcrBvdBfPsbgrRBXFtn621exK2jZ9d+Xm/1cXmxB3WjWoBNx96UIiu58/7hOjeK//ajqKEdaSOolikjmIZUkfx/iB1FMuQOorikTqKZUgdRYl/AlJHEW70FNNRrLundnYUbe+XjoSEhISEhISEhISEhIRQ/pWrnkpISEhISEhISEhISFQFaXsMCYkSfn65cbVrHlosZrgXwM+zfKwb1SJExqJlKzFDvjR9g4TogpihZCAuzu+rxQyX3XhshRBdEDdEVNSQVtOVP4XoAnw84HMhuqKGW4qkQR8xg4uObBFTrhOszK+/W36eXv1t3l98ujxHiO7PLzsK0RXJf9/OFqIbYSi/zUl10MBFjL8gruy98XXF+55KVANmmXWbfxDS0FMJCQkJCQkJCQkJCQkJC6Q3ihISEhISEhISEhISElaQhp5KSNwGuU8QikatwZCN2QyFh7+1+F4VMRKZq2eZvYc3eZsXYdalWdV26xpK3X7tMaZmgRkurthm8X3dwR3x7NOO7DOJOLcK5PrWfaT+dLzGfLbFWChbt0XVuSvmzAzMZjOGTy33ybR/qA/q/oPAaAQg78dd5O/5yaquyFiI0hYV44po3rkFYX0fQJeahdlsZvuqL+5K5++kpqWzOnITMRcS2PL+6rvW8QoPIaBvGIY0HZjNHH/LcuW5llMG4ODpguFmFh6hAUQt30ZWfHKN+vzH6Vj2HInG3UWLDJg0tLfF91dvpPPO1h8J9K5H/JUURvZ/kKb+XlZ1RcVC06kVTg91wpRenAfS/re5Qjvngd3wWjGDmFaPYM6t3GqWorQVTVpi17ITZn0WmM0Yf/isnI2y60AA5HXqIXNwJG/zKqu6IsueqPsnq
[... base64-encoded binary payload (embedded notebook/diff data) omitted ...]
VPXw8+XPIx2RnZ1ew8fNyZNm8qRqOR6IgF9dIWdf7UPTpi/2APjFnl5SvzvZr3cnIY2hf35bNJ6TgSk97y/nii/BWpLer8iSpvIrWb4jXyd+wcNTwxdzzX0q7Rwvc+Ni/bQF5G7i3riLpGaiIjM4tVMetJ+fMcmz+uvt3WP+2vqJzc1PKbSG0pJ0s0JqShpw3MoUOHOHXqVI2fnTp1ikOHDt229tatW8nLy7vt798JcpU1gcumcHr+p5x/+0s07bxx7tXezEZha82fCzeRtuYbLqzchv+Cp+ulXVhUzMK1XzJrwjAiRw/idNoVDp04bWbz1qfb6RfanknD+jNhaF+i11jemFSUzyJj8Xesba155s2p/Pf1T/jq3c14t/UhqGfwLesobK0JX/IMBxZsIHHFVrRtvXDvGWRmY6W25eCCjRx7fyfnd8UTFv2ERV2RsRClrbS1ZsSiZ9j5xn+JffcrWgR649fDPBbOnq4UZOWTeyWzXr7+xdS5k0n47TAb1nzG3h/2MW3+1Brtgjq15UBc/XOBqPMns7WhxYJpXH8zhozVG7EN8EXdvfqG8dZ+Xli39v7H/RWtLeL8iSxvorSb4jVSE2NmP8WJ346x44OtHP7hEOOiJt6yhqhrpDYSjyfRv9f9mG5zIpRIf0Xl5KaW30RqSzm58VOGScj/xorUUGxg4uPj62woxsfH37b2tm3b/rGGomPXNhgu3cBUXApATnwKLgM6mdlc2fwLRenlNw1q3xYUnL5UL+3jpy9wn6sz1sryB9wdA3z59Yh5DFOvZnCfS/nKfB7NtZxOu0J2nu4f8VlkLP6Of5cAMtJvUFpxrNMJyXTq3/WWddy6+KO7lEFZhc6138/g/UBHM5vDb1f1DMvkckoKLC/sIDIWorS9O/uTnZ6BsUI3NeE0gf3NdbMv3eDcgT/q5efNdH/gfk4eLv/e8d9P0qN/WI12u7fFUlpSWm9dUedP1SmQksvXMVX4ok/8A03fbmY2MlsbtFNGkVFLr/bd9Fe0tojzJ7K8idJuitdITXTq34UziSkApCQk06l/l1vWEHWN1MbAfr1Qq9W3/X2R/orKyU0tv4nUlnJy48ck6H9j5Z4aerpq1Sq2b9/OokWL6NChA4MGDSI2NpaTJ0+yZMkSoqOjsbOz45NPPsHHx4dz584xZcoUXFxcmDlzJgCBgYHs3buX6dOns3//frRaLQaDATc3N8LDw4mPj8fe3p709HQiIiKwsbEBIDMzk59++on8/HxWr17N2LFjycnJ4aOPPqJNmzacO3eOyMhIdDodCxYsQK1Ws3jxYiIjIxk9ejReXl6kp6fz6aef0qpVKzw8PHjttddYv349JSUlzJ8/nxEjRjBy5EhmzJjBxYsX6dmzJwkJCQwcOJCePXtWO5aXV/23ULB2ccCoqxraYNQVonRxrGYnt1Xi+9JonHu2Iylydb20s/J02NnaVL7XqGw4lWveCOwU4MvxM6m0a+XFyT8vAlBQWISzg+au+ywyFn/HoZkjBl1h5Xu9To9Ps1a3rKNycaDkJp1iXSHNXBxqtJUrFfiP7sW+qHUWdUXGQpS2xsWBooIqXYNOj3szn3r5ZAnnZk7odXoA9PkFODg7oFDIMRrvbFaDqPOn0DpRVlClW6bTo9Cax9j1hafJfP8zuIWbdlH+itYWcf5EljdR2k3xGqkJh2aOGCrKd6FOj8bJHrlCTtktHEvUNSIKkf6KyslNLb+J1JZyskRj455qKE6bNo1vvvmGkJAQ4uLi0Gq17N27ly5dutCnTx9CQkIYM2YM0dHRBAcHc+zYMaKiovj888+JiIjgrbfeYtasWUycOJGysjIWLlzI5s2bcXNzIzExEV9fX7p164aHhwcjR440O3azZs0YMGAA6enpPP/885X+zJkzh86dO3Po0CGWLFnCmjVrWLNmDePGjWP37t1MmjSJoUOHAuDh4cGECRPw9PSsfA9UHvcvXnrpJZ544gmef/55ioqKuHHjBnPnzq3xWPWlOCMPhaZqDzaFRkVJDXM9ygwlnF24CZWPG523zmd/t+mYSo11amsdNBQYqnqXdIVFaB3NG4AvPf0o63f+wn+//QUHOxVO9mrcmlWvoO6GzyJj8XfyMnOx1agq36s1avIyb32OTWFGHsqbdKw1KgwZ1Z9Oy5UKwhdPImHpF+SnXreoKzIWorR1GXnY3LSfoK1GTUHm7T+pH/bUI/R+KJxCfSHZmTmoNWp0eQWo7e3Iy85rkBtgUefPmJWD3K5KV65RY8yqirFVCxcUjhrsB/eq/Jt20ggKfknAcPLMXfdXhLbo89fQ5e1uaDfFa+Qv+j85kNBBYRj0hvL8aadCn6dHpVGjy8m/pUYiiLtGRCHSX1E5uanlN5HaUk5u/Nxri9ncU0NP5XI5ffv25eeffyYpKYkZM2awa9cudu/ezcCBAwFISUmpfNLm7e1NcnJy5ff9/PwAcHV1xc3NjaioKKKiohg/fjwGw61PXE9JSWHfvn3ExMRw6NChyuEmzZo141//+hcfffQRgwYNsqBSMy1btkSpVKLRaPD19a31WPUlN+E0tp6uyKzL+xacugWQ8dMRrJzsUFQkCO/IRyrti65kodQ6ILe1tqgd0saHKzeyKa7oHTuacp7endqSq9Ojq5igfT0rjwlD+zL+4T50aOND95AAlFZ193OI8llkLP7OmcMpuHi4YlVxrDZdAzkSl3DLOtcOn0Hj6YK8Qsct1J+02KPYONlVJniFrZLwJZM5EfMdGScu4DMk1KKuyFiI0k5LPIOzhwuKCt2WXduQHHcElaMdNjdVdvXl6w07efGpuURHLOBA7EHad2kHQEhoe/ZXzLGSyWS4uTe/Ze2/EHX+Co8ko3Rvjqxi2Le6czt0e+KRO2qQ26kovZrBlbnvkBWzhayYLQBk/WebxZsoUf6K0BZ9/hq6vN0N7aZ4jfxF3KbdLJ3wBisj3+JI3GH8OwcAENA1kCNxh29ZT9Q1IgqR/orKyU0tv4nUlnKyRGND8dprr732TztxN7Gzs+Pjjz+mdevWDB8+nGXLlqFSqRg1ahQAe/bsISQkBDc3N1JSUkhOTmbUqFGkp6eTnJzMgAEDAMjLy0Ov1zN16lSCg4OZN28eTz75JAkJCWg0Gpo1a0ZxcTEqVVWlmpKSQm5uLgEBAWRlZXH48GEmT57MoEGD6NSpE2VlZbRp04bi4mK2bt1K586diY2NpU+fPgB8/fXX9O/fn6tXr+Ls7MyPP/5I586dcXFxYfv27Xh4eNC2bVvy8vKIjY01e6q5Z8+eGo9VF+dvGlNuKjWiP51Oy8ihOHTxp/haDlc+30OrWY+jaetFbnwKzuFBuA3viaatN+5j+5K+IZa8BPNFabzHV5+UrbRS4OvRnPU7f+HEmTRcnR0Y3i+MD774nj8vXqVzYCt+OZzEpzv2kH49k2NnLvD82CHYWivNdC5uOG72vqF8/jsNpXvEyvLTRWOpkfQ/L/Hws8No3akNOdey+HXLzxa/52syr5hNpUZyzlwm5P8NoXknP/TXczj9xa90efExtIFeXPv9NA988DyuI
8o+yvaW5ujvPnz2PEiBGYMGECpk+fTqLb2tqqYFZGVX4i32oCoDsZB2TpwsCfJS3FxcUkur169cLYsWMByEpRqMwDt23bhmPHjkFVVVXoNUoVKLIy4ZN4MUhmNhJKcenSJTg7O5MXmwNAUFCQ8BL+7bffyNpubNu2DXZ2dsKYv/vuOxLd0NBQ2NvbY+7cuZg9eza2bNlCoguwM8qRb3Py6NEjhIWFkehmZmYiJycHNTU12L59OzIyMkh01dTU4O7ujvXr18Pd3Z0sHRmQGfvY2NiQG/t4enpixowZWLlyJVasWEG22GFlysRSu6GhAZGRkTA3N4eNjQ3ZBgIrXQCIiopCaGgorK2tMXHiRDJzGN5ld9CgQZgwYQLu3r1LohscHIy+fftCVVUVb7/9NunCjNWYu3fvjvnz5+ONN96AiYkJDA0NSXRZzosNGzZg48aNCAwMxI8//ojs7GwS3aioKCQkJKClpQV+fn5kJm7u7u64cuUKSktLUVpaStpfc/r06Rg5ciTee+89XLlyBYsXLybR3bRpE6ytrYV/VNkjCxYsEExhmpqaFLJ1xPLgwQMkJiaiW7ducHR0xMOHD0l07969K9Q9VlZWkgWKt27dQmJiIs6dO4fExETSlGRWJnwSLwbpRFFCKU6ePInw8HBERUVhzpw52Lt3L5n2zJkzsXv3bqiqqiI9PR0TJ04k0eUfjHwKEmUhOyALZhwcHEg1Q0JCcOTIEairqwu7fmIMNcrLy1FWVoaioiIhWGxrayNLy1q5ciU0NTUxZ84chIWFkS0cvvrqK6Gv1siRI0lTtAYMGKBwUkS5WD158qRwffnyZRJdlm51rLRbW1sVdNva2l5qXQAwNTWFkZGRoE3lbsnKZbdnz55wcnJCWFgYOnfuTOpuyWrM9+7dQ1NTE1RUVNDW1obHjx+T6LKcF9ra2tixY4dCiiEFO3fuxM8//wxHR0c4Ozvj4MGDJLo9evSAl5eXcE1VB8oSNTU1nD17lvwU9NSpU5g/fz4A2SbFgQMHyGpB161bJ/xsZWVFVmYwadIkTJw4EXV1ddDV1cWmTZtIdC0tLdHY2AhtbW0SPXlYmfBJvBikQFFCKXr27Ak9PT20trZCXV2ddJeytbUVhYWFaG1thZOTE9mDnOWDkRXZ2dk4f/48WXB769YtnDt3Djk5OUJ6paqqKkljZ16rqakJhoaGcHFxUaiZEgu/+GtpaSHTBIChQ4dixYoVQg0FVZrMwIEDUV9fL7gl1tTUiNYE2LrVsdJWU1PD3Llz8fTpU9y4cQP9+vV7qXUBmRlMQkICHj58iKSkJLKd/PYuu1SO0QUFBaioqICKigpqa2tJDSRYjfnf//437O3toaKigujoaLKWECznRXNzMwDZ/dzS0kJ2orhq1Srcv38fS5cuxW+//YbLly/DwsJCtO7w4cNx9OhRwXCOKlWWJe7u7ujTp4/goC12ffHLL7/g3LlzyM3NFRx2OY4jdWs/cuQItLS0MH78eOTm5kJNTY2kb+6gQYOQnJxMWlsKAPr6+rCxsYGBgYGwCU2RWg+wM+GTeDFIgaKEUly9ehX9+/dHY2MjvLy8UFJSQqb97bffYs2aNZgwYQKysrKwZMkSkjQclg9GVgwYMEDhtE+sS5ujoyMcHR1x/fqe9vhGAAAgAElEQVR1vPXWW2KH9wwBAQFwdXWFiYkJHj16hKNHj5LUBvn5+QlmHWlpaUhJSSHrpRgeHo4xY8YIpyRUbpEHDx7Etm3bFFoKUNSCsnSrY6W9cOFC4UTYwsKC7ESYlS4gS5/asGED8vLyUFlZSdZ2o73LrrybqBgmT54MZ2dnVFdX4/Dhw2QnDwC7MTs6OsLa2lq4t6lOKlnOC3V1dSQmJmLAgAEYNGiQUD8mlvr6ehw8eBD6+voYNWoUFi9eDFdXV9G6sbGx0NTUFL5bynY6TU1NQlpvZWUlWlpa8K9//Uu0bq9evRSe72I3aSwtLaGrq4tjx47ByckJgGxTkyKQ47l+/Tp8fHwAAGPGjEFwcDA8PT1F65aVleG7777D7du3YW5ujqVLl5IYw5w7dw4XLlxAp06dANC1IAGA06dPY/Dgwbh27RoA2jkn8QLgJCSUoKGhgXv69CnX0NDARUREcIWFhWTaR44cUbg+e/Ysia6rqytXU1MjXMfGxpLotufnn38WrTFz5kxu1qxZ3JQpUzg7Oztu5syZ3MyZMzkHBweCEXJcTEwMd/z4cY7jOO7YsWNcfn4+ie7u3bsVrnft2kWiu3r1aoVrb29vEt2/0r5z5w6J7nfffadwHR0dTaIrIfE/8fjx4xc9hP/T/PLLL9yDBw84juMU3idiycnJUbi+e/cuie6yZcsUrrOzs0l0OY7jQkJChJ9///13bsmSJaL0ysrKuLKyMm7Pnj1cWloaV1paypWVlXHbtm0TO1SO4ziuubn5L3+mIDw8XOGa6t335ZdfcgkJCdzNmze5EydOcG5ubiS6mzdvVri+fPkyiS7HcVxiYqLC9W+//UamLfH8kU4UJZRi7NixCA0NRf/+/TF79mxS7cmTJytY9FPtBltZWQm7ZwAUnFXFEBMTgwMHDqChoUE4qRR7cvTWW2/hk08++cvfRQGr3c/y8nK0tLRAXV0dLS0tQgNpsbSvG+TTqDIzMzFo0CBR2urq6ti2bZugSVUPs3TpUoWWAvI97iQkWEGZnibxLCEhIQgPDwcAhfeJWHR0dDB//nzo6OjA1tYWXbt2JTk5srCwQHp6usLzTWwqbm5urvAvLi4OgKwO9I8//hClO2vWLBgZGYHjOJw/f174/P79+ySO0UuWLMGIESMwefJkxMfH4+nTp5gxY4ZoXUBmNPfTTz/B1NQUd+/eJasF7du3r4LraUFBAYluWloajh8/DmNjY9Kex4DMhE9+DUd5civx/JECRQmlcHBwQP/+/YVryj45rCz6WT0YT5w4gYiICGGBRpHCwadrFhcXC0XxxcXFZC5tZmZmUFeX3f5aWlp4/fXXSXSHDRsGBwcHdO7cGdXV1UIwKpaUlBRcvHgRJiYmuHfvHrS0tFBUVETSyiIjIwOOjo6CPT1VvS2rlgISEv9XqKurI216/jwYPny4QtsRitZQALBr1y7Mnj0bV65cgaOjI/z9/UVvggHA3r17FYxV7t+/Lxi6KEtNTY3goMo/N1VVVfHpp5+K0vXx8RFabsmTlpYmSpenT58+mDx5MgDZhvT3339PogsAixYtQlBQkJDuTJWq3rFjR2F9de/ePXTv3h0AcPjwYVHtSORbhXAcR+ayC7Bt4STx/JECRQmlUFdXx+HDh2Fubk5erKyrqys0ru3Xrx/OnTtHosvqwWhlZaWwi9+nTx8SXQBISEhg4tLGavfTwcEB7777LnnNkZmZmVBbIg9Fywn5npIAnSsg31IgIiICEyZMwIYNG0h0S0pKoKOjA3V1dcTFxeGDDz4gOx0H2DQqnzp1KlavXq2wucSC9PR02NjYMP0dlNTU1JDdI+Xl5cIikiWUY168eDHWrl3LfNyU8yInJwdTp04V+vHm5eWRBIq9evWCtbU1fv
vtN2hqapK51np4eCg4ZVMEXXzbinHjxsHMzEy0Hg8fJJ46dUo4RcvJyUFycjLef/990frtjdCamppEa/L861//QnBwsHBNZX61c+dO/PDDDwAg9GjetWsX6uvrRQWK8v0kAfzl+1VZWK3hJF4MUqAooRSJiYkYPHiw0ACesliZlUV/+wfj+PHjSXTz8vIwbdo0YdeW4pSLtUsbq91PQDZOAwMD1NXVISIigiRlyMvL6y9PHigCDysrK1RXVwvzLD4+Ht98841oXVYtBbZv344FCxYgNDQU+vr6CA0NRUBAAIk2q0blffv2VfhbVVVVCY6GYsjNzcWOHTtQVVVFnj61detW2Nraory8HP7+/pg9ezbc3NxE665ZswaTJk1CdnY2wsPDMXr0aKxYsUK07pIlS7Bs2TKSU6j2sBqzlZUVTp48iYqKCnz44YcYMmQIwWiB5ORkHD58WCgHoJwXKioqCmY+VP1R8/LykJWVhcbGRuTn5wsndWJp306J78lHQWpqKp48eYI//vgDgYGBcHV1FdW+iaeoqEj42dLSEsePHxetCcg2uOfNm4cePXrg3r17pIZurJrMe3l5YdKkSc98fuLECVG69+/fx5kzZ8jHC7Bt4STx/JECRQmlWLVqlUJLBT5gpIDaoj8pKQl2dnYIDQ1V+Jzqwdja2goPDw/hmmLhwNqljdXup6enJ65fvw59fX1hgUYRKP5dehrfekIMnp6eyM7Ohp6enjBmikCxfUsBqnYFb775JgwNDVFUVITAwEDs3r2bRBf4s1F5WFgYbGxsyHo/du3aFSkpKUIGwsGDB0k2J/bt24d58+bh1KlTcHZ2RnR0NMFoZWhoaMDKygpBQUE4fvw4Dhw4QKLbvXt3WFlZwd/fHwkJCWR9+FxcXFBaWor4+HjBjVNDQ4NEm9WYv/76awCyk51ly5YJAfm4ceOE1Hhl2LlzJzw9PYXnEF9HR8GmTZsUNn3ksxHE8MUXX8Db2xt5eXlITU0V3QB9yZIlCA4OFtqPABBq6KmcWh88eIAZM2Zg9uzZCA4OxtGjR0XpRUREICIiArW1tTh27Bg4joO6uvpfpqMqw9dff82sH6+Hh4dCdgeVe/ZfBYmA+I1uDw8PjB49mny8ANsWThLPHylQlFCK9n33KHfm5C36e/fujd69e4vSu3HjBuzs7JCTk6PQDoPqwdh+4SB2vIAsTdbIyAjvvPMONDQ0yE5geOrr6xEfH4/KykoAdEEzqybzLGloaFAI7qnG3L6lAMXJHCBLG/bz88OwYcPw9OlTsiAfYNeo/PDhw0hPTxeu79+/TxIo9unTB/369cOFCxdgZmYmWPVT0NTUhKKiIhgYGEBfXx+vvfYaiW5VVRUyMjJgYmJCpgn8uaC0t7eHl5cXtmzZAhcXF0ybNg2dO3cWpc1qzN9//z20tbURExODgQMHYu3ateA4DoGBgaJqmt555x0MHDhQuJ44cSLFcAE8mxlANee6dOmCH3/8EYDs/hBbK82XLHz22WeYNWuW8DmfxkiBlpYWKioqoKWlhd69e4t+R7m6usLV1RWnT5/GmDFjiEapyPDhw4UG8FT1pcCr12S+b9++CjWllJkILFs4STx/pEBR4qUjKioKLi4uMDc3R35+PlatWiUqtW7hwoUAgFGjRuG9994T+jy99957JOPt2LEjkpOThXQLKtdMQBbkLl68GLW1tdDT08OmTZvwzjvviNZds2aN4KA2bNgwsrRIVk3mWTJw4ECUlpbC2NgYgGyRRkHHjh0VTppv3LihsHhVlnnz5iElJQVTp05FVlYW6a44q0blLOqkAJnr7fvvv4+amhps374dGRkZJLqA7ATf1dUVmzZtQlJSElljdWNjY6xfvx4BAQFISkoiO2n29PSEtrY2UlJSMH78eHh5eYHjOISEhIjuOWpiYqIwZqoU+EOHDmH69OmIjIzEG2+8AUC2WREWFiZK9+HDh1iyZIlQDkCZVseKw4cPC8Fdhw4dEBISAn9/f6X1+P/2K1euYNCgQULq91+5aStLZWUlpk2bBk9PT/z666+4efMmie6YMWOQlpaG3NxcWFpakr2rL126hODgYCFVnaq3LfDqNZkfNmwYE7dvHnNzc6GO98iRI3B2dibTlni+SIGixEtDXV0dampqUFRUJLRV0NHRIUuf2rdvn8IuH5VZgI+PD7S1tVFUVISBAweSuWYCQFxcHGJjY2FgYICHDx9iy5YtJIFinz598Omnn6KpqQlTp04lq1th1WT+3r17CAoKUrCOp9oBrampweTJk9GxY0dh8fB36T7/hOzsbBw7dkyoz6CoXQVkC0B+EWhjY4MLFy6I1uRh1aj8448/VjDJoVr4rVy5EpqampgzZw7CwsKwePFiEl1A9l3wm0zAs1kUyjJjxgzBkr979+5kur/++ivmzZsnfCcA0NzcjMLCQtHa7733nhBgaGlpkS2uP//8czg4OAhBIiAL0OVrAJWhrKxMoR0NZVoddZP5K1euCP/48gjKenQtLS2F+uDW1laoqamRaL///vtwc3ODoaEhamtr4evrS6L7/fffIysrC6ampkhLS0NmZqaQpiyGkydPIjw8HFFRUZgzZw5ZCjXAtsl8ZWUlHj16BCMjI5JyC0CW5tuvXz9yt28A2LZtG44dOwZVVVXhnSoFiq8uUqAoQQJFCsfZs2cRGxuLsrIy5OTkCPUJFC6fADtbcyMjI3z55ZcICwuDm5sbac2YqampMOY33ngDpqamJLrFxcWoq6sTUsouX75MYtTx0UcfKSzyqPo+srKOB2SGKOnp6UK6JZXpxbp16/D5558Lp7VUphezZs1ScCItLy8n7X8VGhqqkEJFASuTHBMTE2ERtXDhQrJFFKBoZhMQEIBZs2a91GY2/v7+Qvuc4uJilJaW4t///jd27twpWpuV+3JiYuIzLWNUVFREO2kGBAQoPCtHjBghSk+esLAw4btobm7Gd999h40bNyqtp6urCyMjI3Tq1EmoF1NVVcW4ceNIxvvWW2+hsLBQON3ZvXs33N3dSbRZ9ZRsbm5WeI/K19OLoWfPntDT00NrayvU1dVJgyNWvg3x8fHYuXMn+vTpg/Hjx6OgoABffvmlaF0zMzOF3slUbt8AcOvWLSQmJpK/UyVeDFKgKKEULJrMOzk5wcnJCampqaTpdDysbM0fP34MQFbH8/vvvyMzM1O0Jk9xcTF+/vlnmJiYkLaxsLe3R25uLsaNGwdvb2+y3b6lS5eitrYWJSUlMDMzI2syz8o6HgCGDBmCxsZGaGtrk2kCwIABAxRqYg0NDUl0Bw0ahKlTpwKQpclSpUQC7PqjsjLJYbWIAhTNbOLj4196M5tLly4JgaJ8MCdmXrN2Xx48eDC0tLSE69jYWBLXzA4dOmDJkiXIz8+HhYUFSSDOqsm8hYUFLCwsYGtrq9BmiYrNmzcjIiICwJ9mNlSBIqvN1/YnnqqqqqI1AeDq1avo378/Ghsb4eXlhZKSEhJdABg5ciRiYmKQm5sLCwsLsndqbm4uTp8+jbCwMDg6OuLGjRskuq+//jqOHj0qpJ5Sp
spaWloyeadKvBikQFFCKVg0medhESQC7GzNzc3Ncf78eYwYMQITJkwQ1duoPQsXLmTSxoIPYCorK0n/dufOnYOvry90dXVRW1uL1atXk6TWsbKOB2TmDt9//z0MDAyEhZR8gKcsQ4cOxYoVK4RAi6pOSj690sjICFlZWaI1eVj1R2VlksNqEQW8OmY2LIM51u7LqampiIuLExar9+/fJwkUN2/eDAcHB8yZMwclJSUIDg4W3UKGVZN5HhZBIiAztZHPEPjpp5/ItFltvqqpqcHd3V1oMk9llrd582aoqqrCysoKMTExpJkT/v7+aG5uhqmpKW7evIn8/Hx4eXmJ1uVPavlnJ1UpzqlTpzB48GChrpsyVVZfXx82Njbk71SJF4MUKEooBcsm86xgZWsuHxheuXKFRJOHb2NB7Xp68eJFrFy5Eg0NDejQoQMCAwNJXNpSU1Nx9uxZaGpqoqmpCX5+fiSBIrV1vDys0mXDw8MxZswYYc5R1UnJt3mpr69HYWEhPv/8cxJtVv1RWZnksFpEAWzNbPz8/BAYGEhiZsMymGvvvsxz79496OnpidY3MTHB1q1bhWuqe69Xr15CC4j+/fuTOAOzajLPmrlz5yrUuH344Ydk2qw2X7/66ismbSw6dOiA5ORk3L59G7179yZ149TX11fIZggJCSHRffz4MXx8fPDw4UMEBQWRaAKyViHybsCULc7OnTuHCxcuCM9nys1oieePFChKKAWLJvOsqampgaenJ7khSmlpKQIDA5kYrWRmZmLx4sWoq6uDrq4umevp4cOHERcXh9dffx0VFRXw8fEhCRS7d+8uGD1oamqie/fuojUB2SkMbx1PTfvUSqp0WQsLC7i6ugrXVHNCvs2Ljo4OickDD6s6G1YmOawWUcDzMbOxsLAQrft3wVxhYSFJMAfIXERZNBNfsGCBUJdXXFwspM6K5e7du3jy5Ak6d+6MyspK0hYyrJrMU5vk8LBMz2a1+QrINjtUVFTI0k4BwM/PDyUlJYJJTkpKimhHYJ7a2lqFa/5eEYunpyeOHDmCvLw89OzZkyyl9eTJk3j33XeFdzRlizMrKyuFmlX+Hpd4NZECRQmlYNFk/u+gqnvYuXMnZs+ejcuXL5MaorDSBdi5ng4YMACvv/46ANmpJf+S4E8YleXu3bvYt2+fUFPJu9eKJTAwEKNHj4azszN5itbx48eRl5eHESNGYOTIkWS66urqTOzH165dK/ztAKCiogIdO3YUrQs8GwyVlpaSLCBOnTqFsWPHYvjw4cjJyUFAQABWrVolWpfVIgqQ/bcHBQWhQ4cOpBtA2dnZ8Pb2hoGBAcaPHw8dHR1RaVlJSUmws7PDrl27FD6nbAnBqpn4qVOnmJjkTJo0CRMnTkR9fT06deqETZs2idbkoW4yz0NtksPDMj370aNHcHNzQ15eHiwtLeHn50dySsfK9bSlpUXBJIcqSARk5jATJkyAsbExSktLFXpXimHr1q0Kay0qrKyscPLkSVRUVODDDz/EkCFDyLTT0tJw/PhxGBsbg+M43L9/XzK0eYWRAkUJpWDRZJ6HhVEOwM4QhaXRCivX0wcPHuDIkSNCDUhDQwOuXr0quiZtxYoV2LVrF9LT08lMJADA29sb3bp1w6FDh1BfX4+PPvqIpCchIKtb+de//oXk5GR4e3ujb9+++Pjjj0U7aGZkZMDR0ZHMfpw3Sbh48aLC55T9r+zt7YU0Tv7e41P4xFBUVCT8bGlpiePHj4vWBGRp36tXr4aLiwuJnjw7d+7ErFmzyDeAoqKiEBoaioSEBEycOBF+fn6iAsUbN27Azs5O4aQZoG0JQd1MnLVJzqBBg5CcnIzKykro6+vj6dOnJLoAfZN5ViY5PCzTs/fs2QNPT0/06NEDd+7cwe7du0XXggLsXE/bG4rxm3iZmZmi7+2pU6di8ODB5E3m8/Pz4evri549e8LJyYlsU5APvJuamrBs2TL4+/tj9uzZGDduHNTVxYUGRkZG2Lx5MwDZfX3kyBHR45V4cUiBooRSsGwyz8ooh5UhCkujFVaup/x4f/31V+Gz2NhY0TVpKioqcHNzQ8eOHVFZWUnWrqBbt27o2rUrhg4dir1792Lp0qU4c+YMifbdu3ehoqKCa9euIT09HZqamtixYwe6desmpAgqw5o1axRSscT+7Q4cOAB/f38cPXoUQ4cOFT6ntHh3d3cXHFXLy8tF19xGREQgIiICtbW1OHbsmNDyxtbWlmK46Nu3r4JLK2UtL6sNIFNTUxgZGQkLd/kegsrAp8d+9tlnCqcCVL0qAfpm4qzqKvnA8OrVq8JnhYWFpI6O1E3mWZvksEzP7tWrl5BxYGVlReZmzMr1NCUlBRcvXhQ2SLW0tFBUVERSOpOcnAwA+PDDD3H+/Hl07NiRJHV4y5YtQp/moKAgaGhowMfHR7Tu999/D21tbcTExGDgwIFYu3YtOI5DYGAgvL29RWnzQSLP+PHjRelJvFikQFFCKVg2mWdllMPKEIWl0Qor11Nvb++/TDWRDxyVYenSpZg0aRJGjRqFjIwM3L59G1999ZUoTQBYtmwZampqYGhoiBkzZpD2RVu+fDk0NTUxbdo0HDt2TNiBb/+y+9/CcRxUVFTwr3/9SyH1NjY2VlQKkb+/PwAIp548BQUFSmu2hw8SAVkq4P3790Xpubq6wtXVFadPn8aYMWPEDu8ZunbtipSUFMGl9eDBg2T3CKsNoPz8fCQkJODhw4dISkoiq58LDAzErFmzMHbsWGhoaJBmNlA3E/+7ukqxLF++HHv27MH69ethaWkpfE7p6EjdZJ61SQ7L9Ow7d+7g1q1bMDY2xt27d8naTbByPTUzMxM2JuShKJ05deqU8OyxtLREWFiY6IALkLW9MTQ0RGRkJDIyMv5y/Mpw6NAhTJ8+HZGRkcJmVWtrK8LCwpTW5NPg5Q3XANo0eInnjwrHcdyLHoTEq8eOHTueaTL/xRdfkGjPmzcP1dXVr5RRjjz8rjYL+EJ8Cqqrq1FfXw9Alu77zTffiNbk5wMPP0/EMnfuXPj4+JD9t8sTGBiIFStWKDSxb2pqUqgZ+ic4OzvjyJEjsLe3F2o0AIiu0/i7es8ffvhBwXlQDPJ1g3V1deA47pmXvrLk5uaiqqoKPXv2hKGhocL3rSzDhw8XnhOA+O9Yntu3bwsbQBYWFli/fj1JOtmDBw+wYcMGhc0fipOH5ORk6Orq4tSpU+jSpQucnZ1JdIE/F4A8169fJ1m8U9dr8mRkZChshP36669kRjmTJk1CeHi4Qv9ACg4dOoT+/fuTm+QAsrl8+/ZtmJubk7U2AWSntV5eXuT3CAAm5ld1dXUKqZv8+7S+vl509svevXsxZ84c4ZpqTTRkyBBYWFjgk08+wahRo0SnhfJcunTpmawDjuOEHsjKsG3bNixcuBBff/21wn2clJRElnEm8fyRThQllIJlk3lWRjk5OTn49ttvUVhYCHNzc6xdu1YhbU1Z6uvrkZaWJgRdlGm49+/fx5kzZ8jdBj09PZGdnQ09PT2h2JwiUORTkXmePHkiWhMA
Nm7cqJBSmJ6eDhsbGxLtUaNG4e7du1BXV8f+/fsxceJEDBgwQKkgEYBQj+Ht7Q17e3vhcz41SVlmzZoFIyMj1NXVoaqqCkZGRigrK4OOjg5ZoAhA2LHW0dFROJURw549e5CcnIzu3bvDyckJkZGRWLZsmWhdDw8PhcV0WlqaaE2e3r17Kzjttp/bymJoaKhQc0V1ojhkyBDo6Ojg9ddfx9atWzFx4kRcunSJRLu9yVFOTg5JoEhdr8mjqqqKkpIShXuaClZN5lma5MTHxwvlCxMnTsS8efNItBsaGpi5UQ8fPlyohT1z5gxGjx4tWrO2thZHjhx55n1KUSJx+/Zt3Lx5Ez169MDdu3dRXFwsWhMAvvnmGzJjHHmuXLmC1157DeXl5UJ9opubm6hTbT4NnvcU4KFMg5d4/kiBooRSsGwyz8ooZ9u2bfDx8UGPHj1QXFyMrVu3ikqz4HF3d4eFhYVgRU+Zhuvh4YHRo0eTuw02NDQoBOBUtSU9e/bE+PHjYWJiQur89uDBA6xZswZVVVXkLmpxcXHw8PDAt99+CxsbG0RFRWHAgAGideWDROBZ+/R/io+PD2xtbbFv3z58+umnUFFRQVtbG7Zv3y5KV54vvvhCOBEoLi5GamoqiQtlQ0MDIiMjERYWBhsbG7L51v7Ehd+soYDVBlB5eTmTVhPLly9Ha2srysrKMH36dNEpkYBscyIyMhLvvvsudHV1hdPx+vp6EgMh6npNHlb3NMCuyTy1SQ5PYWEhEhIShOslS5aQ6AKydhMjR47EqFGjhO+DgujoaERGRioY2lEEiqzep4As66X96SoFs2bNQkNDg7DpevToUSxYsEC0roaGBqysrBAUFITjx4/jwIEDojXla4PlU/Upa4Qlnj9SoCihFCybzLMyyunfv7+wC/7OO++QuWb26NEDXl5ewjWV4QwgM+uQNzWgarsxcOBAlJaWwtjYGABE16LxsHJ+27dvH+bNm4dTp07B2dkZ0dHRJLqArAG6trY2Hj9+jBkzZii47SmDvHMoD7/YGTdunNK6vAFMRUWFoK+qqkq6McGqXUFrayuAP50X29raRGsCsvSp4OBgYQOByiEZkKXA9+nTR0gjp/qeWbWaqKiogIeHB+nu/Y4dOwAAXl5emDRpkvD5iRMnSPRZ1WtS39PysGoyT22SwyOfmg1AqG/mm86LYeXKlejVqxd+/vlnREZGolu3biSnladPn8ahQ4eEDWMqQztW71NAtnnO4nR13759iI+PR319PQwMDFBRUUESKDY1NaGoqAgGBgbQ19fHa6+9JlozICAAffv2xe+//47GxkahxlTi1UYKFCWUgmWTeVZGOc3Nzbh06ZKC41l5ebnoGq/hw4fj6NGjgtU25e7ZsGHDmPTiq6mpweTJk9GxY0dhgS2/EBSDubk56e4yIDM06tevHy5cuAAzMzOhMTUFhYWFWLRoERwdHfHgwQMUFhaK0vv888/xySefIDo6GjY2NkLK188//0wy3kePHmHt2rUwNTXFnTt3SBo7s25XoKamhrlz5+Lp06e4ceMG+vXrR6J78uRJhIeHIyoqCnPmzMHevXtJdAGZo6O8uyDVgoe61QRPcHCw8JwAZPV/YlPr+XquSZMmoaioCAUFBejbty+Zi6GHh4dQr1lZWUlmRER9T8vDqsk8tUkOT2ZmJpYvXy6895qamhAaGkpykj1gwABcvHgRly9fRmZmJlktIe+Ky0PVsJ3V+5QlDx8+RFxcnFD/T/WMU1NTg6urKzZt2oSkpCRkZ2eL1vTy8sLgwYMRHh6u8IyjqnOXeDFIgaKEUrBsMm9kZPSMUQ4FJ06ceKaW8sKFC7h//76oQDE2NhaamprCi43SYS8iIgL9+vUj68XHk5ubi/T0dOGE52VvhpuZmYn3338fNTU12L59OzIyMsi0V/GY/xwAACAASURBVKxYgczMTNjZ2SE/P1+0K+Ann3wCQHbCwy9ITE1NSQI6AFi/fj1iYmJw+/ZtvPnmm5gyZYpoTVbtCngWLlzIxJyiZ8+e0NPTQ2trK9TV1UlPV3v06IHU1FShJURcXBzJTj51qwkeTU1NbNiwQcjEoDQB41MB+brY2bNnk8y79vWaVK6Z1Pe0PKyazIeEhCA8PBzAn70PKdDQ0BBOmfkMEoDmJNvBwQEGBgZYtGgRAgMDyYxW+vTpo2AGRlVqwOp9ypIOHToA+DOtnqr2ccGCBQrPs/Z1yMrAG0a1d4im2nCUeDFIgaKEUrBsMs/KKKe9uQiPWJORLl26YMOGDcL1rVu3ROnJY2ZmBk9PT+GaKq11yJAhaGxshLa2Nokea1auXAlNTU3MmTMHYWFhWLx4MZm2vr6+YKBBZd4CyE50Tp8+DTMzMxQXFyM3N5dEV1NTU6G/I4WxD9+uwNDQUMFZtr17pBhYmFNcvXoV/fv3R2NjI7y8vMgCDQDYv3//M46qFIEidasJno0bN2L06NG4cOECRo8e/UwvOjEUFxcrpJtS1V+xMutqf0/HxsaSzWNWTeZZmeS0NxbhoUhRPnPmDBITE3H58mXk5ubC0dGRJJskOjoa27dvR6dOncBxHOLi4kRrAuzep4AslZPPdKmsrERLSwuJ6/CDBw+QmJiIbt26wdHRkcyzgSVqampwc3MTsl7kMx0kXj2kQFFCKVg2mWdllPNXQSIA0c2/LSwskJ6erpDOQpVa98YbbzBJa/3hhx/w/fffw8DAQEg9pXAbbA/VYkdVVRVNTU0wNDSEi4sLkzYZ1Hh5eeG7774TaoEoemoBstPgHTt2MDH28ff3R0hICDQ1NXHv3j34+fmRLNJYmVNs3rwZqqqqsLKyQkxMjEK6k1hYOaquWrVK2L1va2sjq0WztLTEBx98gOLiYgwbNgxZWVkkugAEo66/u1YWVuYiISEhiImJgYaGhjDfqFpNsGoyz8ok56+CRAAkm7vXr18X3qtRUVGIiopCYmKiaN23334bFhYWwrWDg4NoTYDd+xSAQjul5uZmfPfdd9i4caNo3XXr1gk/W1lZPVNz+jLi7e2N5ORkFBQUYNiwYRg5cuSLHpKECKRAUUIpWDaZZ2mUw4K9e/c+c/KgbGuF9iQkJGDw4MFCqiXV6cNHH32kkG4bExNDohsTE4MDBw4oBAQUi52AgAC4urrCxMQEjx49wtGjR0naK7DE2NgYW7duJddlaezz3nvvISgoCObm5kJfNwpYmVPwaVkAMHv2bBJNHlaOqhUVFcLPt2/fxuHDh8l6EpaVlaGqqgrx8fG4fPkyvv76a9G6gCy7w8/PD8bGxrh37x5ZiiErc5GbN2/i/PnzUFVVBUCbWs+qyTwrkxyWrFq1Curq6rC1tcWiRYvI/n5FRUWYOXMmTExMANClUbN4n+bm5gr/+E21trY2/PHHH6K1AVm7JS0tLYwfPx65ublQU1MjKQlgdQLKY2trK3oTXuLlQAoUJZSifY+xyspKMm2WRjksYNnLTf70AZDt4FKwdOlS1NbWCs11KeqNAFkdaEREhOAUSRUQWFlZ4d133wUAWFtbk56WtIfqFJQVLI19BgwYgML
CQkRGRmLevHkYMWIEiS4rcwqWUG961NXVoaamBkVFRSgvLwcgC3Q1NDRIxuvq6oqGhgZMnz4dGzZswMyZM0l0AVnNX0xMjFBjSlXzx8pcZODAgUKQCEChybpYPvvsMyZtEFiZ5LAMCJycnDB//vxnXJ7F0traKvTkA+iCZhbv05qaGpSWlqK6ulrIrFJVVVXYABHD9evXBVOtMWPGIDg4WCF9VllYnYBK/N9DChQllIJlk3mWRjnyUAUE7U8e3n//fdGaPHp6ekwaR587dw6+vr7Q1dVFbW0tVq9eTVLMbmVlJQSJgCyooaC8vBwtLS1QV1dHS0uLsNCmgNUpKCtYGvvMnTsX7u7uOHXqFM6ePYt58+aRnDazMqdgycmTJ3HgwAGhn53YTY+zZ88iNjYWZWVlQh2zuro6SfsRAHjzzTfR1taGjh07wtfXV+E+FIuqqioGDx4MPT09mJubKwRhYmBlLpKSkoKjR48qtP+hmm+smsyzMslhGRBQ1Oz+FayC5pEjRyImJga5ublkGx7W1tawtrbGuHHjRDWr/zvMzMyEE3wtLS28/vrrovRYn4BK/N9DChQllIJlk3lWRjmvWkAAsGscnZqairNnz0JTUxNNTU3w8/MjCRTz8vIwbdo0IRWXKmVo2LBhcHBwQOfOnVFdXa3QtkAsrE5B21NRUUGyk8/S2Ofrr7+Gm5sbAGDUqFGora0l0WVlTsESKysrhabnYjc9nJyc4OTkhNTUVDLXV3mWLVuGiRMnYtSoUf+vvXuPy/n+/wf+KKk+K7NqhE5UlEM0rJmaGslsObS2MJ9kN1/lLM2xmu1mLYkw2gcJw26zFgkzE66Fai6HHDtclFLJuVIqKt6/P7r1/nU5bl2vV+/r0vN+u7ndPu/+eO5583F1vZ+vw/OJ06dPIycnB9OmTWMSOyYmBnv27BFHvYwaNYrJvDxezUXMzMywatUq8ZnV0XqA35B51k1yNLkg2LJlC1xdXVFcXIzw8HBMmDBB/L2kivDwcNTW1sLKygqXLl3C5cuXlWYgqyI1NRVlZWWorq5GREQE/Pz8mNyLzcnJwZ9//gkrKysUFBSo/BnhvQP6PMePH2e2IEaaHxWKpEl4Dpnn1SinuQoClngNju7UqZN4HElXVxedOnViEvfx48cICgoSn1kdGRoyZAjeffddXLt2DVZWVkqrzapivQv6oiKI1a77V199hW+++QY9e/bEokWLVI7XmL+/P7Kzs1FaWoouXbowawDCqzkFDw1/pyUlJfjiiy/ExkmsFj3q6upw9OhRuLq6Ijk5GT169GCygNCnTx94eHgAqC/yWc4OzM3Nxf79+8Xnr776iklcXs1FGheJABAYGKhyzAa8hsyzbpIjRUHASuvWreHo6Ihly5Zh79692LZtG5O4xsbGmDp1qvi8du1aJnGB+u6k48ePx4QJExAVFYVdu3YxiRsYGIhly5aJR51VnTXKcwfU19f3mWPImnKChLwYFYqkSXgOmefVKIfXsUieeA2OLigowJYtW8QdAlZHOZ8+MsSylbcgCDAxMcGDBw+wdetWZseeWO+C7t69W7xP2RirXXc7OzulJjOlpaVKO1+qiI2NxdGjR9GpUyd4eXlh+/btTJoG8WpOwYO2tvZzj3izWvQ4cOCA+HfavXt3xMTEMOmI2zA/sUFZWZnKMRs83WnRzs4OAMSOvk3Fq1lXRkYGQkNDYWJighEjRsDAwIBZV2deQ+ZZN8nhfSSSp5qaGly9ehUmJiYwNjbGf/7zHyZxnz4hwWq2LVB/LPT27dvQ09ODra0ts9/J7du3V5o1WlhYyCQujx3Q3r17i3OEGwiCgB07dqgUl0iLCkXSJDyHzPNqlMPrWCRPvAZHL1iwABs2bMCJEydgb2+PBQsWMIlraGiIo0ePii+trHbRgoODceHCBRgbG4srlKwKRda7oKGhoc/tSseq9X+HDh1w7Ngx2NjYQEtLCz///LPKq8wNqqqqsH37dsTExGDAgAHMWv/zak7BQ3BwMAwMDHDv3j1xrt2NGzcwZswYJvG7du0qxjU1NYWpqSmTuF26dMGIESNgYWGBoqIi+Pr6MokL1N+LnT9/PiwsLFBYWIiamhpER0erPPeQV7OuuLg4REdHY//+/Rg1ahTCwsKYFYq8hszzapLD60gkT61atYKfnx9WrlyJv/76CxkZGUzidu7cGSNHjoS5uTnzz0hJSQnGjh2L4OBgnDlzhtnYm+LiYiQlJTGfNcpjB/RFi4off/yxyrGJdKhQJE3Cc8g8r0Y5vI5F8sRrGLyWlhb8/f1haGiIkpISGBgYMIm7ePFi6Ovr4+rVq3BwcGC2i1ZVVYXff/9dfGZVwADsd0EbF4m5ubniQgerXfcdO3bgxIkT4vONGzeYFYqPHz8GAPH40JMnT5jEffrvuGFnUR01fBZ27NghNgF544038OuvvzIZY5Gbm4tLly7B0tISBQUFyMvLUzkmAPj4+KBfv364cuUKunXrxqQRSoPWrVuLA9obGsQAqi9+PH0v+vLly0z+jq2srGBmZib+O27Xrp3KMRvwGjLPq0kOryORPM2cOVNpIZDF/XmA72dk4MCB8Pf3h6mpKSoqKrBkyRImcYOCgjB06FDms0Z57YACwM2bN7Ft2zZxwVgTFuXJi1GhSJqE55B5Xo1yeB6L1DRz587F6NGjmTe+MDMzw9SpUxETEwN/f39mdyodHBxQWVkpvsSXl5cziQvw2wWNjIxEXl4ebt++jc6dO+Pq1asqxwT4jmNp1aoVJk2ahIcPH+LixYsqf6Ybugs+PYSbZZdk1k6ePCn+iY6OBlB/fOrWrVtM4k+aNInLzhEA2NjYiEVLQkICs52j0NDQ5w5ubygem2rNmjXYvXs3tLW1xQZjLE5NXL58Gfv378edO3fw119/MTuuB/AbMs+rSQ7PgkDTHD16FADw0UcfITk5GYaGhsxGhaxduxabNm0CALRp04ZJTADo2bMnJk2aJD67uLgwictrBxQAVqxYgWHDhuH48eMYNmwYWrVqxSw2aX5UKJIm4TlknlejHF4FgSZydHTk0vji3r17AOrvS928eRPp6elM4v78889Ys2aNeGSPZcdaXrug+vr6WLdunVg0b968mUlcnuNYZs2ahZSUFLGIUfX+1bZt2xAeHo5du3bhvffeE3/Osksya2+++SbMzMzQpk0bcRVfW1sbnp6eTOLb2Ngo7Rw9fbewqdauXYv4+Hi0bt1aLLpYFYrPKxIBqNyROjMzEzKZTNz5Y9XwIigoCJGRkVAoFCgpKWG24w7wGzLPq0kOz4JA0/zxxx/ivwWW94OB+gKu4fsJYDd+y9DQEL/99hs6d+4MLS0tZidTeO2AAvV/t0OHDkVeXh6cnZ25zj0m/FGhSJqE564Gr0Y5vAqC5sTqy4dX4wsbGxskJydj0KBBGDlyJMaNG8ck7ieffIK5c+eKzyzb3fPaBa2trQVQv/tZV1fH7J4NT3/88Qc+/vhjuLi4ICsrC0uXLlWps2p4eDgAYPbs2bCzsxNX2q9cucIkXx7s7e1hb28PV1dXprMIGz
x58gTHjx9nvmB16dIlJCcnizMONaHLYPfu3fHo0SPo6+szjWtqaqrUAETVxjCN8Royz6tJDs+CgJeamhqxK3dJSQnq6uqY7PzZ2dlxuR8MAFlZWfDx8RF3gxUKBZPv6gMHDqBfv344e/asGJcFXjugQH0zqevXr6O0tBR79uyBXC7H9OnTmf43SPOhQpE0Cc9dDV6NcngVBDzxmv3Iq/FF48Lw5MmTTGIC9UdlKyoqcO3aNXTu3Bmff/45s9i8dkF1dHQgk8nQq1cv9O3bVyMu9Dc+Htu9e3fs3buXSdyFCxdiy5Yt4gvJ85r9qJvS0lJMmzaN+QB0XgtWDg4OYpEI1O9EqDtjY2MMGDAAJiYm4u83Fk1nbty4gYMHDzJvAALwGzLPq0kOz4KAl5iYGPGEUm1tLZYvX44VK1aoHDcnJ4fL/WCg/l5348VMVj0QeDV84rUDCgB+fn6oqqrCuHHjEBkZybRpEGl+VCgStcOrUQ6vgoAnXrMfeV3qLyoqQkREBAwMDODq6ooOHTowOZp15MgRLFmyBG+++SYqKirwzTffMGtwwGsXdOrUqeKquJOTE+rq6pjE5WHr1q3YunUrKioqsHv3bgiCIB6vY2HYsGFKDWxOnDiBAQMGMInNy+bNm5kOQG/Aa8Hq2LFj2LVrl9hshuXssrKyMqxbtw6tWrXCe++9BwsLCya/M44cOYLjx4+LBQyr329BQUEYNmwY8wYgPPFqksOzIGAtOztb/NMwj/bJkyeorq5mEp/n/eCneyA4Ojoyievm5ob4+Hjxvjerzue8dkCB+qK5YTFwxowZLfq48+uACkWidng1yuFVEPDEc/Zj48YXrKxfvx4TJkyAXC6Hu7s7wsPDmRSKqampOHToEHR1dVFTU4OwsDBmhSKvXVBeq+I8+Pn5wc/PDwcOHMDw4cOZxy8qKsKcOXPEf2+nT59W+0KxS5cuTAegN+C1YGVmZiYOmhcEATt37mQSFwCioqLQt29f5Obmok+fPli5ciWTI4yOjo5Ku1wNhZ2q7OzslAbLs7pHyBOvJjk8CwLWysvLUVRUhPv376OoqAhA/f3gxv9fquLp+8Es3b17F/7+/sxPIISHh6O2thZWVla4dOkSLl++rNTDoal47YAC9YtWDb877e3tNaLDPHkxKhSJ2uHVKIdXQcCTps1+tLa2hpOTE86fPw9dXV2Vm1006NSpk7g7p6uri06dOjGJC7DfBeW9Ks4TjyIRAO7cuaN0XFgTdnhYD0BvwGvBqqFILC0thZGREQIDA5nEBeqLZi8vL8TExOCtt95i9rlOS0vD3r17YW5uLs5HZbEL6uzsjDVr1igtNqp74zJeTXJ4FgSsOTk5wcnJCZ6enujcubPU6fwrsbGxXE4gGBsbY+rUqeLz2rVrVY4J8NkB3b17NxISElBcXCy+YwmCAD09PZVjE+lQoUjUDq9GObyORfKkabMfFQoFzp07h0ePHuHy5cviqrCqCgoKsGXLFlhYWKCgoADFxcVM4gLsd0F5r4proqVLl8LKykp8HjRokITZ/DO8BqDzWrBKT0/HnDlzUFFRgbZt22LVqlXMjr9duXIFt2/fhpaWFioqKnDz5k0mcRvvggLsmlRt3boVPXr0ED9/mtC4jFeTHF5HInlKTU1FWVkZqqurERERAT8/P2YdfHmxtrbmcgKhoqJC6bnh3q2qeOyAuru7w8nJCb/99ht8fHwA1I9cYjnHlEhAIKSFCAkJEeRyubBmzRrh0aNHwjfffCN1Sq90//59ped79+5x+e8kJSUxiXPlyhVhzJgxgqOjozB27FghNzeXSdwHDx4IUVFRgr+/v7By5UrhwYMHTOIKgiBs2rRJEARBiImJEQRBENatW8ckbl5eHpM4r4PS0lJh6dKlwrJly4Tk5GRm/y40UWZmpuDt7S04OjoK3t7eQkZGBpO4X3/9tXD37l1BEATh9u3bQnBwMJO4giAIp06dEj744AOhd+/ewocffiicPXuWSdzDhw8zifO0kJAQpeeW/FnMzc0VfyePGzdOIz57UVFRgiAIgq+vr3DlyhUhIiKCSdxHjx6J//vevXvCrVu3mMQVhPp/cxkZGcL9+/eFixcvCosWLWISNy4uThgxYoQwdepUYcSIEcJvv/3GJO6iRYuE8+fPC6WlpcLZs2eFhQsXMokrCIJQWVkpVFRUCIIgCCUlJcziEmnQjiJpMXgdi+SJ1+xHXt1UbW1tle6AlJSUqBwTqD8+5e/vD0NDQ5SUlMDAwIBJXIDfLuiBAwdgamqKTz/9FHFxcejcubPSLMGWhNcdN55yc3PFHUWWd47WrFmDxYsXw9LSEnl5eVi9ejViYmJUjmtlZSU2LWnXrp3SDq6qDA0NcezYMZSUlDAdGbJu3TqcOXMGo0ePZnr/ul27dlxGLGkiXkciedLT08Pt27ehp6cHW1tbGBkZMYnL8944rxMIvBrP8doBBYB58+Zh1KhR8PDwwKlTp5CTk4Np06Yxi0+aFxWKpMXgVRDwxKuVPq9uqpWVlUhLS0NlZSUAdoXt3LlzMXr0aHh4eOD06dNMv3gmT56M0NBQKBQKpKamMvuCLysrE++WjBkzBsuXL2+xhSKvO248bdq0icsLds+ePcUXtHfeeQcODg4qxwSAvLw8JCUlicez8/PzmcQFgNDQUPj6+jIf8bJs2TJ06tQJCQkJ2LFjBz744AOxoYsq9u/fj379+uH06dMA2I1Y0kQ8CwJeSkpKMHbsWAQHB+PMmTMqd81sjnvjVVVVXBrlHD16FADw0UcfITk5GYaGhkxmSvK6gw0Affr0gYeHBwDAw8MDubm5zGKT5keFImkxeBUEPPFqpc+rm+qUKVNgb2+Ptm3bAmB3N8jR0ZHbFw+vXdC3335b6bnxPaGWhtcdN554vWDX1tbi77//hoWFBQoLC6Gnp4fi4mL88ssvSk1H/q1Zs2Zh2bJl4o7G/PnzmeQL1M8OfPPNNxEZGQkjIyN89tlnTF5WHz9+jFatWkFXVxdnz55FcXExUlNT0a9fP5WKUl6z5zQRz4KAl4EDB8Lf3x+mpqaoqKhQ+fRBc9wbDwsLg5ubGzw8PJh2E//jjz/Ez3L37t0RExOD0NBQlePy2gEFIJ6AalBWVsYsNml+VCiSFoNXQcATr1b6vLqpWlpaKrXuZrWrwfOLh9cuaGFhITZt2gQrKysUFBTg+vXrKsfUVN7e3vjss89w//597NixAytXrpQ6pVfi9YK9b9++Zz7Hx48fx40bN1QqFNu3b4+oqCgA9cVo69atVcqzsf79+8PAwABvv/02fvjhB4waNQp///23ynHnzZuHyspKuLm5YfXq1WKnyx9++EGluE+PzsnKyhKL/paGZ0HAy9q1a7Fp0yYAUBqf0lTN0U114cKFsLa2RlJSErZv346OHTsiICBA5bh2dnbikXJTU1OYmpqqHBPgtwMK1J8gGTFiBCwsLFBUVARfX18u/x3SPKhQJC0Gr4KAJ16t9Hl1U3VxceFyN4jnFw+vXdAFCxZgw4YNiI+Ph729PRYsWMAkribq378/lztuPPF6w
Q4NDX3u8cqGI2ZNNXv2bAwaNAje3t7Yu3cvHj58iPHjx6sUs8H8+fPx+PFjXL9+HePGjWN2v7RLly4ICwuDoaGh+LOamhqUl5c3KZ6vry+2b9+Od999V9zBb7iDPWbMGCY5axqeBQEvLi4uYnEEAIcOHWJyh55nN9VevXohJSUFcrkc6enpcHZ2ZhI3JycHly5dgqWlJQoKCpCXl8ckLq8dUIDfvUoiDS1BEASpkyCkOfj6+ioVBKdOncLWrVslzkoa5eXlSkchWb3AT548Gbq6umJslnMfc3NzuXzxhISEKBUB+fn5XFadMzIy0LNnT+ZxNcH169exfPly5OTkwMbGBnPnzoWFhYXUaf0rvIrcnTt34rPPPlM5zo8//ojp06e/8FkVn3/+OYKCgvD+++8zidcgPz8fhoaG0NHRQWJiIoYOHQozM7Mmx3vw4AEMDQ2RmJiI0aNHiz/ft28fRowYwSJljTNmzBhuBQEvkyZNQkVFhZgvq++RlStXIigoCBMmTMDixYuxa9cuZgt4gwYNgomJCQIDA+Hs7AwdHTb7MI2bajUsWLH4/jt79qy4A5qRkcFsB/R5EhIS1H68CXkx2lEkLQavY5E88Zr9yKubqpGRESIjI8XnzMxMlWM2sLGx4fKiw2sX9ObNm9i+fbt4xJll0axpvv/+e3h6emLy5MnIy8tDWFgYNmzYIHVaL8XrBMKaNWuwe/duaGtri7tdLArFuro6peeamhqVYzaIiooSPx8Au0WPdevWYebMmYiOjoaxsTGio6NVahjUsDPZuEgEWvb9YF5HInnS0tJSOobN6sQLr26qAHDw4EHIZDLI5XJkZ2fD3d2dyfeVjY0Nlx1hXjugQP3CTHR0NMrKyqCnp4eHDx9SoajBqFAkLQavgoAn1sPgG/Dqpmpvb48TJ06If8cymQw9evRgEpuXhISEZ3ZBWVixYgWGDRuG48ePY9iwYWjVqhWTuJrIzs5ObE7Ss2dPXLlyReKMXo3XkeTMzEzIZDJxsPrhw4eZxNXR0UFAQAAsLS1RWFjI9E6erq4uIiMjxYUlVoseXbt2hampKa5evYqIiAiVm3X5+vo+M7BeEATcuHEDrq6uKsXWVDwLAl5WrlypVNw7Ojoyicu6m2pjFy5cEI+Ux8XFIS4uDjKZjFl81oYMGSLugEZERDDbAQXqdysPHDiA2NhYpk34iDSoUCQtBq+CgCdesx95dVPdvHmz2CAHAG7cuCHOrVJXvHZBu3fvjqFDhyIvLw/Ozs44d+4ck7iayNDQEIWFhWKnz06dOgEAduzYwezeLWu8TiB0794djx49gr6+PpN4DaZPn46UlBQoFAq4ubkxLQh4LXrk5OQgLCwMzs7OePjwIQoLC1WK17t3b3zxxRfYv38/HBwcYG5ujqKiIqSlpTHJVxPxLAh4uXv3Lvz9/ZnPMGXdTbWxRYsWQUdHB66urggMDGSyoMsTrx1QAOjQoQO0tbXFUw23bt1iEpdIQ/1/YxDCCM9jkbzwmv3Iq5tqUFCQ0hETXi9orJobAPx2QTMyMnD9+nWUlpZiz549kMvlzO6MaZr169fjl19+AVC/wwMAGzZsQGVlpdoWirxOIBgbG2PAgAEwMTERj566u7urHBeoz9nFxYVJrMZ4LXoEBATg2LFj8PHxwblz51QubufNmwegvllXw31KCwsLnDp1SuVcNRXPgoCX2NhYLjNMWXdTbczLywszZsx4ZkdbVTU1NdDV1QVQvyNaV1fHZDQNzx3QCxcuQCaTQVdXF35+fqBWKJqNCkXSYmjisUhesx95dVN9+h7CwIEDmcSNj4/Htm3bUFVVJb5csyoUee2C+vn5oaqqCuPGjUNkZCT++9//qhxTU4WEhDxzbwyov8uirnidQDhy5AiOHz8uvqju3r2bSVyeeC16dOnSRfzsDRgwQOV4Dc6fP4+LFy/CysoK+fn5TI8YahpNOxIJ8JthyqubKlA/a5SHmJgY8fuotrYWy5cvx4oVK1SOy3MHNCIiAvr6+nB1dYWNjQ3eeecdZrFJ86NCkbQYmngsktfsx8aF4cmTJ5nE5Gnfvn3YunWr2HWS5cs1r13Q2NhYTJkyBV27dkV0dDSTmJrqeUUiALXuRMnrBIKjo6PSboYqXT6bi6YtesyaNQuLFy9GTk4ObG1t8e2330qdkmQ07UgkwG+GaVZWFnx8fJS6qbIqOfkADwAAEZtJREFUFFnLzs4W/yQmJgIAnjx5gurqaibxee2AAsCwYcOwZcsWdOvWDUOGDGEenzQvKhRJi9FcxyJZ4tV5kVc3VV4cHR2VRhN069aNWWxeu6B6enpKnSEfP37cohvaaBpeJxDS0tKwd+9emJubi41WWDW04aVr16548uQJDA0NsWTJErWfhdmjRw/s3LlT6jTUAs+CgBdeM0x5dVPloby8HEVFRbh//7545URbWxsTJ05kEp/XDigAjBw5Uuk7uiWPhnod0BxFQtQYr9mPoaGhGDlyJORyOQICAhAeHq7Wq+4BAQG4f/++uCOsCaMmtm3bBmdnZ3H1ev369ZgyZYrEWZF/ysXF5ZkTCCwKujlz5ogvq4IgYOfOnQgMDFQ5Lk/Tp0/HqFGj4OHhgaSkJOTk5GDatGlSp0VeUxcvXoSDgwPzuE/PD258/09d8Zrry1NISAiMjIxgY2MDLS0tZgvcRBq0o0iIGuPVeZFXN1VeHj9+jKCgIPFZnVeCG6xatUos6hvuVVKhqDl4nUBYtWoVgPpGUkZGRmpfJAJAnz594OHhAQDw8PBAbm6uxBmR11lYWBjc3Nzg4eHBtPEOr26qPKWmpqKsrAzV1dWIiIiAn5+f2s8kzMjIgLu7O65fvw6A3WghIg0qFAlRY7w6L/LqpsrL03O1bG1tJczm5fbu3YuPP/4YM2bMwKRJk8Sf//nnnxJmRf6tfv364e7du9DR0UFiYiKzu0zp6emYM2cOKioq0LZtW6xatYrZnDheGuYnNigrK5Mok6Z5+PAh83EkhJ+FCxfC2toaSUlJ2L59Ozp27IiAgACV4/LqpsrTrVu3MH78eEyYMAFRUVHYtWuX1Cm9UHBwML777jssXrxY6SoLqwVuIg0qFAlRY7w6L/LqpsqLoaEhjh49Kr6wqvNRluzsbIwcOfKZXVorKyuJMiJN8b///Q8zZ85EdHQ0jI2NER0dzeSlMjExEQkJCTAxMcGdO3ewevVqtS8Uu3TpghEjRsDCwgJFRUXw9fWVOqXnetEYDFYLbKR59OrVCykpKZDL5UhPT2c2E5RXN1We9PT0cPv2bejp6cHW1hZGRkZSp/RCtra2aNWqFdLS0pQKxXPnzmnc8Vny/1GhSIga49V5kVc3VV4WL14MfX19XL16FQ4ODmp9lKW8vBxyuRwpKSlK867oZVWzdO3aFaamprh69SoiIiKwceNGJnGtrKzEFv3t2rXTiAUEHx8f9OvXD1euXEG3bt3U9rje0qVLYWdn98zPWS2wkeYxZMgQmJiYIDAwEBEREdDRYfOqyqubKk8lJSUYO3YsgoODcebMGbUe9XLx4kUsXLgQCoVCPHYK
1H/+XtT5mqg/KhQJUWO8Oi/y6qbKi5mZGaZOnYqYmBj4+/sze2nnwdPTE7///juysrKUGu7Qy6pmycnJQVhYGJydnfHw4UMUFhYyiZuXl4ekpCRYWFigoKBAY45l2djYiPfFEhIS1PKeVEhICPr16/fMz8+cOSNBNqSpDh48CJlMBrlcjuzsbLi7uzO5q8irmypPAwcOhL+/P0xNTVFRUYElS5ZIndILLV26FJmZmYiPj4eXl5f4c03oKUBejApFQtQYr9mPU6ZMUeqmqs47dABw7949APV3pW7evIn09HSJM3qxAQMGYMCAATh9+jT69+8v/pxeVjVLQEAAjh07Bh8fH5w7d47Z8bdZs2Zh2bJl4svq/PnzmcTlae3atYiPj0fr1q3FxkzqWCg2LhKrqqrEu5RpaWnPLSCJerpw4QIGDx4MAIiLi0NcXBxkMpnKcauqqpRO0miCtWvXYtOmTQCgNH9VHenr66Nv377o1q0bDA0NxZ/TaAzNRoUiIWqMV+dFXt1UebGxsUFycjIGDRqEkSNHYty4cVKn9EqNi0QA9KKqYbp06SIu0gwYMIBZ3Pbt2yMqKgoAUFtbi9atWzOLzculS5eQnJwMbW1tAFD7uY9btmzBnj17UFlZCRMTE9y+fZvr3DjC1qJFi6CjowNXV1cEBgYym/HLq5sqTy4uLuJRdQA4dOgQs8ZavDQuEgHAwMBAokwIC1QoEqLGeA2D59VNlZfGheHJkyclzIQQ1cyePRuDBg2Ct7c39u7di4cPH2L8+PFSp/VSDg4OYpEIPPsiqG7u3LmDxMRE8aj65s2bpU6J/AteXl6YMWMGtLS0mMbl1U2Vp6ysLPj4+IiFrUKhUPtCkbxeqFAkpAXi1U2Vl6KiIkRERMDAwACurq7o0KEDs1VmQppTt27d4O3tDQDw9vbGjz/+KHFGr3bs2DHs2rUL5ubmAOqPwKvzruIbb7wBAOId7Ly8PCnTIf8Sr91fXt1UedLS0sLcuXPFZ02471dTUwNdXV0A9c146urqlBq7Ec1ChSIhLRCvbqq8rF+/HhMmTIBcLoe7uzvCw8M1rlDUhCNDhL+6ujql55qaGoky+efMzMywatUqAIAgCNi5c6fEGb3crVu3IJPJ0LFjR7i7u6v13FXSfHh1U+Xp6RnC6j5KBwBiYmLEXgq1tbVYvnw5VqxYIXFWpKnU/1NCCGGOVzdVXqytreHk5ITz589DV1f3mRmF6ig+Ph7btm1DVVWV2ACECkWio6ODgIAAWFpaorCwUJzrps4aisTS0lIYGRkhMDBQ4oxe7rvvvhP/t6OjI81wIwD4dVPl6e7du/D394dCoUD37t0RFhamtuNpsrOzxT+JiYkAgCdPnqC6ulrizIgqqFAkpAXi1U2VF4VCgXPnzuHRo0e4fPkyioqKpE7plfbt24etW7fC2NgYALB7926JMyLqYPr06UhJSYFCoYCbm5tGHH9LT0/HnDlzUFFRgbZt22LVqlVqvbOxceNGTJ48GUD9MdSQkBCxgRBpuXh1U+UpNjYWwcHBsLS0RH5+PjZu3IilS5dKndZzlZeXo6ioCPfv3xe/o7W1tTFx4kRpEyMqoUKRkBaIVzdVXiZPnozQ0FAoFAqkpqZqxPwrR0dHsUgE6u+mEQLUN5NycXGROo1/LDExEQkJCTAxMcGdO3ewevVqtSwUi4uLcf36dVy9ehWnTp0CUL+jwbopCtFMvLqp8mRtbS2eOnB0dIRcLpc4oxdzcnKCk5MTPD09aRf/NUKFIiEtEK9uqrzY2toqzb8qKSmRMJt/RqFQYOzYseLOrUKhQEJCgsRZEfLvWVlZiS3627VrBysrK4kzer7MzEwcOXIEWVlZ4mdNW1sbH374ocSZEXXAq5sqT/n5+cjMzIS5uTkKCgpw7do1qVN6pdTUVJSVlaG6uhoRERHw8/NTy7mr5J+hQpEQovYqKyuRlpYmdjGUyWRYs2aNxFm93OPHjxEUFCQ+a0K3OkKeJy8vD0lJSbCwsEBBQYHazl11d3eHu7s7Lly4oBF3P0nz0sRZml9++SVCQkKgUChgb2+vEadpbt26hfHjx2PChAmIiorCrl27pE6JqIAKRUKI2psyZQrs7e3Rtm1bAMD9+/clzujVnu5WR50XiaaaNWsWli1bJr6szp8/X+qUXury5cu4du0aRowYgcTERPTs2RNdu3aVOi1C/rWqqiql0zSaQE9PD7dv34aenh5sbW1hZGQkdUpEBVQoEkLUnqWlJUJCQsRndd3RaMzQ0BBHjx5FaWkpAM3YBSXkedq3by82g6mtrUXr1q0lzujlLly4gMWLFwMAhg8fjqioKAQHB0ucFSH/XlhYGNzc3ODh4aH2HVoblJSUYOzYsQgODsaZM2dw6dIlqVMiKmj17bfffit1EoQQ8jI1NTW4cOECKisrUVxcjB07dojd69TV119/jeLiYpw4cQK6urq4du0avLy8pE6LkH9t9uzZqKqqQo8ePZCYmIizZ8+q9dHOvLw89OvXD0D9OJKsrCzxmRBNYmtri0GDBiElJQXx8fFQKBTo37+/1Gm9VF1dHaZNm4bevXujTZs2cHNzg76+vtRpkSaiHUVCiNpLSEiArq6ueJRToVBInNGrmZmZYerUqYiJiYG/vz82btwodUqENEm3bt3g7e0NAPD29saPP/4ocUYvl5OTgz///BNWVlZqfaeSkFfp1asXUlJSIJfLkZ6erhHjdNauXYtNmzYBANq0aSNxNkRVVCgSQtSekZERIiMjxefMzEwJs/ln7t27B6B+SPnNmzeRnp4ucUaENE1dXZ3Sc01NjUSZ/DOBgYEadaeSkBcZMmQITExMEBgYiIiICOjoqP9ru4uLi9glGQAOHTqEoUOHSpgRUYX6/4sjhLR49vb2OHHiBCwtLQHU3/fr0aOHxFm9nI2NDZKTkzFo0CCMHDkS48aNkzolQppER0cHAQEBsLS0RGFhoVofOwWU71QCQGFhoYTZENJ0Bw8ehEwmg1wuR3Z2Ntzd3dX+rmJWVhZ8fHzEPBUKBRWKGkxLEARB6iQIIeRlXFxcxHmEAHDjxg0cPnxYwowIaVlSUlLEHTp1P/5WWVmJPXv2iPNWT58+jZ9++knapAhpArlcjt69e0MmkyEuLg5FRUWQyWRSp/VS//d//wd/f3/xec+ePRox1oM8HxWKhBC1l5CQoDSwNy0tDQMHDpQwo1crKipCREQEDAwM4Orqig4dOqBv375Sp0XIa2/evHno2bMnzp49C2dnZ6SkpFDHYaKRBg8eDB0dHbi6umL48OEa8R1SXl6uNBqqpqYGurq6EmZEVKEtdQKEEPIqjYtEAGpfJALA+vXrMWHCBJibm8Pd3R179+6VOiVCWoRu3bph4sSJ6NmzJ3x8fNCrVy+pUyKkSby8vHDw4EGEhIRoRJEIAHfv3sXYsWPxzjvv4IsvvkBRUZHUKREVUKFICCEcWFtbw8nJCfr6+tDV1UWHDh2kTomQFiEvLw8PHjxAaWkpTp8+DblcLnVKhDTJzJkzoaWlJXUa/0psbCyCg4Px119/Yf78+dT
xW8NRoUgIIRwoFAqcO3cOjx49wuXLl2lVlZBmMnjwYGRnZ8PT0xPff/+92s9cJeR1Ym1tjd69e+Ott96Co6MjOnfuLHVKRAXU9ZQQQjiYPHkyQkNDoVAokJqaSpf5CWkmp06dgpeXF+zt7bF7926p0yGkRcnPz0dmZibMzc1RUFCAa9euSZ0SUQEVioQQwoGtrS1+/fVX8bmhAyMhhK9r167Bzs5O6jQIaZG+/PJLhISEiF2SaZFUs1GhSAghHFRWViItLQ2VlZUA6mc/UudFQvjr06cPKisrYWhoCAD46aefMHHiRGmTIqSFqKqqUlokJZqNxmMQQggHvr6+sLe3R9u2bQHUH4fbunWrxFkR8vr78MMPUVJSAhMTEwD1izbU0IaQ5jFmzBi4ubnBw8MDNjY2UqdDVEQ7ioQQwoGlpSVCQkLE5/z8fOmSIaQFOHnyJPr3749PPvkEc+fOFX8eHx8vYVaEtCwLFy6EtbU1kpKSsH37dnTs2BEBAQFSp0WaiLqeEkIIBy4uLti1axdOnTqFU6dOITY2VuqUCHmt7d+/H9ra2nj33XeVfv7+++9LlBEhLU+vXr2Qnp4OuVyOY8eOUcdvDUc7ioQQwkFCQgJ0dXXx5ptvAqgfl0EI4ad169YoKipCWloaunbtKv78l19+wfz58yXMjJCWY8iQITAxMUFgYCAiIiKgo0OlhiajO4qEEMLB/PnzERkZKT5nZmaiR48eEmZEyOvt999/R0JCAvLz82Fubo6G15sbN27g8OHDEmdHSMtQXV0NmUyGjIwMtG3bFu7u7nRXUYNRmU8IIRzY29vjxIkTsLS0BFDf9ZQKRUL48fT0hKenJ2QyGQYPHiz+/OjRoxJmRUjLcuHCBfHzFxcXh7i4OMhkMomzIk1FO4qEEMKBi4sLunTpIj7TrgYhhJDX3eDBg6GjowNXV1cMHz4cffv2lTologLaUSSEEA6CgoLw6aefis9paWkSZkMIIYTw5+XlhRkzZkBLS0vqVAgDtKNICCGEEEIIIUQJjccghBBCCCGEEKKECkVCCCGEEEIIIUqoUCSEEEIIIYQQooQKRUIIIYQQQgghSqhQJIQQQgghhBCi5P8BZZigRg0PMu8AAAAASUVORK5CYII=\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "execution_count": 7, + "id": "2b363d73", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], + }, + "outputs": [], "source": [ - "import matplotlib.pyplot as plt\n", + "# Common imports\n", + "import os\n", "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "cancer = load_breast_cancer()\n", "import pandas as pd\n", - "# Making a data frame\n", - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", - "\n", - "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", - "malignant = cancer.data[cancer.target == 0]\n", - "benign = cancer.data[cancer.target == 1]\n", - "ax = axes.ravel()\n", - "\n", - "for i in range(30):\n", - " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", - " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].set_title(cancer.feature_names[i])\n", - " ax[i].set_yticks(())\n", - "ax[0].set_xlabel(\"Feature magnitude\")\n", - "ax[0].set_ylabel(\"Frequency\")\n", - "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", - "fig.tight_layout()\n", - "plt.show()\n", - "\n", - "import seaborn as sns\n", - "correlation_matrix = cancerpd.corr().round(1)\n", - "# use the heatmap function from seaborn to plot the correlation matrix\n", - "# annot = True to print the values inside the square\n", - "plt.figure(figsize=(15,8))\n", - "sns.heatmap(data=correlation_matrix, annot=True)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Discussing the correlation data\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", "\n", - "In the above example we note two things. In the first plot we display\n", - "the overlap of benign and malignant tumors as functions of the various\n", - "features in the Wisconsing breast cancer data set. We see that for\n", - "some of the features we can distinguish clearly the benign and\n", - "malignant cases while for other features we cannot. 
This can point to\n", - "us which features may be of greater interest when we wish to classify\n", - "a benign or not benign tumour.\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", "\n", - "In the second figure we have computed the so-called correlation\n", - "matrix, which in our case with thirty features becomes a $30\\times 30$\n", - "matrix.\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", "\n", - "We constructed this matrix using **pandas** via the statements" + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" ] }, { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "de30ce89", + "metadata": { + "editable": true + }, "source": [ - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." 
] }, { "cell_type": "markdown", - "metadata": {}, + "id": "936b8f0f", + "metadata": { + "editable": true + }, "source": [ - "and then" + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." ] }, { "cell_type": "code", - "execution_count": 28, - "metadata": {}, + "execution_count": 8, + "id": "399b09d4", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "correlation_matrix = cancerpd.corr().round(1)" + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "ded3c9a0", + "metadata": { + "editable": true + }, "source": [ - "Diagonalizing this matrix we can in turn say something about which\n", - "features are of relevance and which are not. This leads us to\n", - "the classical Principal Component Analysis (PCA) theorem with\n", - "applications. 
This will be discussed later this semester ([week 43](https://compphysics.github.io/MachineLearning/doc/pub/week43/html/week43-bs.html)).\n", + "## Material for the lab sessions\n", "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", "\n", - "## Other measures in classification studies: Cancer Data again" + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.95\n", - "Test set accuracy Logistic Regression with scaled data: 0.96\n", - "[1. 1. 1. 1. 1. 1.\n", - " 1. 1. 0.92857143 0.92857143]\n", - "Test set accuracy with Logistic Regression and scaled data: 0.96\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/hjensen/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - }, - { - "ename": "ModuleNotFoundError", - "evalue": "No module named 'scikitplot'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 34\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 35\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 36\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mscikitplot\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mskplt\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 37\u001b[0m \u001b[0my_pred\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlogreg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_test_scaled\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[0mskplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetrics\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot_confusion_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_pred\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'scikitplot'" - ] - } - ], - "source": [ ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import 
load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "#now scale the data\n", - "from sklearn.preprocessing import StandardScaler\n", - "scaler = StandardScaler()\n", - "scaler.fit(X_train)\n", - "X_train_scaled = scaler.transform(X_train)\n", - "X_test_scaled = scaler.transform(X_test)\n", - "# Logistic Regression\n", - "logreg.fit(X_train_scaled, y_train)\n", - "print(\"Test set accuracy Logistic Regression with scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))\n", - "\n", - "\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from sklearn.model_selection import cross_validate\n", - "#Cross validation\n", - "accuracy = cross_validate(logreg,X_test_scaled,y_test,cv=10)['test_score']\n", - "print(accuracy)\n", - "print(\"Test set accuracy with Logistic Regression and scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))\n", - "\n", - "\n", - "import scikitplot as skplt\n", - "y_pred = logreg.predict(X_test_scaled)\n", - "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", - "plt.show()\n", - "y_probas = logreg.predict_proba(X_test_scaled)\n", - "skplt.metrics.plot_roc(y_test, y_probas)\n", - "plt.show()\n", - "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", - "plt.show()" - ] -<<<<<<< HEAD -======= - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -2759,13 +2315,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", -<<<<<<< HEAD - "version": "3.6.8" -======= - "version": "3.8.3" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "version": "3.9.15" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 5 } diff --git a/doc/pub/week38/ipynb/figures/BiasVariance.png b/doc/pub/week38/ipynb/figures/BiasVariance.png new file mode 100644 index 000000000..3fb3474ac Binary files /dev/null and b/doc/pub/week38/ipynb/figures/BiasVariance.png differ diff --git a/doc/pub/week38/ipynb/figures/adagrad.png b/doc/pub/week38/ipynb/figures/adagrad.png new file mode 100644 index 000000000..97a9cf908 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/adagrad.png differ diff --git a/doc/pub/week38/ipynb/figures/adam.png b/doc/pub/week38/ipynb/figures/adam.png new file mode 100644 index 000000000..a3a39f025 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/adam.png differ diff --git a/doc/pub/week38/ipynb/figures/nns.png b/doc/pub/week38/ipynb/figures/nns.png new file mode 100644 index 000000000..19e31ef05 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/nns.png differ diff --git a/doc/pub/week38/ipynb/figures/rmsprop.png b/doc/pub/week38/ipynb/figures/rmsprop.png new file mode 100644 index 000000000..9f336d033 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/rmsprop.png differ diff --git a/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz 
b/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz index 076993818..1e2fa70f2 100644 Binary files a/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz and b/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz differ diff --git a/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb b/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb deleted file mode 100644 index 43efc5e41..000000000 --- a/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb +++ /dev/null @@ -1,196 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "from sklearn.preprocessing import PolynomialFeatures\n", - "from sklearn.linear_model import LinearRegression\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "n = 50 # increase, var goes down\n", - "x = np.random.rand(n) * 10\n", - "y = 5 + x**2 + np.random.randn(n) * 3 # decrease,\n", - "poly = PolynomialFeatures(10) # increase, var goes up" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "X = poly.fit_transform(x.reshape(n, 1))\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "x_test = X_test[:, 1]" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "models = []\n", - "for i in range(10):\n", - " X_sample, y_sample = resample(X_train, y_train)\n", - " mdl = LinearRegression().fit(X_sample, y_sample)\n", - " models.append(mdl)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "92.13165463148223\n", - "92.13165463148225\n" - ] - }, - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAjMAAAGdCAYAAADnrPLBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA6QUlEQVR4nO3de3RU5b3/8c9cMnsmYRIDgQyRIBepFoOK4KGALVhuWkQtZ3kp6oEjZWkBNUWqRXsqciSpN/QUVqn0uIBKEdf5Ka3VWoja4uF4QxTLxUKt3JQMUYgzk2Rumdm/P4aMDvdoJsMO79dae83M3s/s/d3R5Xx89rOfbTNN0xQAAIBF2XNdAAAAwNdBmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJbmzHUB7SGZTGrfvn3yer2y2Wy5LgcAAJwE0zQVCoVUVlYmu/3Y/S+nRZjZt2+fysvLc10GAAD4Cvbu3asePXocc/tpEWa8Xq+k1B+jsLAwx9UAAICTEQwGVV5env4dP5bTIsy0XFoqLCwkzAAAYDEnGiLCAGAAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAPCVNMWa1eunL6rXT19UU6w5Z3UQZgAAgKURZgAAgKURZgAAgKURZgAAgKVlNcw0NzfrZz/7mXr37i2Px6M+ffpo3rx5SiaT6TamaWru3LkqKyuTx+PRyJEjtXXr1oz9RKNR3XbbbSopKVFBQYGuvPJKffzxx9ksHQAAWERWw8yDDz6oX//611q0aJE++OADPfTQQ3r44Ye1cOHCdJuHHnpICxYs0KJFi7Rhwwb5fD6NGTNGoVAo3aayslKrV6/WqlWrtH79ejU0NOiKK65QIpHIZvkAAMACnNnc+RtvvKGrrrpK48ePlyT16tVLTz/9tN555x1JqV6Zxx9/XPfee68mTpwoSVq+fLlKS0u1cuVK3XLLLQoEAnryySf11FNPafTo0ZKkFStWqLy8XC+//LLGjRuXzVMAAACnuKz2zFxyySV65ZVXtGPHDknS+++/r/Xr1+t73/ueJGnnzp3y+/0aO3Zs+juGYWjEiBF6/fXXJUkbN25UPB7PaFNWVqaKiop0m8NFo1EFg8GMBQAAdExZ7Zm5++67FQgEdO6558rhcCiRSGj+/Pn6wQ9+IEny+/2SpNLS0ozvlZaWavfu3ek2LpdLxcXFR7Rp+f7hqqurdf/997f16QAAgFNQVntmnnnmGa1YsUIrV67Uu+++q+XLl+uRRx7R8uXLM9rZbLaMz6ZpHrHucMdrM2fOHAUCgfSyd+/er3ciAADglJXVnpmf/OQn+ulPf6rrr79ekjRgwADt3r1b1dXVmjx5snw+n6RU70v37t3T36urq0v31vh8PsViMdXX12f0ztTV1WnYsGFHPa5hGDIMI1unBQAATiFZ7ZlpamqS3Z55CIfDkb41u3fv3vL5fKqpqUlvj8ViWrduXTqoDBo0SHl5eRltamtrtWXLlmOGGQAAcPrIas/MhAkTNH/+fPXs2VPnnXee3nvvPS1YsEA333yzpNTlpcrKSlVVValfv37q16+fqqqqlJ+fr0mTJkmSioqKNHXqVN15553q0qWLOnfurNmzZ2vAgAHpu5sAAMDpK6thZuHChfqP//gPTZ8+XXV1dSorK9Mtt9yin//85+k2d911l8LhsKZPn676+noNGTJEa9euldfrTbd57LHH5HQ6de211yocDmvUqFFatmyZHA5HNssHAAAWYDNN08x1EdkWDAZVVFSkQCCgwsLCXJcDAECH0BRrVv+fr5EkbZs3Tvmutu0jOdnfb57NBAAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALC3rYeaTTz7RjTfeqC5duig/P18XXnihNm7cmN5umqbmzp2rsrIyeTwejRw5Ulu3bs3YRzQa1W233aaSkhIVFBToyiuv1Mcff5zt0gEAgAVkNczU19dr+PDhysvL00svvaRt27bp0Ucf1RlnnJFu89BDD2nBggVatGiRNmzYIJ/PpzFjxigUCqXbVFZWavXq1Vq1apXWr1+vhoYGXXHFFUokEtksHwAAWIAzmzt/8MEHVV5erqVLl6bX9erVK/3eNE09/vjjuvfeezVx4kRJ0vLly1VaWqqVK1fqlltuUSAQ0JNPPqmnnnpKo0ePliStWLFC5eXlevnllzVu3LhsngIAADjFZbVn5vnnn9fgwYN1zTXXqFu3bho4cKB+85vfpLfv3LlTfr9fY8eOTa8zDEMjRozQ66+/LknauHGj4vF4RpuysjJVVFSk2xwuGo0qGAxmLAAAoGPKapj56KOPtHjxYvXr109r1qzRrbfeqttvv12//e1vJUl+v1+SVFpamvG90tLS9Da/3y+Xy6Xi4uJjtjlcdXW1ioqK0kt5eXlbnxoAADhFZDXMJJNJXXTRRaqqqtLAgQN1yy23aNq0aVq8eHFGO5vNlvHZNM0j1h3ueG3mzJmjQCCQXvbu3fv1TgQAAJyyshpmunfvrv79+2es++Y3v6k9e/ZIknw+nyQd0cNSV1eX7q3x+XyKxWKqr68/ZpvDGYahwsLCjAUAAHRMWQ0zw4cP1/bt2zPW7dixQ2eddZYkqXfv3vL5fKqpqUlvj8ViWrdunYYNGyZJGjRokPLy8jLa1NbWasuWLek2AADg9JXVu5l+/OMfa9iwYaqqqtK1116rt99+W0uWLNGSJUskpS4vVVZWqqqqSv369VO/fv1UVVWl/Px8TZo0SZJUVFSkqVOn6s4771SXLl3UuXNnzZ49WwMGDEjf3QQAAE5fWQ0zF198sVavXq05c+Zo3rx56t27tx5//HHdcMMN6TZ33XWXwuGwpk+frvr6eg0ZMkRr166V1+tNt3nsscfkdDp17bXXKhwOa9SoUVq2bJkcDkc2ywcAABZgM03TzHUR2RYMBlVUVKRAIMD4GQAA2khTrFn9f75GkrRt3jjlu9q2j+Rkf795NhMAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wg
wAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALC0dgsz1dXVstlsqqysTK8zTVNz585VWVmZPB6PRo4cqa1bt2Z8LxqN6rbbblNJSYkKCgp05ZVX6uOPP26vsgEAwCmuXcLMhg0btGTJEp1//vkZ6x966CEtWLBAixYt0oYNG+Tz+TRmzBiFQqF0m8rKSq1evVqrVq3S+vXr1dDQoCuuuEKJRKI9SgcAAKe4rIeZhoYG3XDDDfrNb36j4uLi9HrTNPX444/r3nvv1cSJE1VRUaHly5erqalJK1eulCQFAgE9+eSTevTRRzV69GgNHDhQK1as0ObNm/Xyyy9nu3QAAGABWQ8zM2bM0Pjx4zV69OiM9Tt37pTf79fYsWPT6wzD0IgRI/T6669LkjZu3Kh4PJ7RpqysTBUVFek2RxONRhUMBjMWAADQMTmzufNVq1bp3Xff1YYNG47Y5vf7JUmlpaUZ60tLS7V79+50G5fLldGj09Km5ftHU11drfvvv//rlg8AACwgaz0ze/fu1R133KEVK1bI7XYfs53NZsv4bJrmEesOd6I2c+bMUSAQSC979+5tXfEAAMAyshZmNm7cqLq6Og0aNEhOp1NOp1Pr1q3TL3/5SzmdznSPzOE9LHV1deltPp9PsVhM9fX1x2xzNIZhqLCwMGMBAAAdU9bCzKhRo7R582Zt2rQpvQwePFg33HCDNm3apD59+sjn86mmpib9nVgspnXr1mnYsGGSpEGDBikvLy+jTW1trbZs2ZJuAwAATm9ZGzPj9XpVUVGRsa6goEBdunRJr6+srFRVVZX69eunfv36qaqqSvn5+Zo0aZIkqaioSFOnTtWdd96pLl26qHPnzpo9e7YGDBhwxIBiAABwesrqAOATueuuuxQOhzV9+nTV19dryJAhWrt2rbxeb7rNY489JqfTqWuvvVbhcFijRo3SsmXL5HA4clg5AAA4VdhM0zRzXUS2BYNBFRUVKRAIMH4GAIA20hRrVv+fr5EkbZs3Tvmutu0jOdnfb57NBAAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALC2rYaa6uloXX3yxvF6vunXrpquvvlrbt2/PaGOapubOnauysjJ5PB6NHDlSW7duzWgTjUZ12223qaSkRAUFBbryyiv18ccfZ7N0AABgEVkNM+vWrdOMGTP05ptvqqamRs3NzRo7dqwaGxvTbR566CEtWLBAixYt0oYNG+Tz+TRmzBiFQqF0m8rKSq1evVqrVq3S+vXr1dDQoCuuuEKJRCKb5QMAAAuwmaZpttfBPv30U3Xr1k3r1q3Td77zHZmmqbKyMlVWVuruu++WlOqFKS0t1YMPPqhbbrlFgUBAXbt21VNPPaXrrrtOkrRv3z6Vl5frT3/6k8aNG3fC4waDQRUVFSkQCKiwsDCr5wgAwOmiKdas/j9fI0naNm+c8l3ONt3/yf5+t+uYmUAgIEnq3LmzJGnnzp3y+/0aO3Zsuo1hGBoxYoRef/11SdLGjRsVj8cz2pSVlamioiLd5nDRaFTBYDBjAQAAHVO7hRnTNDVr1ixdcsklqqiokCT5/X5JUmlpaUbb0tLS9Da/3y+Xy6Xi4uJjtjlcdXW1ioqK0kt5eXlbnw4AADhFtFuYmTlzpv72t7/p6aefPmKbzWbL+Gya5hHrDne8NnPmzFEgEEgve/fu/eqFAwCAU1q7hJnbbrtNzz//vP7yl7+oR48e6fU+n0+SjuhhqaurS/fW+Hw+xWIx1dfXH7PN4QzDUGFhYcYCAADaSH299NRT0o035boSSVkOM6ZpaubMmXruuef06quvqnfv3hnbe/fuLZ/Pp5qamvS6WCymdevWadiwYZKkQYMGKS8vL6NNbW2ttmzZkm4DAACyzO+XnnhCuvpq6ZZbUusWLsxpSS3adtjxYWbMmKGVK1fqD3/4g7xeb7oHpqioSB6PRzabTZWVlaqqqlK/fv3Ur18/VVVVKT8/X5MmTUq3nTp1qu6880516dJFnTt31uzZszVgwACNHj06m+UDAHB6271bWr1aevll6YwzUkFmxQqpU6fU9lhzLqtLy2qYWbx4sSRp5MiRGeuXLl2qKVOmSJLuuusuhcNhTZ8+XfX19RoyZIjWrl0rr9ebbv/YY4/J6XTq2muvVTgc1qhRo7Rs2TI5HI5slg8AwOln+3bpueekdeukM8+UJk6Ubr1VcrtzXdkxtes8M7nCPDMAAByDaUrvv58KMG++KX3jG9K//qv07W9LzuP3eZwq88xktWcGAACcgpJJ6a23pGefTQWZCy9MBZi5cyW79R7bSJgBAOB00NycunT03HPShx9K3/qWdNNN0sMPSyeYDuVUR5gBAKCjikRSg3efe06qrZVGjJBuv10655xcV9amCDMAAHQkDQ3SSy9Jv/+9FAxKo0enLh/17JnryrKGMAMAgNXV10t//GNqicel731PWrBAOsbksh0NYQYAACvy+1O9Ly+9JBmGdOWV0pIl0mHPMjwdEGYAALCKXbu+mMSuuFj6/vellSulgoJcV5ZThBkAAE5lf/97agDva69JPXqkJrGbPj3VGwNJhBkAAE4tpilt2pQKMG+9lbrzaOJE6a67TjiJ3emKvwoAALmWTKZm3332Welvf5MGDkwFmPvvt+Qkdu2NMAMAQC7E45mT2A0dKv3bv0nnn2/5SezaG2EGAID2EolINTWpQby1tdLIkVJlZep5SBaUSH7xeMe3Pjqo73yjqxz29g9ihBkAALIpFMqcxG7sWGtNYhcOS3v2SHv2yNy1W6E9n2j/vs+0/dNGbYy5pQvHS5L+fdkGdS9y674J/XVZRfd2LZGnZgMA0NYOHvxiErtEIjWJ3ZVXnnqT2Jmm9Nln0u7diu7ao7pd+7R/32fyH2zQftOl/XkF2m8Uyt+pWHXOAvnNPIWTx+55admy+MaL2iTQ8NRsAADaU21tqvflz3/+YhK7//5v6YwzcldTLKbk3r068I/d2r9rn/bXHpD/QEj7w0ntd+Zrv7NAfneh9ucVqF6GpN6Sq7fkO8q+El+8tdlSOehwplKB5v4/btOY/r52u+REmAEA4KvatSs1gPeVV6TOnaWrr26/SexMU6H9n2n/P3Zr/8592l97UP4DIdWFovLbDPmdBarLK1CdM1/NNruk4tTSSanlKFxOu3yFbpUWGiotdKu00C1foVvdCg35Ct3yKabP/99q7X5ypc79bJfyoxFdMnN5ZlmSagMRvb3zoIb27ZLlP0IKYQYAgNb44INUgPnf/5XKy1O3UM+Y0aaT2MWak6qrb9T+nR9r/y6//LWfaf+BBu1viGl/3
Kb9zgLtd+ar0eE69A2HpK6SvatUdOT+bDappFMqkJR6XSp1JlUaCcr30Qcq3fqeSj/6u3y1u1QUOCBbInH0bpdDeko6X6nQ0pDnPma7ulDka/wFWocwAwDo8BJJU2/vPKi6UETdvG79S+/OJ38JxDSl995LBZi335bOPTcVYO6+u9WT2CWTpg42xbQ/GNF+f7327/HLX3tAdQca5G+IaX9M2m/36IDTc9g381OLS6nlS7wOqdQWk6/hgLp9tk++z/bJV79f3QJ18gU/U2nDAXVtqJfTTB67MJstNZ+NYUiFhanByd/6lnTVVdLw4RlB7c1/HtAPfvPmCc+1m/fYQaetEWYAAB3an7fU6v4/blNt4IueghPedZNMSm+8kZrEbvNm6aKLUgFm3rxjTmLXGG2WPxhJBZXPw6mBtLUHVXewJajYVGdzKW5zHOXbh677fCmouBJxdWusV2nogHyhAyoNfabShoPyhQ6oW8NB+RoOqLThgPLj0cxd2e2SwyF5PFKXLtJ5A6V/+RdpwgRp8OCvPYvwv/TurO5FbvkDER2t/8YmyVeUCozthbuZAAAd1p+31OpHK9494kf3qHfdxOPSX/+a6oH56KPUJHYTJyre/zx92hCTPxhRXTAifyAi/8EG1fnrU3f9hKLaH7erwXbyIaGkJaQ0HFS3hkNhpeGgShsOqDSUCirF4aBsdruUl5cag9O1q9S3byqYXHGFdMEFOXu8Qav+rl/Dyf5+E2YAAB1SImnqkgdfzeiROVwPj01PnB1X3V/+L9Wj0vNs+bv2UF3UTPWmJJ064HDLtJ3cIwU6RZsOhZNUIMkIKo31Ko01qJtiyutWkgomQ4ZIl12WmvXXYs9d+vOWWt33/FbtD37RM9TW88wQZr6EMAMAHVdzOKKGzz5X6MDnCtYHFfq8QaFgk/6x5zP935aP5Uim7ilO2u2K251qMPJV7ynUpwXFijvzTuoYzkTzFz0nDQdTSyQoX6JJpYZU2q1YpRedp06XficVTNztN14kl0KRuAbMXStJWjrl4jafAZh5ZgAAp7ZoVM31n6eCyMGAgp+HFAo0KhQKK9QYSb02RBSKxBWMJxVK2BS05ylkcynkcCmU51YoL19NrqMFB5ukrlLvricso0vj5+rW9Ll8kaBKFVVpQZ5Ke5bKd14/dRvwDfl8ndU53yV7DqbpP9V9ObgM6dOKQdVtjDADACfpa90R00bHaI8ajss0U88XCgaV+DyghoMBBQ8EFAykekNCDRGFGqMKhWMKheOpINJsKiRnanG4FHIYCjndCrqOFkQcypgI5Sh37xyLOx6VN9YkbywsbzImj5lQQyyhqNOlRpdbQaOTmgyPkvbMAbiLKi9rt/lQkB2EGQA4CV/pjpg2PsbXqsE0paam1LOBgkElAgE1HAx+0RsSbFKoMaJgU0yhSLNC0YRC8aSCpkMhOTKDSJ5HIZdHja4v3z6cp9QEJ4cmObEpfTfxyTDiURW2BJFEVIX2pLxup7yFBfKWnCFvaYm8xV55z/DK28mjQrdTXndeqs2h9y5n5riWljEzp9JdN8gOwgwAnMCx7tzwByL60Yp32+TOjeMe46mNmnlxqVa/slmdwyH1jjTIE4/IaI4rL9msmvUv6ECPQuW7nApF4grFTQVNRyqI2JwK2fJSQcTpTl2aceV/KYi0dH0Upz7a1aoQIqWCiDfWpMJYWN7miArVLG+eXd4CQ97iQnm7nCFv1zNSr8VeFXbynDCItAWH3ab7JvTXj1a8K5uU8bdt6cu6b0L/nF0aQdshzADAcSSSpu7/47aj/p99q59DY5qpJygfPKjYpwcU+vSggp99rsCBoP76f//Q98MxOZPNciYTijryFM5zK5xnqNHl0ct1H0lGgfac4VPIfeRc9M+2vGnlLPoZQSQeljcZV6EjKa/hTIWRok7ydi1O9Yx07azCzoXyevOzHkTaymUV3bX4xouO6NHy5ejpzsgOwgwAHMfbOw8ecWuvLZmQN9okb6xJBdGwPPui+t2DH8ubjCl4IKBQsEnBaLNCzVLQnqegw1DIYSjo8ijkylfQla9oXsuMqoe6Qc4+2pP9js9ojsobaVKnWGopTMRUZEvI65C8LrsK8w15izulgsiZpSr0dU1dsunksUQQaSuXVXTXmP6+3I41QlYRZgB0GKZpKp4wFY4nFIknFI4lFGlOKBxtVjjQoEioUZGGJoUbwgo3NikSbFQkEEp9boooHIkrGk8onDAVNu0K2xwK25062+FS1OlSxOlK95QEPV4FPd70sd8PHHrjKpNKTr7mTtEmeaON8kYbVRCLKD8Wlrs5JlciLpuZVMLmUKPLrXpPkfZ7Oyvg7qS4MzUiNuo0FO1k6LNDl4j+6/oLddWFZ7bVn7NDcdhtDPLtwAgzwCks53eutBHTNBVtTqYCxqGQ0RI4IvFk+nM4nlA0llA4HFU41KhwsCEVOBqaFG4MK9wUVSTWrEizqbBsCsuhiN2piCNPYaehsNN1xJ0qx+eSdOgHznNoaQUjHpU3eqhXJB5WUSKWGi/ilLz5hgoLPakxI906q/BMn7xl3VR4hldet1OF7jx1cjvlsNv0xkk+6+ZE2vNZOMCphDADnKLa4+6ZZNJM9Vykw0Vm4Gh5H4k1p4JEU1ThcFSRSCz1Gk69hhvDijRFFYnFFW6WwrIpYmsJGakejZOdQfXo3KklT6nlJDiSCXniUbmbY/IkUos7mZDbbJbHlpTHLrmddnkMp9weQ55O+XIXeeXpfIbcXTvLU9RJbpdThtOuu5/9mw42xo84RjTPUCzPUF5Rd62++7tfOWie6Fk3kmS3pYbccFcOcCTCDI7QUXoDrMo0Tb3wt3267elNR2yrDUR062/f0a3fKtO5hU6FQ02ppTGsSENYkaawwuGYwtF4KpgkTIWTUlh2hW1ORW0OhR0uhZ2pJeY8yQk8juvQvCCOTifVs+FqjstIxORpjsmTjMtjJuS2melg4XG75O7kkdtbIE+nfHnyDXk8hgxP6tXjcsiT55D70PLFZ3vq9dDnPEfbjQVJJE39aMW7krJzR8zJ3HUz7du9teS1ndyVAxwFYeZryMaPfnsHicOPV98Y03++mN3eAKtIJE1FmxOKxpOKHHptuVQSjScUCUcVDUdTr5GYIo1hRRuaFAk2KNrQpGhTRNFITNFoTJFYIvXdpBQ1bYrIrqjNoajNoYgjT1FHniJOl6KOvC8NDD0Gu12/ftt/2MrDJvUwDi2tYDRH5WmOyZ2Iy5OIy63Eod4Lhzwel9z5bnk6eVI9GJ088njccucb8uQbcruc6YDhcTlSweQoocPttMvZhiGjvbTHHTEnc4yBPYu5Kwc4Cp7N9BVl4xJAe1xWONHxjqatn4LaWvFIVNFASJFAKBUWQk2KNqZ6IaKNEUUjUUUicUWjcUVjzYrGmw+Fh4QiidRYjWhzqpcimjAVMe0ZgSJidyhqz1PU4VT0UKCIOF2KO07yekYW2cyk3PGY3M1RuZujMprjMg4NDu1klwrybPLkOVI9GgWp3gx35zPk6XKGPEVeuQs8chupoHG8Hg2308FU
7SeBGYCBTE2xZvX/+RpJ0rZ545Tvats+Eh40+SVtHWay8ejz9nqceos17+3WXcvflOfQ3RPeWJMKYmF1ioWVHwurIBZJTcqVOPTj2RyTN8+u8ed1U+xQz0S0OaFI86GAkDAVTUoR06ZoUorKngoLsitidypqd37xeqgnIuN9S+/E4b0UTpcSrRrQmR3ORLPczTEZibjciZiMRLOMZLMMMyFDSbltpgyHTW6nXYbLIbeRJ8NtyO0xZBwKGUaRV0aRV+5OHhlGnow8hwynXe7DXv+6vU4/+/3WE9bEnSsAcu1UCTOWucz0q1/9Sg8//LBqa2t13nnn6fHHH9e3v/3tdq/j8Am0fvzaChWHA0rKdih52PTZqw6t79pJSVNqNqV40lRzUmo2TTWbNjVLh15tiiv12pCQbrHZFXG4tOziqyRJ17/3JzlMU0mbXX/Z9Ge9ludUUjYlbXYlbDaZNpsSNrsSNnv6fXq7/dB7u/1Qe/uh9zYlbA4lJPWwORR3OLSjay9J0ln1+5SUTc12h2qLukmSvJFGxZypUCFJ97bnH/soXM0xGc2xLwWLeCpYmM1ymwkZNlNuh+2LcOB2ye12ych3y+iUL6OwkwxvJ7mLOsnId8tdkAoWbqddxqFeCsP5xavhtMto50sjfbt6T9xI3LkCAC0sEWaeeeYZVVZW6le/+pWGDx+uJ554Qpdffrm2bdumnj17tmsth0+g9dh3bszasVYN/F7W9n00u4vLjlgXch99OlF3PJIKFIcGc7a8dydiMpLNciebZZhJGTZTxqG7RgwjT4Y7L9Vr4TnUa5HvkdubnwoYZ3hlFBfJ3aU4FT4O67VwOeynxaWQE93Zwp0rAJDJEmFmwYIFmjp1qn74wx9Kkh5//HGtWbNGixcvVnV1dbvWUhc6+viSkoZ65SXjyksk5Egm5EwmlJdMyGmm3n95yTMTcpim8syEnGZSSiaVjDfLbiZlyqbnKy6VJF215RU5k0m1TJre2WuoyJ0nu8Muu8Mhh9Mhm8MhR15e6r3TIUeeU3Z76kffYVP6vd1mk8Nuk12m9hxo1Jv//EwypaTNpr+e/S+SpJH/3CBbMqlmu0P/23ewJGnAJ9sVdhlqcBUokufSoqnDNbyih2yO3F/66ah4ngwAtM4pH2ZisZg2btyon/70pxnrx44dq9dff/2o34lGo4pGo+nPwWCwzeo5vGvfE4so7nDqs4IzJNsXPy5PT/vWSc82eawJs/5QMSrjc2v2eaLjVR3leH/te/ER6zafeY6kL3oDhg7oKRs/olnH82QA4OSd8mHms88+UyKRUGlpacb60tJS+f2H356aUl1drfvvvz8r9Rx+CSDsygw3X+USQHtfVjiZCboOP75Eb0B743kyAHByLDPhg82W+R9w0zSPWNdizpw5CgQC6WXv3r1tVkfLJQDpix/5dI2HXlv7o5+NfX7V4x2Nr8ids9uyT3ctz5O56sIzNbRvF4IMABzFKd8zU1JSIofDcUQvTF1d3RG9NS0Mw5BhtHLGsFbIxiWA9r6scKzjdS9y6z/Gf1PFBQa9AQAASzjlw4zL5dKgQYNUU1Oj73//++n1NTU1uuqqq3JWVzYuAbT3ZQUuYwAAOoJTPsxI0qxZs3TTTTdp8ODBGjp0qJYsWaI9e/bo1ltvzWld2XikfHs/pr69jwcAQFuzRJi57rrrdODAAc2bN0+1tbWqqKjQn/70J5111lm5Lg0AAOSYJcKMJE2fPl3Tp0/PdRkAAOAUY5m7mQAAAI6GMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACwta2Fm165dmjp1qnr37i2Px6O+ffvqvvvuUywWy2i3Z88eTZgwQQUFBSopKdHtt99+RJvNmzdrxIgR8ng8OvPMMzVv3jyZppmt0gEAgIU4s7Xjv//970omk3riiSd09tlna8uWLZo2bZoaGxv1yCOPSJISiYTGjx+vrl27av369Tpw4IAmT54s0zS1cOFCSVIwGNSYMWN06aWXasOGDdqxY4emTJmigoIC3XnnndkqHwAAWETWwsxll12myy67LP25T58+2r59uxYvXpwOM2vXrtW2bdu0d+9elZWVSZIeffRRTZkyRfPnz1dhYaF+97vfKRKJaNmyZTIMQxUVFdqxY4cWLFigWbNmyWazZesUAACABbTrmJlAIKDOnTunP7/xxhuqqKhIBxlJGjdunKLRqDZu3JhuM2LECBmGkdFm37592rVrV7vVDgAATk3tFmb++c9/auHChbr11lvT6/x+v0pLSzPaFRcXy+Vyye/3H7NNy+eWNoeLRqMKBoMZCwAA6JhaHWbmzp0rm8123OWdd97J+M6+fft02WWX6ZprrtEPf/jDjG1Hu0xkmmbG+sPbtAz+PdYlpurqahUVFaWX8vLy1p4mAACwiFaPmZk5c6auv/7647bp1atX+v2+fft06aWXaujQoVqyZElGO5/Pp7feeitjXX19veLxeLr3xefzHdEDU1dXJ0lH9Ni0mDNnjmbNmpX+HAwGCTQAAHRQrQ4zJSUlKikpOam2n3zyiS699FINGjRIS5culd2e2RE0dOhQzZ8/X7W1terevbuk1KBgwzA0aNCgdJt77rlHsVhMLpcr3aasrCwjNH2ZYRgZY2wAAEDHlbUxM/v27dPIkSNVXl6uRx55RJ9++qn8fn9GL8vYsWPVv39/3XTTTXrvvff0yiuvaPbs2Zo2bZoKCwslSZMmTZJhGJoyZYq2bNmi1atXq6qqijuZAACApCzemr127Vp9+OGH+vDDD9WjR4+MbS1jXhwOh1588UVNnz5dw4cPl8fj0aRJk9K3bktSUVGRampqNGPGDA0ePFjFxcWaNWtWxmUkAABw+rKZp8FUusFgUEVFRQoEAukeHwAA8PU0xZrV/+drJEnb5o1Tvqtt+0hO9vebZzMBAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLa5cwE41GdeGFF8pms2nTpk0Z2/bs2aMJEyaooKBAJSUluv322xWLxTLabN68WSNGjJDH49GZZ56pefPmyTTN9igdAACc4pztcZC77rpLZWVlev/
99zPWJxIJjR8/Xl27dtX69et14MABTZ48WaZpauHChZKkYDCoMWPG6NJLL9WGDRu0Y8cOTZkyRQUFBbrzzjvbo3wAAHAKy3qYeemll7R27Vo9++yzeumllzK2rV27Vtu2bdPevXtVVlYmSXr00Uc1ZcoUzZ8/X4WFhfrd736nSCSiZcuWyTAMVVRUaMeOHVqwYIFmzZolm82W7VMAAACnsKxeZtq/f7+mTZump556Svn5+Udsf+ONN1RRUZEOMpI0btw4RaNRbdy4Md1mxIgRMgwjo82+ffu0a9euox43Go0qGAxmLAAAoGPKWpgxTVNTpkzRrbfeqsGDBx+1jd/vV2lpaca64uJiuVwu+f3+Y7Zp+dzS5nDV1dUqKipKL+Xl5V/3dAAAwCmq1WFm7ty5stlsx13eeecdLVy4UMFgUHPmzDnu/o52mcg0zYz1h7dpGfx7rEtMc+bMUSAQSC979+5t7WkCAACLaPWYmZkzZ+r6668/bptevXrpgQce0JtvvplxeUiSBg8erBtuuEHLly+Xz+fTW2+9lbG9vr5e8Xg83fvi8/mO6IGpq6uTpCN6bFoYhnHEcQEAQMfU6jBTUlKikpKSE7b75S9/qQceeCD9ed++fRo3bpyeeeYZDRkyRJI0dOhQzZ8/X7W1terevbuk1KBgwzA0aNCgdJt77rlHsVhMLpcr3aasrEy9evVqbfkAAKCDydqYmZ49e6qioiK9fOMb35Ak9e3bVz169JAkjR07Vv3799dNN92k9957T6+88opmz56tadOmqbCwUJI0adIkGYahKVOmaMuWLVq9erWqqqq4kwkAAEjK8QzADodDL774otxut4YPH65rr71WV199tR555JF0m6KiItXU1Ojjjz/W4MGDNX36dM2aNUuzZs3KYeUAAOBU0S6T5kmpcTRHm7W3Z8+eeuGFF4773QEDBui1117LVmkAAMDCeDYTAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwtKyHmRdffFFDhgyRx+NRSUmJJk6cmLF9z549mjBhggoKClRSUqLbb79dsVgso83mzZs1YsQIeTwenXnmmZo3b55M08x26QAAwAKc2dz5s88+q2nTpqmqqkrf/e53ZZqmNm/enN6eSCQ0fvx4de3aVevXr9eBAwc0efJkmaaphQsXSpKCwaDGjBmjSy+9VBs2bNCOHTs0ZcoUFRQU6M4778xm+QAAwAKyFmaam5t1xx136OGHH9bUqVPT688555z0+7Vr12rbtm3au3evysrKJEmPPvqopkyZovnz56uwsFC/+93vFIlEtGzZMhmGoYqKCu3YsUMLFizQrFmzZLPZsnUKAADAArJ2mendd9/VJ598IrvdroEDB6p79+66/PLLtXXr1nSbN954QxUVFekgI0njxo1TNBrVxo0b021GjBghwzAy2uzbt0+7du066rGj0aiCwWDGAgAAOqashZmPPvpIkjR37lz97Gc/0wsvvKDi4mKNGDFCBw8elCT5/X6VlpZmfK+4uFgul0t+v/+YbVo+t7Q5XHV1tYqKitJLeXl5m54bAAA4dbQ6zMydO1c2m+24yzvvvKNkMilJuvfee/Wv//qvGjRokJYuXSqbzab/+Z//Se/vaJeJTNPMWH94m5bBv8e6xDRnzhwFAoH0snfv3taeJgAAsIhWj5mZOXOmrr/++uO26dWrl0KhkCSpf//+6fWGYahPnz7as2ePJMnn8+mtt97K+G59fb3i8Xi698Xn8x3RA1NXVydJR/TYfPk4X74sBQAAOq5Wh5mSkhKVlJScsN2gQYNkGIa2b9+uSy65RJIUj8e1a9cunXXWWZKkoUOHav78+aqtrVX37t0lpQYFG4ahQYMGpdvcc889isVicrlc6TZlZWXq1atXa8sHAABtJN/l1K5fjM91GdkbM1NYWKhbb71V9913n9auXavt27frRz/6kSTpmmuukSSNHTtW/fv310033aT33ntPr7zyimbPnq1p06apsLBQkjRp0iQZhqEpU6Zoy5YtWr16taqqqriTCQAASMryPDMPP/ywnE6nbrrpJoXDYQ0ZMkSvvvqqiouLJUkOh0Mvvviipk+fruHDh8vj8WjSpEl65JFH0vsoKipSTU2NZsyYocGDB6u4uFizZs3SrFmzslk6AACwCJt5GkylGwwGVVRUpEAgkO7xAQAAp7aT/f3m2UwAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSsvrU7FNFy7M0g8FgjisBAAAnq+V3+0TPxD4twkwoFJIklZeX57gSAADQWqFQSEVFRcfcbjNPFHc6gGQyqX379snr9cpms33l/QSDQZWXl2vv3r3HfRS51Z0u5ymdPud6upyndPqc6+lynhLn2hGd7HmapqlQKKSysjLZ7cceGXNa9MzY7Xb16NGjzfZXWFjYof8la3G6nKd0+pzr6XKe0ulzrqfLeUqca0d0Mud5vB6ZFgwABgAAlkaYAQAAlkaYaQXDMHTffffJMIxcl5JVp8t5SqfPuZ4u5ymdPud6upynxLl2RG19nqfFAGAAANBx0TMDAAAsjTADAAAsjTADAAAsjTADAAAsjTBzAtXV1br44ovl9XrVrVs3XX311dq+fXuuy8qKxYsX6/zzz09PYjR06FC99NJLuS4r66qrq2Wz2VRZWZnrUtrc3LlzZbPZMhafz5frsrLik08+0Y033qguXbooPz9fF154oTZu3Jjrstpcr169jvhnarPZNGPGjFyX1uaam5v1s5/9TL1795bH41GfPn00b948JZPJXJfW5kKhkCorK3XWWWfJ4/Fo2LBh2rBhQ67L+tpee+01TZgwQWVlZbLZbPr973+fsd00Tc2dO1dlZWXyeDwaOXKktm7d2urjEGZOYN26dZoxY4befPNN1dTUqLm5WWPHjlVjY2OuS2tzPXr00C9+8Qu98847euedd/Td735XV1111Vf6F8sqNmzYoCVLluj888/PdSlZc95556m2tja9bN68Odcltbn6+noNHz5ceXl5eumll7Rt2zY9+uijOuOMM3JdWpvbsGFDxj/PmpoaSdI111yT48ra3oMPPqhf//rXWrRokT744AM99NBDevjhh7Vw4cJcl9bmfvjDH6qmpkZPPfWUNm/erLFjx2r06NH65JNPcl3a19LY2KgLLrhAixYtOur2hx56SAsWLNCiRYu0YcMG+Xw+jRkzJv1MxZNmolXq6upMSea6detyXUq7KC4uNv/7v/8712VkRSgUMv
v162fW1NSYI0aMMO+4445cl9Tm7rvvPvOCCy7IdRlZd/fdd5uXXHJJrsvIiTvuuMPs27evmUwmc11Kmxs/frx58803Z6ybOHGieeONN+aoouxoamoyHQ6H+cILL2Ssv+CCC8x77703R1W1PUnm6tWr05+TyaTp8/nMX/ziF+l1kUjELCoqMn/961+3at/0zLRSIBCQJHXu3DnHlWRXIpHQqlWr1NjYqKFDh+a6nKyYMWOGxo8fr9GjR+e6lKz6xz/+obKyMvXu3VvXX3+9Pvroo1yX1Oaef/55DR48WNdcc426deumgQMH6je/+U2uy8q6WCymFStW6Oabb/5aD9E9VV1yySV65ZVXtGPHDknS+++/r/Xr1+t73/tejitrW83NzUokEnK73RnrPR6P1q9fn6Oqsm/nzp3y+/0aO3Zsep1hGBoxYoRef/31Vu3rtHjQZFsxTVOzZs3SJZdcooqKilyXkxWbN2/W0KFDFYlE1KlTJ61evVr9+/fPdVltbtWqVXr33Xc7xDXp4xkyZIh++9vf6hvf+Ib279+vBx54QMOGDdPWrVvVpUuXXJfXZj766CMtXrxYs2bN0j333KO3335bt99+uwzD0L/927/lurys+f3vf6/PP/9cU6ZMyXUpWXH33XcrEAjo3HPPlcPhUCKR0Pz58/WDH/wg16W1Ka/Xq6FDh+o///M/9c1vflOlpaV6+umn9dZbb6lfv365Li9r/H6/JKm0tDRjfWlpqXbv3t2qfRFmWmHmzJn629/+1qGT8jnnnKNNmzbp888/17PPPqvJkydr3bp1HSrQ7N27V3fccYfWrl17xP8JdTSXX355+v2AAQM0dOhQ9e3bV8uXL9esWbNyWFnbSiaTGjx4sKqqqiRJAwcO1NatW7V48eIOHWaefPJJXX755SorK8t1KVnxzDPPaMWKFVq5cqXOO+88bdq0SZWVlSorK9PkyZNzXV6beuqpp3TzzTfrzDPPlMPh0EUXXaRJkybp3XffzXVpWXd4r6Jpmq3uaSTMnKTbbrtNzz//vF577TX16NEj1+Vkjcvl0tlnny1JGjx4sDZs2KD/+q//0hNPPJHjytrOxo0bVVdXp0GDBqXXJRIJvfbaa1q0aJGi0agcDkcOK8yegoICDRgwQP/4xz9yXUqb6t69+xGB+5vf/KaeffbZHFWUfbt379bLL7+s5557LtelZM1PfvIT/fSnP9X1118vKRXId+/ererq6g4XZvr27at169apsbFRwWBQ3bt313XXXafevXvnurSsabmz0u/3q3v37un1dXV1R/TWnAhjZk7ANE3NnDlTzz33nF599dUO/S/W0ZimqWg0musy2tSoUaO0efNmbdq0Kb0MHjxYN9xwgzZt2tRhg4wkRaNRffDBBxn/4egIhg8ffsSUCTt27NBZZ52Vo4qyb+nSperWrZvGjx+f61KypqmpSXZ75s+Uw+HokLdmtygoKFD37t1VX1+vNWvW6Kqrrsp1SVnTu3dv+Xy+9B15Umoc2Lp16zRs2LBW7YuemROYMWOGVq5cqT/84Q/yer3pa3xFRUXyeDw5rq5t3XPPPbr88stVXl6uUCikVatW6a9//av+/Oc/57q0NuX1eo8Y81RQUKAuXbp0uLFQs2fP1oQJE9SzZ0/V1dXpgQceUDAY7HD/V/vjH/9Yw4YNU1VVla699lq9/fbbWrJkiZYsWZLr0rIimUxq6dKlmjx5spzOjvuf8QkTJmj+/Pnq2bOnzjvvPL333ntasGCBbr755lyX1ubWrFkj0zR1zjnn6MMPP9RPfvITnXPOOfr3f//3XJf2tTQ0NOjDDz9Mf965c6c2bdqkzp07q2fPnqqsrFRVVZX69eunfv36qaqqSvn5+Zo0aVLrDtQGd1t1aJKOuixdujTXpbW5m2++2TzrrLNMl8tldu3a1Rw1apS5du3aXJfVLjrqrdnXXXed2b17dzMvL88sKyszJ06caG7dujXXZWXFH//4R7OiosI0DMM899xzzSVLluS6pKxZs2aNKcncvn17rkvJqmAwaN5xxx1mz549Tbfbbfbp08e89957zWg0muvS2twzzzxj9unTx3S5XKbP5zNnzJhhfv7557ku62v7y1/+ctTf0MmTJ5ummbo9+7777jN9Pp9pGIb5ne98x9y8eXOrj2MzTdNsg/AFAACQE4yZAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlvb/Aa1yW8Mj68H3AAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "def sort_both(x, y):\n", - " sort_inds = np.argsort(x)\n", - " return x[sort_inds], y[sort_inds]\n", - "\n", - "\n", - "preds = np.zeros((10, y_test.size))\n", - "for i in range(10):\n", - " y_pred = models[i].predict(X_test)\n", - " preds[i, :] = y_pred\n", - "\n", - "means = np.mean(preds, axis=0)\n", - "vars = np.var(preds, axis=0)\n", - "\n", - "bias = np.mean((y_test - means) ** 2)\n", - "variance = np.mean(vars)\n", - "mse = np.mean((preds - y_test) ** 2)\n", - "print(bias + variance)\n", - "print(mse)\n", - "\n", - "for i in range(10):\n", - " y_pred = models[i].predict(X_test)\n", - " plt.plot(*sort_both(x_test, y_pred), lw=0.5, color=\"red\")\n", - "plt.scatter(*sort_both(X_test[:, 1], y_test))\n", - "# plt.scatter(*sort_both(X_train[:, 1], y_train))\n", - "sort_inds = np.argsort(x_test)\n", - "plt.errorbar(*sort_both(x_test, means), yerr=vars[sort_inds])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "15.94085519235225\n" - ] - } - ], - "source": [ - "print(bias)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "76.19079943912999\n" - ] - } - ], - "source": [ - "print(variance)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.18" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/doc/pub/week38/ipynb/week38.ipynb b/doc/pub/week38/ipynb/week38.ipynb index 4cc0c95c5..cd2b6ab03 100644 --- a/doc/pub/week38/ipynb/week38.ipynb +++ b/doc/pub/week38/ipynb/week38.ipynb @@ -2,2687 +2,1902 @@ "cells": [ { "cell_type": "markdown", - "id": "a811ba80", - "metadata": {}, + "id": "cd058661", + "metadata": { + "editable": true + }, "source": [ "\n", - "" + "" ] }, { "cell_type": "markdown", - "id": "e5014a9c", - "metadata": {}, + "id": "bb0e0285", + "metadata": { + "editable": true + }, "source": [ - "# Week 38: Logistic Regression and Optimization\n", - "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University\n", + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", "\n", - "Date: **September 16-20, 2024**" + "Date: **September 15-19, 2025**" ] }, { "cell_type": "markdown", - "id": "023eb6d1", - "metadata": {}, + "id": "5d0bf374", + "metadata": { + "editable": true + }, "source": [ - "## Plans for week 38, lecture Monday September 16\n", + "## Plans for week 38, lecture Monday September 15\n", + "\n", + 
"**Material for the lecture on Monday September 15.**\n", "\n", - "**Material for the lecture on Monday September 16.**\n", + "1. Statistical interpretation of OLS and various expectation values\n", "\n", - " * Logistic regression as our first encounter of classification methods. From binary cases to several categories.\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", "\n", - " * Start gradient and optimization methods\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", "\n", - " * [Video of lecture](https://youtu.be/c9DIfNHy2ks)\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", "\n", - " * Whiteboard notes at " + "5. [Video from lab sessions on the bias-variance tradeoff](https://youtu.be/GBWc1abChKo)\n", + "\n", + "6. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" ] }, { "cell_type": "markdown", - "id": "e981c015", - "metadata": {}, + "id": "38a10c06", + "metadata": { + "editable": true + }, "source": [ - "## Suggested reading and videos\n", - " * Readings and Videos:\n", + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", "\n", - " * Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", "\n", - " * Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", "\n", - " * For a good discussion on gradient methods, see Goodfellow et al section 4.3-4.5 and chapter 8. We will come back to the latter chapter in our discussion of Neural networks as well.\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", "\n", - " * [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", "\n", - " * [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)\n", - "\n", - " * [Video on gradient descent](https://www.youtube.com/watch?v=sDv4f4s2SB8)" + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " ] }, { "cell_type": "markdown", - "id": "11590c09", - "metadata": {}, + "id": "2beeb82a", + "metadata": { + "editable": true + }, "source": [ - "## Plans for the lab sessions\n", + "## Linking the regression analysis with a statistical interpretation\n", "\n", - "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. 
We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", "\n", - " * Repetition from last week on the bias-variance tradeoff\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", "\n", - " * Resampling techniques, cross-validation examples included here, see also the lectures from last week on the bootstrap method\n", - "\n", - " * Exercise for week 38 on the bias-variance tradeoff, see also the video from the lab session from week 37 at \n", - "\n", - " * Work on project 1, in particular resampling methods like cross-validation and bootstrap.\n", - "\n", - " * [Video on cross-validation from exercise session](https://youtu.be/T9jjWsmsd1o)" + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" ] }, { "cell_type": "markdown", - "id": "57e011be", - "metadata": {}, + "id": "84021a7f", + "metadata": { + "editable": true + }, "source": [ - "## Material for lecture Monday September 16" + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "0896e712", - "metadata": {}, + "id": "1291c926", + "metadata": { + "editable": true + }, "source": [ - "## Logistic Regression\n", + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", "\n", - "In linear regression our main interest was centered on learning the\n", - "coefficients of a functional fit (say a polynomial) in order to be\n", - "able to predict the response of a continuous variable on some unseen\n", - "data. The fit to the continuous variable $y_i$ is based on some\n", - "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", - "analytical expressions for standard ordinary Least Squares or Ridge\n", - "regression (in terms of matrices to invert) for several quantities,\n", - "ranging from the variance and thereby the confidence intervals of the\n", - "parameters $\\boldsymbol{\\beta}$ to the mean squared error. If we can invert\n", - "the product of the design matrices, linear regression gives then a\n", - "simple recipe for fitting our data." + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." ] }, { "cell_type": "markdown", - "id": "44bb3650", - "metadata": {}, + "id": "bf15a73d", + "metadata": { + "editable": true + }, "source": [ - "## Classification problems\n", - "\n", - "Classification problems, however, are concerned with outcomes taking\n", - "the form of discrete variables (i.e. 
categories). We may for example,\n", - "on the basis of DNA sequencing for a number of patients, like to find\n", - "out which mutations are important for a certain disease; or based on\n", - "scans of various patients' brains, figure out if there is a tumor or\n", - "not; or given a specific physical system, we'd like to identify its\n", - "state, say whether it is an ordered or disordered system (typical\n", - "situation in solid state physics); or classify the status of a\n", - "patient, whether she/he has a stroke or not and many other similar\n", - "situations.\n", + "## Assumptions made\n", "\n", - "The most common situation we encounter when we apply logistic\n", - "regression is that of two possible outcomes, normally denoted as a\n", - "binary outcome, true or false, positive or negative, success or\n", - "failure etc." + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" ] }, { "cell_type": "markdown", - "id": "921c6771", - "metadata": {}, + "id": "ed7830e9", + "metadata": { + "editable": true + }, "source": [ - "## Optimization and Deep learning\n", - "\n", - "Logistic regression will also serve as our stepping stone towards\n", - "neural network algorithms and supervised deep learning. For logistic\n", - "learning, the minimization of the cost function leads to a non-linear\n", - "equation in the parameters $\\boldsymbol{\\beta}$. The optimization of the\n", - "problem calls therefore for minimization algorithms. This forms the\n", - "bottle neck of all machine learning algorithms, namely how to find\n", - "reliable minima of a multi-variable function. This leads us to the\n", - "family of gradient descent methods. The latter are the working horses\n", - "of basically all modern machine learning algorithms.\n", - "\n", - "We note also that many of the topics discussed here on logistic \n", - "regression are also commonly used in modern supervised Deep Learning\n", - "models, as we will see later." + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "f80e9666", - "metadata": {}, + "id": "b1d75235", + "metadata": { + "editable": true + }, "source": [ - "## Basics\n", - "\n", - "We consider the case where the dependent variables, also called the\n", - "responses or the outcomes, $y_i$ are discrete and only take values\n", - "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", - "\n", - "The goal is to predict the\n", - "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", - "made of $n$ samples, each of which carries $p$ features or predictors. The\n", - "primary goal is to identify the classes to which new unseen samples\n", - "belong.\n", - "\n", - "Let us specialize to the case of two classes only, with outputs\n", - "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", - "credit card user that could default or not on her/his credit card\n", - "debt. 
That is" + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" ] }, { "cell_type": "markdown", - "id": "952f8119", - "metadata": {}, + "id": "0255cd11", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "9b587b40", - "metadata": {}, + "id": "04897143", + "metadata": { + "editable": true + }, "source": [ - "## Linear classifier\n", - "\n", - "Before moving to the logistic model, let us try to use our linear\n", - "regression model to classify these two outcomes. We could for example\n", - "fit a linear model to the default case if $y_i > 0.5$ and the no\n", - "default case $y_i \\leq 0.5$.\n", + "## Expectation value and variance\n", "\n", - "We would then have our \n", - "weighted linear combination, namely" + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" ] }, { "cell_type": "markdown", - "id": "bfb711d7", - "metadata": {}, + "id": "2a6cea60", + "metadata": { + "editable": true + }, "source": [ - "\n", - "
    \n", - "\n", "$$\n", - "\\begin{equation}\n", - "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\beta} + \\boldsymbol{\\epsilon},\n", - "\\label{_auto1} \\tag{1}\n", - "\\end{equation}\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", - "id": "0acaaf3c", - "metadata": {}, + "id": "08eb2262", + "metadata": { + "editable": true + }, "source": [ - "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", - "$n\\times p$ design matrix and $\\boldsymbol{\\beta}$ represents our estimators/predictors." + "while\n", + "its variance is" ] }, { "cell_type": "markdown", - "id": "73564ce7", - "metadata": {}, + "id": "0f36d3c2", + "metadata": { + "editable": true + }, "source": [ - "## Some selected properties\n", - "\n", - "The main problem with our function is that it takes values on the\n", - "entire real axis. In the case of logistic regression, however, the\n", - "labels $y_i$ are discrete variables. A typical example is the credit\n", - "card data discussed below here, where we can set the state of\n", - "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", - "in the data set (see the full example below).\n", - "\n", - "One simple way to get a discrete output is to have sign\n", - "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", - "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", - "We will encounter this model in our first demonstration of neural networks.\n", - "\n", - "Historically it is called the **perceptron** model in the machine learning\n", - "literature. This model is extremely simple. However, in many cases it is more\n", - "favorable to use a ``soft\" classifier that outputs\n", - "the probability of a given category. This leads us to the logistic function." + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "ef6011fd", - "metadata": {}, + "id": "ea74022f", + "metadata": { + "editable": true + }, "source": [ - "## Simple example\n", - "\n", - "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. 
Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." ] }, { - "cell_type": "code", - "execution_count": 1, - "id": "3444ad7b", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "d6eba03b", + "metadata": { + "editable": true + }, "source": [ - "%matplotlib inline\n", - "\n", - "# Common imports\n", - "import os\n", - "import numpy as np\n", - "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error\n", - "from IPython.display import display\n", - "from pylab import plt, mpl\n", - "plt.style.use('seaborn')\n", - "mpl.rcParams['font.family'] = 'serif'\n", - "\n", - "# Where to save the figures and data files\n", - "PROJECT_ROOT_DIR = \"Results\"\n", - "FIGURE_ID = \"Results/FigureFiles\"\n", - "DATA_ID = \"DataFiles/\"\n", + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", "\n", - "if not os.path.exists(PROJECT_ROOT_DIR):\n", - " os.mkdir(PROJECT_ROOT_DIR)\n", - "\n", - "if not os.path.exists(FIGURE_ID):\n", - " os.makedirs(FIGURE_ID)\n", - "\n", - "if not os.path.exists(DATA_ID):\n", - " os.makedirs(DATA_ID)\n", - "\n", - "def image_path(fig_id):\n", - " return os.path.join(FIGURE_ID, fig_id)\n", - "\n", - "def data_path(dat_id):\n", - " return os.path.join(DATA_ID, dat_id)\n", - "\n", - "def save_fig(fig_id):\n", - " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", - "\n", - "infile = open(data_path(\"chddata.csv\"),'r')\n", - "\n", - "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", - "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", - "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", - "output = chd['CHD']\n", - "age = chd['Age']\n", - "agegroup = chd['Agegroup']\n", - "numberID = chd['ID'] \n", - "display(chd)\n", - "\n", - "plt.scatter(age, output, marker='o')\n", - "plt.axis([18,70.0,-0.1, 1.2])\n", - "plt.xlabel(r'Age')\n", - "plt.ylabel(r'CHD')\n", - "plt.title(r'Age distribution and Coronary heart disease')\n", - "plt.show()" + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" ] }, { "cell_type": "markdown", - "id": "01d01242", - "metadata": {}, + "id": "b8a7314f", + "metadata": { + "editable": true + }, "source": [ - "## Plotting the mean value for each group\n", - "\n", - "What we could attempt however is to plot the mean value for each group." 
+ "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "143c59fe", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "ed668c22", + "metadata": { + "editable": true + }, "source": [ - "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", - "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", - "plt.plot(group, agegroupmean, \"r-\")\n", - "plt.axis([0,9,0, 1.0])\n", - "plt.xlabel(r'Age group')\n", - "plt.ylabel(r'CHD mean values')\n", - "plt.title(r'Mean values for each age group')\n", - "plt.show()" + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" ] }, { "cell_type": "markdown", - "id": "42136436", - "metadata": {}, + "id": "6f4ab09a", + "metadata": { + "editable": true + }, "source": [ - "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", - "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} 
\\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e8a7f059", - "metadata": {}, + "id": "7b7808c7", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "f(y_i\\vert x_i)=\\beta_0+\\beta_1 x_i.\n", - "$$" + "where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" ] }, { "cell_type": "markdown", - "id": "f1c0bcf8", - "metadata": {}, + "id": "456afe19", + "metadata": { + "editable": true + }, "source": [ - "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", - "value from minus infinity to plus infinity. If we however let\n", - "$f(y\\vert y)$ be represented by the mean value, the above example\n", - "shows us that we can constrain the function to take values between\n", - "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", - "at our last curve we see also that it has an S-shaped form. This leads\n", - "us to a very popular model for the function $f$, namely the so-called\n", - "Sigmoid function or logistic model. We will consider this function as\n", - "representing the probability for finding a value of $y_i$ with a given\n", - "$x_i$." + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e4fd2845", - "metadata": {}, + "id": "0a38fc64", + "metadata": { + "editable": true + }, "source": [ - "## The logistic function\n", + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", "\n", - "Another widely studied model, is the so-called \n", - "perceptron model, which is an example of a \"hard classification\" model. We\n", - "will encounter this model when we discuss neural networks as\n", - "well. Each datapoint is deterministically assigned to a category (i.e\n", - "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", - "classifier that outputs the probability of a given category rather\n", - "than a single value. For example, given $x_i$, the classifier\n", - "outputs the probability of being in a category $k$. Logistic regression\n", - "is the most common example of a so-called soft classifier. 
In logistic\n", - "regression, the probability that a data point $x_i$\n", - "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + "We can also compute the variance as" ] }, { "cell_type": "markdown", - "id": "f4bb77ad", - "metadata": {}, + "id": "851bebe1", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", "$$" ] }, { "cell_type": "markdown", - "id": "47fc800d", - "metadata": {}, + "id": "fe64e9b5", + "metadata": { + "editable": true + }, "source": [ - "Note that $1-p(t)= p(-t)$." + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" ] }, { "cell_type": "markdown", - "id": "0fe9154b", - "metadata": {}, + "id": "496492d5", + "metadata": { + "editable": true + }, "source": [ - "## Examples of likelihood functions used in logistic regression and nueral networks\n", - "\n", - "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 8, - "id": "150c4acd", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiMAAAHFCAYAAAAg3/mzAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABH+klEQVR4nO3deVxVdeLG8c9lu+woCCgKglvuG6apmVpJWdO+OC0u/bQymzZarZlSp8mxbZxKrCbLbDGzdSozaSqz1Mnd3BdUVEAElVUul3vP7w+SiUADRM5dnvfrxSvv4Zx7n8vXi0/nfM85FsMwDERERERM4mN2ABEREfFuKiMiIiJiKpURERERMZXKiIiIiJhKZURERERMpTIiIiIiplIZEREREVOpjIiIiIipVEZERETEVCojIi5o3LhxJCYmmh3jd1ksFqZMmfK7682dOxeLxcLevXt/d90XX3yRDh06EBAQgMVi4dixY6eds6EWLVp00veXmJjIuHHjmjSPiKfyMzuAiNT0l7/8hXvuucfsGL9rxYoVtGnTptGeb/369dx9991MmDCBsWPH4ufnR1hYWKM9f30tWrSIWbNm1VpIPv74Y8LDw5s+lIgHUhkRcUHt27c3O0KdnHPOOY36fJs3bwbg1ltvpX///o363I2tT58+ZkcQ8Rg6TCPSxA4fPsxtt91GfHw8VquV6OhoBg8ezNdff121Tm2HaY4dO8b48eOJjIwkNDSUSy+9lIyMjBqHSqZMmYLFYmHjxo1cd911REREEBkZSWpqKhUVFWzfvp2LL76YsLAwEhMTefrpp2tkzMzM5OabbyYmJgar1UqXLl147rnncDqd1dar7TDNypUrGTx4MIGBgcTFxTF58mTsdvvv/lyGDRvGzTffDMCAAQOwWCxVh0FOdkhk2LBhDBs2rOrxd999h8ViYf78+Tz22GPExcURHh7OhRdeyPbt22tsv3jxYi644AIiIiIIDg6mS5cuTJ8+Hagcg1mzZlW9zxNfJw411ZapLj+3vXv3YrFYePbZZ3n++edJSkoiNDSUgQMHsnLlyt/9OYl4Iu0ZEWlio0ePZu3atfztb3+jU6dOHDt2jLVr15Kfn3/SbZxOJ5dddhmrV69mypQp9O3blxUrVnDxxRefdJvrr7+em2++mdtvv5309HSefvpp7HY7X3/9NZMmTeKBBx7g3Xff5eGHH6ZDhw5cffXVQGVZGjRoEOXl5fz1r38lMTGRzz//nAceeIDdu3eTlpZ20tfcsmULF1xwAYmJicydO5fg4GDS0tJ49913f/fnkpaWxvz583nyySd544036Ny5M9HR0b+7XW0effRRBg8ezGuvvUZhYSEPP/wwl112GVu3bsXX1xeAOXPmcOuttzJ06FBefvllYmJi2LFjB5s2bQIqD5WVlJTwwQcfsGLFiqrnbtWqVa2vWd+f26xZs+jcuTMzZ86ser1LLrmEPXv2EBER0aD3LeK2DBFpUqGhoca99957ynXGjh1rtG3bturxF198YQDG7Nmzq603ffp0AzCeeOKJqmVPPPGEARjPPfdctXV79+5tAMZHH31UtcxutxvR0dHG1VdfXbXskUceMQDjv//9b7Xt77jjDsNisRjbt2+vWvbb1x41apQRFBRk5OTkVC2rqKgwOnfubADGnj17Tvm+33jjDQMwVq1aVW1527ZtjbFjx9ZYf+jQocbQoUOrHn/77bcGYFxyySXV1nv//fcNwFixYoVhGIZRVFRkhIeHG+eee67hdDpPmufOO+80TvZr8reZ6vpz27NnjwEYPXr0MCoqKqrW++mnnwzAmD9//knziHgqHaYRaWL9+/dn7ty5PPnkk6xcubJOhzCWLl0KVO7t+LUbbrjhpNv84Q9/qPa4S5cuWCwWRo4cWbXMz8+PDh06sG/fvqpl33zzDV27dq0xZ2PcuHEYhsE333xz0tf89ttvueCCC4iNja1a5uvry6hRo07x7hrf5ZdfXu1xz549Aare5/LlyyksLGTSpElYLJZGec36/twuvfTSqr00tWUU8SYqIyJNbMGCBYwdO5bXXnuNgQMHEhkZyZgxY8jJyTnpNvn5+fj5+REZGVlt+a//0f+t364bEBBAcHAwgYGBNZaXlZVVe63aDkXExcVVff9UOVu2bFljeW3LzqSoqKhqj61WKwDHjx8HKg+pAI16JlB9f26/l1HEm6iMiDSxFi1aMHPmTPbu3cu+ffuYPn06H3300SmvWREVFUVFRQVHjhyptvxUBaahoqKiyM7OrrE8KysLqMx/qm1ry3S6OQMDA7HZbDWW5+XlNej5TsxFOXDgwGnl+rXT+bmJeDuVERETJSQk8Kc//YkRI0awdu3ak643dOhQoHKvyq+99957jZ7pggsuYMuWLTXyzJs3D4vFwvDhw0+67fDhw/nPf/7DoUOHqpY5HI4auesrMTGRjRs3Vlu2Y8eOWs+QqYtBgwYRERHByy+/jGEYJ12vPnsrTufnJuLtdDaNSBMqKChg+PDh3HjjjXTu3JmwsDBWrVrF4sWLq85mqc3FF1/M4MGDuf/++yksLCQ5OZkVK1Ywb948AHx8Gu//K+677z7mzZvHpZdeyrRp02jbti1ffPEFaWlp3HHHHXTq1Omk2/75z3/m3//+N+effz6PP/44wcHBzJo1i5KSktPKNHr0aG6++WYmTZrENddcw759+3j66acbfLZNaGgozz33HBMmTODCCy/k1ltvJTY2ll27drFhwwZeeuklAHr06AHAjBkzGDlyJL6+vvTs2ZOAgIAaz3k6PzcRb6cyItKEAgMDGTBgAG+99RZ79+7FbreTkJDAww8/zEMPPXTS7Xx8fPjss8+4//77+fvf/055eTmDBw/m7bff5pxzzqFZs2aNljE6Oprly5czefJkJk+eTGFhIe3atePpp58mNTX1lNt2796dr7/+mvvvv5+xY8fSvHlzRo8ezTXXXMNtt93W4Ew33ngjWVlZvPzyy7zxxht0796d2bNnM3Xq1AY/5/jx44mLi2PGjBlMmDABwzBITExk7Nix1V73xx9/JC0tjWnTpmEYBnv27Kn1Uv2n83MT8XYW41T7KEXEpb377rvcdNNN/PjjjwwaNMjsOCIiDaIyIuIm5s+fz8GDB+nRowc+Pj6sXLmSZ555hj59+lSd+isi4o50mEbETYSFhfHee+/x5JNPUlJSQqtWrRg3bhxPPvmk2dFERE6L9oyIiIiIqXRqr4iIiJhKZURERERMpTIiIiIipnKLCaxOp5OsrCzCwsIa7aZWIiIicmYZhkFRURFxcXGnvDijW5SRrKws4uPjzY4hIiIiDbB///5T3pjSLcpIWFgYUPlmwsPDTU7TMHa7nSVLlpCSkoK/v7/ZcbyexsN1aCxch8bCdXjKWBQWFhIfH1/17/jJuEUZOXFoJjw83K3LSHBwMOHh4W79F8tTaDxch8bCdWgsXIenjcXvTbHQBFYRERExlcqIiIiImEplREREREylMiIiIiKmUhkRERERU6mMiIiIiKlURkRERMRUKiMiIiJiKpURERERMZXKiIiIiJ
hKZURERERMpTIiIiIiplIZEREREVOpjIiIiIipVEZERETEVCojIiIiYiqVERERETGVyoiIiIiYSmVERERETKUyIiIiIqZSGRERERFTqYyIiIiIqVRGRERExFQqIyIiImKqepeR77//nssuu4y4uDgsFguffPLJ726zdOlSkpOTCQwMpF27drz88ssNySoiIiIeqN5lpKSkhF69evHSSy/Vaf09e/ZwySWXMGTIENatW8ejjz7K3XffzYcffljvsCIiIuJ5/Oq7wciRIxk5cmSd13/55ZdJSEhg5syZAHTp0oXVq1fz7LPPcs0119T35UVERMTD1LuM1NeKFStISUmptuyiiy5izpw52O12/P39a2xjs9mw2WxVjwsLCwGw2+3Y7fYzG/gMOZHbXfN7Go2H69BYuA6NhevwlLGoa/4zXkZycnKIjY2ttiw2NpaKigry8vJo1apVjW2mT5/O1KlTayxfsmQJwcHBZyxrU0hPTzc7gvyKxsN1aCxch8bCdbj7WJSWltZpvTNeRgAsFku1x4Zh1Lr8hMmTJ5Oamlr1uLCwkPj4eFJSUggPDz9zQc8gu91Oeno6I0aMqHVvkDQtjYfr0Fi4Do2F6zBrLMornBSV2SmyVVBU9r+vYlsFJeUOin/5c7GtgtJyB6XlDkrKKyixOSgtryAowJcPbz+n6vlOHNn4PWe8jLRs2ZKcnJxqy3Jzc/Hz8yMqKqrWbaxWK1artcZyf39/t/+AeMJ78CQaD9ehsXAdGgvX0dCxKLM7OFpazpGSco6W2DlSWs6x0so/HzteTkGpnWPH7RT88lV43E5hmZ0yu/O08oZZ/arlrWv2M15GBg4cyGeffVZt2ZIlS+jXr5/+souIiNRRmd1BbqGN3KIyDhfZOFxsI7fQRl6xjbzicvJLbOQXl5NfbKOk3HFarxVq9SMs8MSXP6FWP0ID/Qiz+hHyy1eY1Y9gqy8hAX4EB/gSYvUj1NqwWlHvrYqLi9m1a1fV4z179rB+/XoiIyNJSEhg8uTJHDx4kHnz5gEwceJEXnrpJVJTU7n11ltZsWIFc+bMYf78+Q0KLCIi4mlKyyvIOlZG1rHjZBccZ39+Cat2+fDBm2vILSonp7CMguP1m8zq52OheUgAkcEBNA/xp3lwAM2CA2gW7E+zIH+aBfsTEeRPeJA/4YH/+3Oo1Q9fn9qnUZwp9S4jq1evZvjw4VWPT8ztGDt2LHPnziU7O5vMzMyq7yclJbFo0SLuu+8+Zs2aRVxcHC+88IJO6xUREa9R4XBy8Nhx9uaXkplfwoGjxzlw9Dj7j5Zy4OhxjpSU17KVDxzOr7bE6udDTLiV6FArMWGBRIdZaRFqpUVYAFEhVlqEBtAi1ErzkADCA/1OOjfT1dS7jAwbNqxqAmpt5s6dW2PZ0KFDWbt2bX1fSkRExG0YhkFOYRm7c0vIyCsm43AJGXkl7Msv4eDR41Q4T/5vJ1TOt4hrFkSrZoG0DLdSmLOP8/r1JK55CC0jAokND3SrglEfTXI2jYiIiKcwDIOsgjJ2HCpi56EitucUszO3iF25xZSeYq6G1c+HtlHBJESGEB8ZRJvmwcQ3r/xvm8ggwgP/N4/SbrezaNFeLunb2ivmV6qMiIiInER5hZMdh4rYkl3IlqxCtmZXfhWWVdS6vp+PhYSoYNq1CKV9dAjtokNoGxVCYlQIMWFWfJp4Loa7UBkRERGhcl7Hztxifj5QwMaDx/j5QAFbs4sod9Q83dXPx0K76BA6xobRKSaMs1qG0iEmjLZRwfj71vu2b15PZURERLzS0ZJy1u0/ytp9x1iz7ygbDhyr9TBLRJA/3eLC6dKq8qtrq3A6xIQS4KfS0VhURkRExCvkFpXx34wj/LTnCP/dk8+OQ8U11gm1+tGjdQQ920TQo00EPVs3Iz4yyCMnjboSlREREfFIBcftrNidzw+7DrN8Vz4ZeSU11mkXHULfhOaVX22b0TEmrMmvsSEqIyIi4iEcToP1+4+xdHsu3+/MY+OBY/z6bFqLBbq2Cqd/UiQDkqI4O7E5UaE1bz0iTU9lRERE3FZBqZ2lOw/z7bZclu44XOPiYe2jQzi3QwvO7RhN/6RIIoI8/zRZd6QyIiIibiW3sIyvthziq005rMjIx/Gr3R/hgX6c1yma8zpFc26HFsQ1CzIxqdSVyoiIiLi87ILjfLExmy835bA28yi/vhB4x5hQzu8cw/mdY0hu2xw/nVrrdlRGRETEJR0rLWfRzzl8uv4gP+09Uq2A9I5vxsjuLbmoW0sSW4SYF1IahcqIiIi4DLvDyTfbclm4+gBLd+Rid/yvgZyd2JxLe7Tiou4taRWhwy+eRGVERERMt+NQEQtX7+ejtQfJ/9Uk1C6twrmidxyX9YqjteZ/eCyVERERMUWZ3cHnG7N5e+U+1u8/VrU8OszK1X1bc03fNnSKDTMvoDQZlREREWlS+4+U8vZ/9/H+qv0cLbUDlfd6uaBLDNf3i2dop2hNQvUyKiMiInLGGYbByowjvLYsg2+251ZNRm3dLIgbByRwfb94osN0ATJvpTIiIiJnTIXDyZebcvjXsgw2HiioWj6kYwtGn9OWC7rE6vLrojIiIiKNr8zuYP5Pmby2bA8Hjx0HwOrnw3X92nDL4CTaR4eanFBcicqIiIg0mtLyCt5Zmckr32eQV2wDICokgDEDE7n5nATdC0ZqpTIiIiKnrcRWwVsr9/Gv7zOqTs1t0zyIiUPbc21yGwL9fU1OKK5MZURERBrMVuHgnZWZvPTtrqqb1CVEBvOn4R24qm9r/HVWjNSByoiIiNSbw2nw6fqDPJ++gwNHK+eEJEYF86fzO3JF7ziVEKkXlREREamXb7fnMuPLbWzLKQIgJszKfSM6cV1yG10fRBpEZUREROpkV24xT36xhe+2HwYgLNCPO4a155ZBSQQFaE6INJzKiIiInFJhmZ0Xvt7J3OV7qXAa+PtaGDcokTuHd6BZcIDZ8cQDqIyIiEitnE6DD9Yc4OmvtpFXXDk59YLOMTx2aRfa6Toh0ohURkREpIZduUU8+tEmftp7BID20SH85Q9dGXZWjMnJxBOpjIiISBW7E2b+ZxevLtuD3WEQHODLfRd2YtzgRJ0hI2eMyoiIiACwMuMIMzb4crgsA6g8JDPtyu60bhZkcjLxdCojIiJerrS8gr9/uY15K/YBFmLCrEy9vBsXd2+JxaKb2MmZpzIiIuLFVu89wv0LN7AvvxSAQbFOXpowiMiwYJOTiTdRGRER8UJldgfPLdnOaz/swTCgVUQgT13ZjcId/yUs0N/seOJlVEZERLzMtpxC7np3HTtziwG4LrkNf7msK0G+sGiHyeHEK6mMiIh4CcMwePu/mTz5+RZsFU5ahFqZcU0PLugSC4Ddbjc5oXgrlRERES9wrLSchz/cyFebDwEw/Kxonr2uF1GhVpOTiaiMiIh4vFV7j3DP/HVkFZTh72vh4Ys783+Dk/Dx0Zky4hpURkREPJRhGMz5YQ/Tv9yGw2mQGBXMizf0pUebCLOjiVSjMiIi4oFKbBU8/OFGPt+YDcAVveP42
1U9CLXq1764Hv2tFBHxMBmHi5n49hp2HCrGz8fCny/twthBibqAmbgslREREQ+yZHMOqe9voNhWQXSYlbSb+nJ2YqTZsUROSWVERMQDGIbB7KW7eXrxdgD6J0by0o19iAkPNDmZyO9TGRERcXO2CgeTP/qZj9YeBGD0OW15/LKuusuuuA2VERERN5ZfbOP2t9awet9RfH0sPHFZV8YMTDQ7lki9qIyIiLipHYeKGP/mKvYfOU6Y1Y9ZN/XlvE7RZscSqTeVERERN7QyI59b562mqKyChMhgXh/Xjw4xYWbHEmkQlRERETfz5c/Z3LNgPeUVTvq1bc6rY/oRGRJgdiyRBlMZERFxI28u38uUzzZjGJDSNZYXbuhDoL+v2bFETovKiIiIGzAMg2eXbGfWt7sBuGlAAtOu6I6v7i8jHkBlRETExTmcBpM/2sj7qw8AcP+ITvzp/A66oqp4DJUREREXZnc4uW/Bej7fmI2PBaZf3YNRZyeYHUukUTXoijhpaWkkJSURGBhIcnIyy5YtO+X677zzDr169SI4OJhWrVpxyy23kJ+f36DAIiLeoszu4I631/D5xmz8fS3MurGvioh4pHqXkQULFnDvvffy2GOPsW7dOoYMGcLIkSPJzMysdf0ffviBMWPGMH78eDZv3szChQtZtWoVEyZMOO3wIiKeqrS8glvnrebrrblY/Xx4dXQ/RvZoZXYskTOi3mXk+eefZ/z48UyYMIEuXbowc+ZM4uPjmT17dq3rr1y5ksTERO6++26SkpI499xzuf3221m9evVphxcR8URFZXbGvv4Ty3bmERzgyxu3nM3wzjFmxxI5Y+pVRsrLy1mzZg0pKSnVlqekpLB8+fJatxk0aBAHDhxg0aJFGIbBoUOH+OCDD7j00ksbnlpExEMVltkZPecnVu09SligH2+NH8Cg9i3MjiVyRtVrAmteXh4Oh4PY2Nhqy2NjY8nJyal1m0GDBvHOO+8watQoysrKqKio4PLLL+fFF1886evYbDZsNlvV48LCQgDsdjt2u70+kV3Gidzumt/TaDxch8bif4ptFfzfm2tYv7+AZkH+zB2XTLe40Cb72WgsXIenjEVd8zfobJrfnk5mGMZJTzHbsmULd999N48//jgXXXQR2dnZPPjgg0ycOJE5c+bUus306dOZOnVqjeVLliwhODi4IZFdRnp6utkR5Fc0Hq7D28fC5oCXt/qSUWQhyNfg1o7H2bf+B/atb/os3j4WrsTdx6K0tLRO61kMwzDq+qTl5eUEBwezcOFCrrrqqqrl99xzD+vXr2fp0qU1thk9ejRlZWUsXLiwatkPP/zAkCFDyMrKolWrmhOyatszEh8fT15eHuHh4XWN61Lsdjvp6emMGDECf39/s+N4PY2H69BYVE5WnfDWuqpDM2+OS6ZH64gmz6GxcB2eMhaFhYW0aNGCgoKCU/77Xa89IwEBASQnJ5Oenl6tjKSnp3PFFVfUuk1paSl+ftVfxte38tLFJ+tBVqsVq9VaY7m/v79bDwp4xnvwJBoP1+GtY3G83MHEdzZUFhFr5RyR3vHNTM3krWPhitx9LOqavd5n06SmpvLaa6/x+uuvs3XrVu677z4yMzOZOHEiAJMnT2bMmDFV61922WV89NFHzJ49m4yMDH788Ufuvvtu+vfvT1xcXH1fXkTEY5RXOLn97TWsyMgn1OrHm+P7m15ERMxQ7zkjo0aNIj8/n2nTppGdnU337t1ZtGgRbdu2BSA7O7vaNUfGjRtHUVERL730Evfffz/NmjXj/PPPZ8aMGY33LkRE3IzDaXDfgvV8v+MwQf6+zL3lbPomNDc7logpGjSBddKkSUyaNKnW782dO7fGsrvuuou77rqrIS8lIuJxDMPgz5/8zBc/V15Z9dUxyfRLjDQ7lohpGnQ5eBERabgZi7cz/6f9+Fjgn3/sw5CO0WZHEjGVyoiISBOa/d1uXl66G6i86d0lusS7iMqIiEhTee+nTGYs3gbAo5d01k3vRH6hMiIi0gS+2XaIRz/+GYA7hrXntvPam5xIxHWojIiInGEbDxzjznfW4TTg2uQ2PHTRWWZHEnEpKiMiImdQZn4p/zd3FcftDoZ0bMH0q3uc9PYZIt5KZURE5Aw5WlLOuDd+Iq+4nK6twpl9czL+vvq1K/Jb+lSIiJwBZXYHE+atJiOvhNbNgnjjlrMJtTbo0k4iHk9lRESkkTmdBve/v4E1+44SHujH3FvOJjY80OxYIi5LZUREpJHN/HrHr66u2o+OsWFmRxJxaSojIiKN6NP1B3nhm10APHVVD85pF2VyIhHXpzIiItJI1mYe5cEPNgJw+9B2XNcv3uREIu5BZUREpBEcPHac2+atobzCyYVdYnnoos5mRxJxGyojIiKnqdhWwfi5q8grttGlVTj//GNvfH10LRGRulIZERE5DZVnzqxnW04RLUKtvDa2HyE6hVekXlRGREROw6xvd/HV5kME+Prw6phkWjcLMjuSiNtRGRERaaBvth3i+a93APDXK7vRN6G5yYlE3JPKiIhIA2QcLuae+esxDLj5nARGnZ1gdiQRt6UyIiJST0Vldm57aw1FtgrOTmzO43/oZnYkEbemMiIiUg8nLvW+K7eYluGBzLqpLwF++lUqcjr0CRIRqYe073axZEvlhNXZN/clJkz3nBE5XSojIiJ19OOuPJ5P/9+E1T6asCrSKFRGRETqIKegjLvnr8NpwPX92mjCqkgjUhkREfkddoeTO99dS35JOV1bhTPtiu5mRxLxKCojIiK/4+9fbmPNvqOEBfox++a+BPr7mh1JxKOojIiInMKXP2cz54c9ADx7XS/aRoWYnEjE86iMiIicxJ68Eh78YCMAt5/Xjou6tTQ5kYhnUhkREalFmd3Bne+spdhWQf+kSB686CyzI4l4LJUREZFaPLVoK1uyC4kMCeDFG/rg56tflyJnij5dIiK/sXhTNvNW7APg+et7ERuuC5uJnEkqIyIiv7L/SCkP/WqeyLCzYkxOJOL5VEZERH5hdzi5+711FJZV0Du+GQ9onohIk1AZERH5xbNLtrMu8xhhgX68eEMf/DVPRKRJ6JMmIgIs3XGYV5ZmAPD0NT2Jjww2OZGI91AZERGvl1ds4/73NwBw8zkJjOzRyuREIt5FZUREvJphGDz8wUbyim10ig3lz5d2NTuSiNdRGRERr/bWyn38Z1suAX4+/POPfXTfGRETqIyIiNfacaiIv32xFYBHLu5Ml1bhJicS8U4qIyLilcrsDu6evw5bhZOhnaK5ZXCi2ZFEvJbKiIh4pRmLt7Etp4gWoQE8e10vLBaL2ZFEvJbKiIh4naU7DvPGj3sBeObaXkSHWc0NJOLlVEZExKscLSnnwYWVp/GOHdiW4Z11uXcRs6mMiIjXMAyDP3+yidwiG+2jQ5h8SRezI4kIKiMi4kU+XZ/FFz9n4+dj4R+jeus0XhEXoTIiIl7h4LHj/OXTTQDcc0FHerZpZm4gEamiMiIiHs/pNHjg/Q0UlVXQJ6EZdwxrb3YkEfkVlRER8XhvLN/Liox8gvx9ef76
    [base64-encoded PNG output data omitted here: three embedded matplotlib figures produced by the code cell below, titled "sigmoid function", "step function", and "tanh function", each plotted against z]
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "\"\"\"The sigmoid function (or the logistic curve) is a\n", - "function that takes any real number, z, and outputs a number (0,1).\n", - "It is useful in neural networks for assigning weights on a relative scale.\n", - "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", - "\n", - "import numpy\n", - "import matplotlib.pyplot as plt\n", - "import math as mt\n", - "\n", - "z = numpy.arange(-5, 5, .1)\n", - "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", - "sigma = sigma_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, sigma)\n", - "ax.set_ylim([-0.1, 1.1])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('sigmoid function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"Step Function\"\"\"\n", - "z = numpy.arange(-5, 5, .02)\n", - "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", - "step = step_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, step)\n", - "ax.set_ylim([-0.5, 1.5])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('step function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"tanh Function\"\"\"\n", - "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", - "t = numpy.tanh(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, t)\n", - "ax.set_ylim([-1.0, 1.0])\n", - "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('tanh function')\n", - "\n", - "plt.show()" + "cell_type": "markdown", + "id": "503eb7b2", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." ] }, { "cell_type": "markdown", - "id": "9c1d64b9", - "metadata": {}, + "id": "1a33763c", + "metadata": { + "editable": true + }, "source": [ - "## Two parameters\n", + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", "\n", - "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\beta$ in our fitting of the Sigmoid function, that is we define probabilities" + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. 
This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" ] }, { "cell_type": "markdown", - "id": "d1929423", - "metadata": {}, + "id": "70a645e3", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\boldsymbol{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\boldsymbol{\\beta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\beta}),\n", - "\\end{align*}\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "1698b9e7", - "metadata": {}, + "id": "7fced9cb", + "metadata": { + "editable": true + }, "source": [ - "where $\\boldsymbol{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$. \n", + "## Independent and Identically Distributed (iid)\n", "\n", - "Note that we used" + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" ] }, { "cell_type": "markdown", - "id": "eff2f862", - "metadata": {}, + "id": "313c05af", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(y_i=0\\vert x_i, \\boldsymbol{\\beta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\beta}).\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", "$$" ] }, { "cell_type": "markdown", - "id": "640f9f45", - "metadata": {}, + "id": "66eeeef9", + "metadata": { + "editable": true + }, "source": [ - "## Maximum likelihood\n", + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", "\n", - "In order to define the total likelihood for all possible outcomes from a \n", - "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", - "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", - "We aim thus at maximizing \n", - "the probability of seeing the observed data. 
We can then approximate the \n", - "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" ] }, { "cell_type": "markdown", - "id": "f94fafba", - "metadata": {}, + "id": "cda5e4d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "P(\\mathcal{D}|\\boldsymbol{\\beta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\beta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\beta}))\\right]^{1-y_i}\\nonumber \\\\\n", - "\\end{align*}\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", "$$" ] }, { "cell_type": "markdown", - "id": "5d457b2e", - "metadata": {}, + "id": "2e6ed5cd", + "metadata": { + "editable": true + }, "source": [ - "from which we obtain the log-likelihood and our **cost/loss** function" + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. That is\n", + "in case we have a simple one-dimensional input and output case" ] }, { "cell_type": "markdown", - "id": "683657ba", - "metadata": {}, + "id": "ba81d29e", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\beta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\beta}))\\right]\\right).\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", "$$" ] }, { "cell_type": "markdown", - "id": "3d17d95b", - "metadata": {}, + "id": "26e2d548", + "metadata": { + "editable": true + }, "source": [ - "## The cost function rewritten\n", - "\n", - "Reordering the logarithms, we can rewrite the **cost/loss** function as" + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" ] }, { "cell_type": "markdown", - "id": "76cd7541", - "metadata": {}, + "id": "0d5ef8ad", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta}) = \\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "b88061e7", - "metadata": {}, + "id": "b6c5763c", + "metadata": { + "editable": true + }, "source": [ - "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\beta$.\n", - "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." 
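+    "\n",
+    "To make the likelihood above concrete, the small sketch below (an illustration added here; the synthetic data, the value $\sigma=1$ and all variable names are assumptions, not part of the derivation) evaluates $\log{p(\boldsymbol{D}\vert\boldsymbol{\theta})}$ for a simple linear model on simulated data.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "\n",
+    "# Simulated domain D: inputs x, outputs y = 2 + 3x + Gaussian noise (assumed values)\n",
+    "n = 100\n",
+    "x = np.linspace(0, 1, n)\n",
+    "X = np.column_stack((np.ones(n), x))     # design matrix with an intercept column\n",
+    "sigma = 1.0\n",
+    "theta_true = np.array([2.0, 3.0])\n",
+    "y = X @ theta_true + rng.normal(0, sigma, n)\n",
+    "\n",
+    "def log_likelihood(theta):\n",
+    "    # sum of the logs of the single-event Gaussians p(y_i, X | theta)\n",
+    "    residuals = y - X @ theta\n",
+    "    return np.sum(-0.5*np.log(2*np.pi*sigma**2) - residuals**2/(2*sigma**2))\n",
+    "\n",
+    "print(log_likelihood(theta_true))            # parameters close to the truth\n",
+    "print(log_likelihood(np.array([0.0, 0.0])))  # a poor guess gives a much lower value\n",
+    "```"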
] }, { "cell_type": "markdown", - "id": "3c95fe37", - "metadata": {}, + "id": "e4afd86f", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta})=-\\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", - "$$" + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." ] }, { "cell_type": "markdown", - "id": "4f573bed", - "metadata": {}, + "id": "03d912b0", + "metadata": { + "editable": true + }, "source": [ - "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", - "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + "## A new Cost Function\n", + "\n", + "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF" ] }, { "cell_type": "markdown", - "id": "08a700a8", - "metadata": {}, + "id": "fef4cb78", + "metadata": { + "editable": true + }, "source": [ - "## Minimizing the cross entropy\n", - "\n", - "The cross entropy is a convex function of the weights $\\boldsymbol{\\beta}$ and,\n", - "therefore, any local minimizer is a global minimizer. 
\n", - "\n", - "Minimizing this\n", - "cost function with respect to the two parameters $\\beta_0$ and $\\beta_1$ we obtain" + "$$\n", + "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000125c6", + "metadata": { + "editable": true + }, + "source": [ + "which becomes" ] }, { "cell_type": "markdown", - "id": "9bd6709b", - "metadata": {}, + "id": "4f665607", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\beta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}}\\right),\n", + "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "98c81b67", - "metadata": {}, + "id": "5f5877fa", + "metadata": { + "editable": true + }, "source": [ - "and" + "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely" ] }, { "cell_type": "markdown", - "id": "5540b76a", - "metadata": {}, + "id": "1c342299", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\beta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}}\\right).\n", + "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n", "$$" ] }, { "cell_type": "markdown", - "id": "0018d823", - "metadata": {}, + "id": "4b155a17", + "metadata": { + "editable": true + }, "source": [ - "## A more compact expression\n", - "\n", - "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", - "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", - "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\beta})$. 
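+    "\n",
+    "A quick numerical check of the OLS expression derived above (a sketch added for illustration; the one-dimensional polynomial data, the noise level and the use of NumPy's generic least-squares routine are assumptions) compares the closed-form solution with a library solver.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n = 200\n",
+    "x = rng.uniform(0, 1, n)\n",
+    "X = np.column_stack((np.ones(n), x, x**2))          # design matrix\n",
+    "y = 1.0 - 2.0*x + 5.0*x**2 + rng.normal(0, 0.1, n)\n",
+    "\n",
+    "# Closed-form optimum of the negative log-likelihood (the OLS equation)\n",
+    "theta_ols = np.linalg.inv(X.T @ X) @ X.T @ y\n",
+    "# The same solution from a generic least-squares solver\n",
+    "theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]\n",
+    "\n",
+    "print(theta_ols)\n",
+    "print(theta_lstsq)   # agrees with the closed form to numerical precision\n",
+    "```\n",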
We can rewrite in a more compact form the first\n", - "derivative of cost function as" + "which leads to the well-known OLS equation for the optimal paramters $\\theta$" ] }, { "cell_type": "markdown", - "id": "ee63f4f9", - "metadata": {}, + "id": "c23eaf84", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", "$$" ] }, { "cell_type": "markdown", - "id": "413ff641", - "metadata": {}, + "id": "7699c6f7", + "metadata": { + "editable": true + }, "source": [ - "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})(1-p(y_i\\vert x_i,\\boldsymbol{\\beta})$, we can obtain a compact expression of the second derivative as" + "Next week we will make a similar analysis for Ridge and Lasso regression" ] }, { "cell_type": "markdown", - "id": "337a2c56", - "metadata": {}, + "id": "84c9b69d", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", - "$$" + "## Why resampling methods\n", + "\n", + "Before we proceed, we need to rethink what we have been doing. In our\n", + "eager to fit the data, we have omitted several important elements in\n", + "our regression analysis. In what follows we will\n", + "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n", + "\n", + "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n", + "\n", + "and discuss how to select a given model (one of the difficult parts in machine learning)." ] }, { "cell_type": "markdown", - "id": "8c3e92fe", - "metadata": {}, + "id": "59e6b611", + "metadata": { + "editable": true + }, "source": [ - "## Extending to more predictors\n", + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", "\n", - "Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then with $p$ predictors" + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." 
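+    "\n",
+    "As a first, minimal illustration of such a resampling loop (added as a sketch; the quadratic toy data, the five folds and the scikit-learn helpers are choices made here, not prescriptions from the text), cross-validation can be run in a few lines:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.linear_model import LinearRegression\n",
+    "from sklearn.model_selection import KFold, cross_val_score\n",
+    "from sklearn.preprocessing import PolynomialFeatures\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "x = rng.uniform(0, 1, 100).reshape(-1, 1)\n",
+    "y = 2.0 + 3.0*x[:, 0] - 4.0*x[:, 0]**2 + rng.normal(0, 0.1, 100)\n",
+    "\n",
+    "X = PolynomialFeatures(degree=2).fit_transform(x)\n",
+    "kfold = KFold(n_splits=5, shuffle=True, random_state=2025)\n",
+    "scores = cross_val_score(LinearRegression(), X, y, cv=kfold,\n",
+    "                         scoring='neg_mean_squared_error')\n",
+    "\n",
+    "print('MSE per fold:', -scores)\n",
+    "print('mean MSE    :', -scores.mean())\n",
+    "```\n",
+    "\n",
+    "Each fold plays the role of test data exactly once, so the spread of the fold-wise MSE values already gives a feeling for the variability of the fit."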
] }, { "cell_type": "markdown", - "id": "ba84fae7", - "metadata": {}, + "id": "3ea44242", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{ \\frac{p(\\boldsymbol{\\beta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\beta}\\boldsymbol{x})}} = \\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p.\n", - "$$" + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." ] }, { "cell_type": "markdown", - "id": "bddd73d3", - "metadata": {}, + "id": "a98de365", + "metadata": { + "editable": true + }, "source": [ - "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\beta}=[\\beta_0, \\beta_1, \\dots, \\beta_p]$ leading to" + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." ] }, { "cell_type": "markdown", - "id": "fce6aba6", - "metadata": {}, + "id": "2fd2ca6a", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "p(\\boldsymbol{\\beta}\\boldsymbol{x})=\\frac{ \\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}{1+\\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}.\n", - "$$" + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." ] }, { "cell_type": "markdown", - "id": "63325aad", - "metadata": {}, + "id": "87ab1f2b", + "metadata": { + "editable": true + }, "source": [ - "## Including more classes\n", + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. 
This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", "\n", - "Till now we have mainly focused on two classes, the so-called binary\n", - "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", - "of simplicity assume we have only two predictors. We have then following model" + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." ] }, { "cell_type": "markdown", - "id": "1c5878f6", - "metadata": {}, + "id": "88ffab6d", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\beta_{10}+\\beta_{11}x_1,\n", - "$$" + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." 
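+    "\n",
+    "Before turning to the central limit theorem, a bare-bones non-parametric bootstrap can be written directly with NumPy (a sketch under assumed choices: the exponential toy sample, 1000 bootstrap replicas and the sample mean as the statistic of interest):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "data = rng.exponential(scale=1.0, size=500)      # a deliberately non-Gaussian sample\n",
+    "\n",
+    "n_bootstraps = 1000\n",
+    "means = np.empty(n_bootstraps)\n",
+    "for b in range(n_bootstraps):\n",
+    "    resample = rng.choice(data, size=data.size, replace=True)   # draw with replacement\n",
+    "    means[b] = resample.mean()\n",
+    "\n",
+    "print('sample mean               :', data.mean())\n",
+    "print('bootstrap std of the mean :', means.std())\n",
+    "print('analytical estimate       :', data.std(ddof=1)/np.sqrt(data.size))\n",
+    "```\n",
+    "\n",
+    "The bootstrap spread of the resampled means is close to the familiar $\sigma/\sqrt{n}$ estimate, without any distributional assumption on the data."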
] }, { "cell_type": "markdown", - "id": "2c8a1b85", - "metadata": {}, + "id": "96fabf7e", + "metadata": { + "editable": true + }, "source": [ - "and" + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" ] }, { "cell_type": "markdown", - "id": "cced4ec8", - "metadata": {}, + "id": "6e876164", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\beta_{20}+\\beta_{21}x_1,\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", "$$" ] }, { "cell_type": "markdown", - "id": "6efd1ce1", - "metadata": {}, + "id": "2b00fa3c", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "75d6acad", + "metadata": { + "editable": true + }, "source": [ - "and so on till the class $C=K-1$ class" + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" ] }, { "cell_type": "markdown", - "id": "933753b8", - "metadata": {}, + "id": "8b412a9e", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\beta_{(K-1)0}+\\beta_{(K-1)1}x_1,\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", "$$" ] }, { "cell_type": "markdown", - "id": "ba94450f", - "metadata": {}, + "id": "3bdb59e7", + "metadata": { + "editable": true + }, "source": [ - "and the model is specified in term of $K-1$ so-called log-odds or\n", - "**logit** transformations." + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." ] }, { "cell_type": "markdown", - "id": "8f174f5d", - "metadata": {}, + "id": "d709f4c1", + "metadata": { + "editable": true + }, "source": [ - "## More classes\n", - "\n", - "In our discussion of neural networks we will encounter the above again\n", - "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "## Rewriting the $\\delta$-function\n", "\n", - "The softmax function is used in various multiclass classification\n", - "methods, such as multinomial logistic regression (also known as\n", - "softmax regression), multiclass linear discriminant analysis, naive\n", - "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", - "multinomial logistic regression and linear discriminant analysis, the\n", - "input to the function is the result of $K$ distinct linear functions,\n", - "and the predicted probability for the $k$-th class given a sample\n", - "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\beta}$ is (with two\n", - "predictors):" + "If we use the integral expression for the $\\delta$-function" ] }, { "cell_type": "markdown", - "id": "9ba36ed7", - "metadata": {}, + "id": "bf40508f", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\beta_{k0}+\\beta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}}.\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", "$$" ] }, { "cell_type": "markdown", - "id": "b5b5ecc6", - "metadata": {}, + "id": "8b2b63fe", + "metadata": { + "editable": true + }, "source": [ - "It is easy to extend to more predictors. The final class is" + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" ] }, { "cell_type": "markdown", - "id": "e6b33699", - "metadata": {}, + "id": "4c1720db", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}},\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", "$$" ] }, { "cell_type": "markdown", - "id": "b49a6a23", - "metadata": {}, + "id": "5aba4a1e", + "metadata": { + "editable": true + }, "source": [ - "and they sum to one. Our earlier discussions were all specialized to\n", - "the case with two classes only. It is easy to see from the above that\n", - "what we derived earlier is compatible with these equations.\n", - "\n", - "To find the optimal parameters we would typically use a gradient\n", - "descent method. Newton's method and gradient descent methods are\n", - "discussed in the material on [optimization\n", - "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + "with the integral over $x$ resulting in" ] }, { "cell_type": "markdown", - "id": "e9bfd38c", - "metadata": {}, + "id": "00a5fc23", + "metadata": { + "editable": true + }, "source": [ - "## Searching for Optimal Regularization Parameters $\\lambda$\n", - "\n", - "In project 1, when using Ridge and Lasso regression, we end up\n", - "searching for the optimal parameter $\\lambda$ which minimizes our\n", - "selected scores (MSE or $R2$ values for example). The brute force\n", - "approach, as discussed in the code here for Ridge regression, consists\n", - "in evaluating the MSE as function of different $\\lambda$ values.\n", - "Based on these calculations, one tries then to determine the value of the hyperparameter $\\lambda$\n", - "which results in optimal scores (for example the smallest MSE or an $R2=1$)." 
+ "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "5dc4fd0e", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkkAAAGwCAYAAAC99fF4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABe3klEQVR4nO3dd1QU198G8GepSxcFKUpVA6KIgIpgTxREgz12RI0maGKNSSQaNYkGY4omsRsVe4k1iRWMgArRWNaKHUURBCwgbWnz/uHr/twACgjMLjyfc/YcZ/bO7DMrOl/u3LkjEQRBABEREREp0RA7ABEREZEqYpFEREREVAIWSUREREQlYJFEREREVAIWSUREREQlYJFEREREVAIWSUREREQl0BI7gLoqKirCgwcPYGRkBIlEInYcIiIiKgNBEPDs2TNYW1tDQ+PVfUUskirowYMHsLGxETsGERERVcC9e/fQsGHDV7ZhkVRBRkZGAJ5/ycbGxiKnISIiorLIyMiAjY2N4jz+KiySKujFJTZjY2MWSURERGqmLENlOHCbiIiIqAQskoiIiIhKwCKJiIiIqAQck0RERK9UWFiI/Px8sWMQlYm2tjY0NTUrZV8skoiIqESCICA5ORlPnz4VOwpRudSpUweWlpZvPI8hiyQiIirRiwKpfv360NfX58S5pPIEQUB2djZSUlIAAFZWVm+0PxZJRERUTGFhoaJAqlevnthxiMpMT08PAJCSkoL69eu/0aU3UQduR0dHIyAgANbW1pBIJNizZ88r2yclJWHo0KFwcnKChoYGJk+eXKzNqlWr0KFDB5iamsLU1BRdu3bFqVOnirVbunQpHBwcIJVK4enpiWPHjlXSURERqb8XY5D09fVFTkJUfi9+bt90LJ2oRVJWVhbc3NywePHiMrWXy+UwNzfHjBkz4ObmVmKbyMhIDBkyBEePHkVsbCxsbW3h6+uLxMRERZtt27Zh8uTJmDFjBs6dO4cOHTrA398fCQkJlXJcREQ1BS+xkTqqrJ9biSAIQqXs6Q1JJBLs3r0bffr0KVP7zp07o2XLlli0aNEr2xUWFsLU1BSLFy/GiBEjAABeXl7w8PDAsmXLFO2aNm2KPn36IDQ0tEyfn5GRARMTE6Snp3PGbSKqcXJzcxEfH6/ocSdSJ6/6+S3P+bvGz5OUnZ2N/Px81K1bFwCQl5eHM2fOwNfXV6mdr68vYmJiSt2PXC5HRkaG0ouIiIhqrhpfJE2fPh0NGjRA165dAQBpaWkoLCyEhYWFUjsLCwskJyeXup/Q0FCYmJgoXjY2NlWam4iIqLwiIyMhkUheOW1DWFgY6tSpU22Z1FmNLpIWLFiALVu2YNeuXcW62/57vVIQhFdewwwJCUF6erride/evSrJTEREFTdy5EhIJBIEBwcXe2/8+PGQSCQYOXKkYl1KSgo+/PBD2NraQldXF5aWlvDz80NsbKyijb29PSQSSbHX/PnzS81x+/ZtDBkyBNbW1pBKpWjYsCF69+6N69evK9qU5Yall72cQ09PD87Ozvj+++/x8qgZHx8fJCUlwcTEpMz7rQolfV8vv17+Oygve3v71w61qSw1dgqAH374Ad9++y0iIiLQokULxXozMzNoamoW6zVKSUkp1rv0Ml1dXejq6lZZ3pclZiRi8M7BCH0nFO1t21fLZxIR1RQ2NjbYunUrFi5cqLgdPDc3F1u2bIGtra1S2/79+yM/Px/r1q2Do6MjHj58iCNHjuDx48dK7b7++muMHTtWaZ2RkVGJn5+Xl4du3brB2dkZu3btgpWVFe7fv4/9+/cjPT39jY7tRY7c3FxERERg3LhxMDY2xocffggA0NHRgaWl5Rt9RmVISkpS/Hnbtm2YNWsWrl27plj34u9F1dXInqTvv/8e33zzDQ4ePIhWrVopvaejowNPT0+Eh4crrQ8PD4ePj091xizVnMg5OJ5wHB3WdsCwXcOQmJH4+o2IiKpBVl5Wqa/cgtwyt83JzylT24rw8PCAra0tdu3apVi3a9cu2NjYwN3dXbHu6dOnOH78OL777jt06dIFdnZ2aNOmDUJCQtCzZ0+lfRoZGcHS0lLpZWBgUOLnX7lyBbdv38bSpUvRtm1b2NnZoV27dpg3bx5at25doWP6bw57e3uMGTMGLVq0wOHDhxXvl3S5LSwsDLa2ttDX10ffvn3x6NGjYvudO3cu6tevDyMjI4wZMwbTp09Hy5YtldqsXbsWTZs2hVQqhbOzM5YuXVpqzpe/JxMTE0gkEqV10dHR8PT0hFQqhaOjI7766isUFBQotp8zZ46id8/a2hoTJ04E8Pymrbt372LKlCmKXqmqJGpPUmZmJm7evKlYjo+Ph0wmQ926dWFra4uQkBAkJiZi/fr1ijYymUyxbWpqKmQyGXR0dODi4gLg+SW2L7/8Eps3b4a9vb2ix8jQ0BCGhoYAgKlTpyIwMBCtWrWCt7c3Vq5ciYSEhBK7Z8Xw7TvfQkOigVVnV2Hzxc3Ye3UvZnSYganeU6GrVT29WUREJTEMNSz1vR5NemDf0H2K5fo/1Ed2fnaJbTvZdULkyEjFsv3P9kjLTivWTphdsRuwR40ahbVr12LYsGEAgDVr1mD06NGIjPzfZ744L+zZswdt27attKsF5ubm0NDQwI4dOzB58uRKe47YywRBQFRUFOLi4tCkSZNS2508eRKjR4/Gt99+i379+uHgwYOYPXu2UptNmzZh3rx5WLp0Kdq1a4etW7fixx9/hIODg6LNqlWrMHv2bCxevBju7u44d+4cxo4dCwMDAwQFBZUr+6FDhzB8+HD88ssv6NChA27duoUPPvgAADB79mzs2LEDCxcuxNatW9GsWTMkJyfj/PnzAJ4Xu25ubvjggw+K9exVCUFER48eFQAUewUFBQmCIAhBQUFCp06dlLYpqb2dnZ3ifTs7uxLbzJ49W2k/S5YsEezs7AQdHR3Bw8NDiIqKKlf29PR0AYCQnp5egSMvmzMPzgg+q30EzIGAORAa/dxI+OvaX1X2eUREL+Tk5AhXrlwRcnJylNa/+P+opFePTT2U2urP0y+1bae1nZTami0wK7FdeQUFBQm9e/cWUlNTBV1dXSE+Pl64c+eOIJVKhdTUVKF3796Kc4wgCMKOHTsEU1NTQSqVCj4+PkJISIhw/vx5pX2+OFcYGBgovY4ePVpqjsWLFwv6+vqCkZGR
0KVLF+Hrr78Wbt26pfxdAsLu3bvLfGwv59DW1hYACFKpVDhx4oSizYvz6pMnTwRBEIQhQ4YI3bt3V9rPoEGDBBMTE8Wyl5eX8NFHHym1adeuneDm5qZYtrGxETZv3qzU5ptvvhG8vb1fm3vt2rVKn9ehQwfh22+/VWqzYcMGwcrKShAEQfjxxx+Ft956S8jLyytxf3Z2dsLChQtf+Zml/fwKQvnO36L2JHXu3FlpwNl/hYWFFVv3qvYAcOfOnTJ99vjx4zF+/PgytRWLh5UHjo86js0XN+PT8E9x68ktRN+NRs+3er5+YyKiKpAZklnqe5oayj0mKdNSSm2rIVEe7XFn0p03yvVfZmZm6NmzJ9atWwdBENCzZ0+YmZkVa9e/f3/07NkTx44dQ2xsLA4ePIgFCxbgt99+Uxpc/OmnnxYbbNygQYNSP/+jjz7CiBEjcPToUZw8eRK///47vv32W/zxxx/o1q1bhY/rRY7U1FTMmDEDb7/99iuHisTFxaFv375K67y9vXHw4EHF8rVr14qdD9u0aYO///4bAJCamop79+7h/fffV+q9KSgoqNAA8TNnzuDff//FvHnzFOsKCwuRm5uL7OxsvPfee1i0aBEcHR3RvXt39OjRAwEBAdDSqv6SpcYO3K4pJBIJhrUYhl5OvfBT7E+Y6j1V8V5CegLq6tWFoU7p3d9ERJXJQKfkcTjV2basRo8ejY8//hgAsGTJklLbSaVSdOvWDd26dcOsWbMwZswYzJ49W6koMjMzQ+PGjcv1+UZGRujVqxd69eqFuXPnws/PD3Pnzn2jIulFjsaNG2Pnzp1o3Lgx2rZtq5jm5r9e17HwQkl3fL9QVFQE4PklNy8vL6V2FbmUWFRUhK+++gr9+vUr9p5UKoWNjQ2uXbuG8PBwREREYPz48fj+++8RFRUFbW3tcn/em6iRA7drIiNdI8zuPBtGus/vpigSijBk5xC4LHHB7rjdZf6HQERUW3Tv3h15eXnIy8uDn59fmbdzcXFBVlbFBo2XRiKRwNnZuVL3a2pqigkTJmDatGmlngNcXFzwzz//KK3777KTk1OxZ5yePn1a8WcLCws0aNAAt2/fVhRoL14vj1sqKw8PD1y7dq3Yvho3bgwNjedliZ6eHnr16oVffvkFkZGRiI2NxcWLFwE8vwGrsLCw3J9bEexJUlP3M+4jMSMR9zLuod/2fujZpCd+8f8FjqaOYkcjIlIJmpqaiIuLU/z5vx49eoT33nsPo0ePRosWLWBkZITTp09jwYIF6N27t1LbZ8+eFZs6Rl9fv8THWshkMsyePRuBgYFwcXGBjo4OoqKisGbNGnz++edKbV/csPSyxo0bK240ep2PPvoI3333HXbu3IkBAwYUe3/ixInw8fHBggUL0KdPHxw+fFjpUhsATJgwAWPHjkWrVq3g4+ODbdu24cKFC3B0/N/5ZM6cOZg4cSKMjY3h7+8PuVyO06dP48mTJ5g6dep/P/aVZs2ahXfffRc2NjZ47733oKGhgQsXLuDixYuYO3cuwsLCUFhYCC8vL+jr62PDhg3Q09ODnZ0dgOfzJEVHR2Pw4MHQ1dUt8TJqpXntqCUqUXUM3H6drLws4YuILwTtr7UFzIEgnSsVvon6RsjNzxUtExHVDK8a+KrKXgzcLs3LA7dzc3OF6dOnCx4eHoKJiYmgr68vODk5CTNnzhSys7MV25R2Q9CHH35Y4mekpqYKEydOFJo3by4YGhoKRkZGgqurq/DDDz8IhYWFinYl7RNAqQPCSxuwPHbsWKFZs2ZCYWFhsYHbgiAIq1evFho2bCjo6ekJAQEBwg8//KA0kFoQBOHrr78WzMzMBENDQ2H06NHCxIkThbZt2yq12bRpk9CyZUtBR0dHMDU1FTp27Cjs2rWr1O/6hf8O3BYEQTh48KDg4+Mj6OnpCcbGxkKbNm2ElStXCoIgCLt37xa8vLwEY2NjwcDAQGjbtq0QERGh2DY2NlZo0aKFoKurK5RWxlTWwG2VecCtulGlB9xeTbuK8fvG4+idowCAt+q9hcPDD8Oujp2ouYhIffEBt7Vbt27dYGlpiQ0bNogdpUIq6wG3vNxWAzibOePIiCPYcmkLph6aCgNtAzQ0bih2LCIiUgPZ2dlYvnw5/Pz8oKmpiS1btiAiIqLYpMu1EYukGkIikWCo61D0bNITKVkpiltxc/JzsPHCRoxyHwUtDf51ExGRMolEgv3792Pu3LmQy+VwcnLCzp07S71jrjbhWbOGMZGawET6v3krvjvxHb6K+gpLTy/Fsp7L0LZhWxHTERGRqtHT00NERITYMVQSpwCo4exM7GAqNYUsWQbv1d4I/isYT3Ofih2LiNQEh62SOqqsn1sWSTXcKPdRuPrxVQS5PX+2zoozK+C82BnbLm3jf35EVKoXk/ZlZ5f87DUiVfbi5/ZNJ5/k5bZaoL5BfYT1CcPIliMR/Fcwrj26hsE7B+Nq2lXM7jz79TsgolpHU1MTderUQUrK80eL6OvrV/kT14nelCAIyM7ORkpKCurUqfPGDxdmkVSLdLbvjPPB5/Hdie+w6J9FGOE2QuxIRKTCLC0tAUBRKBGpizp16ih+ft8E50mqIFWaJ6kinsmfKR5xAgDfHvsWXey7wNvGW8RURKSKCgsLkZ+fL3YMojLR1tZ+ZQ8S50mi13q5QDp29xhm/D0DEkgQ3CoY377zLepI64gXjohUiqam5htftiBSRxy4TWhq3hSjWo6CAAHLTi9D0yVNsf3ydg7sJiKiWo1FEsFM3wxreq/B0aCjcKrnhOTMZAzaMQg9N/dE/JN4seMRERGJgkUSKbwY2D2n0xzoaOrgwM0DeHv92ygoKhA7GhERUbVjkURKdLV0MbvzbFwIvoDO9p0xt8tcPs6EiIhqJZ79qEROZk74e8TfSut+v/w7Iu9E4tt3vlV69AkREVFNxJ4kKpVEIlFMHpeTn4NJBydh6emlcF7izIHdRERU47FIojLR09bDpn6b8Fa9txQDu3tt7YX7GffFjkZERFQlWCRRmXVx6ILzwecxq+MsaGto46/rf8FliQuWn16OIqFI7HhERESVikUSlYtUS4qvunyFcx+eQ9uGbfEs7xnG7RuHU4mnxI5GRERUqThwmyqkWf1mOD7qOJb8uwQ3Ht1A24ZtxY5ERERUqdiTRBWmqaGJiV4T8WuPXxXr7qXfQ6ewTjibdFbEZERERG+ORRJVqulHpiP6bjTarGqD6RHTkZOfI3YkIiKiCmGRRJXqJ9+fMKjZIBQKhfjuxHdwW+6GqDtRYsciIiIqNxZJVKksDC2wdcBW7B28F9ZG1rjx+AY6r+uMCfsnICsvS+x4REREZcYiiapEL6deuDL+CsZ6jAUALP53MRb+s1DkVERERGXHIomqjInUBCsDVuLQ8EPo6tgVn3h/InYkIiKiMmORRFXOt5EvwgPDoaetBwAoLCrEqL2j8G/ivyInIyIiKh2LJKp2S/5dgjBZGLxXe+PLv79EXmGe2JGIiIiKYZFE1W6Y6zAMbj4YhUIh5h6bC5/VPrj+6LrYsYiIiJSwSKJ
qV0+/Hrb034LtA7ajrl5dnEk6A/cV7lh9djUEQRA7HhEREQCRi6To6GgEBATA2toaEokEe/bseWX7pKQkDB06FE5OTtDQ0MDkyZOLtbl8+TL69+8Pe3t7SCQSLFq0qFibOXPmQCKRKL0sLS0r56CozN5r9h7OB59HF/suyM7Pxpg/xyDkSIjYsYiIiACIXCRlZWXBzc0NixcvLlN7uVwOc3NzzJgxA25ubiW2yc7OhqOjI+bPn//KwqdZs2ZISkpSvC5evFihY6A309C4IcIDwzH/nfkw0jHCMNdhYkciIiICIPIDbv39/eHv71/m9vb29vj5558BAGvWrCmxTevWrdG6dWsAwPTp00vdl5aWFnuPVISmhiY+b/85xnqORV29uor1Mfdi4NXAC5oamiKmIyKi2qrWjkm6ceMGrK2t4eDggMGDB+P27duvbC+Xy5GRkaH0osr1coEUey8WncI6wXejLx5mPhQxFRER1Va1skjy8vLC+vXrcejQIaxatQrJycnw8fHBo0ePSt0mNDQUJiYmipeNjU01Jq59HmY9hK6mLv6O/xvuK9wRfTda7EhERFTL1Moiyd/fH/3794erqyu6du2Kffv2AQDWrVtX6jYhISFIT09XvO7du1ddcWulPs598O/Yf+Fi7oKkzCS8ve5tLDixAEVCkdjRiIiolqiVRdJ/GRgYwNXVFTdu3Ci1ja6uLoyNjZVeVLWamjfFqTGnMLzFcBQKhfg84nP03dYX6bnpYkcjIqJagEUSno83iouLg5WVldhR6D8MdAywvs96rHh3BXQ1dfHHtT+w6eImsWMREVEtIOrdbZmZmbh586ZiOT4+HjKZDHXr1oWtrS1CQkKQmJiI9evXK9rIZDLFtqmpqZDJZNDR0YGLiwsAIC8vD1euXFH8OTExETKZDIaGhmjcuDEAYNq0aQgICICtrS1SUlIwd+5cZGRkICgoqJqOnMpDIpHgA88P4GnliQ0XNmBcq3FiRyIiolpAIog4xXFkZCS6dOlSbH1QUBDCwsIwcuRI3LlzB5GRkYr3JBJJsfZ2dna4c+cOAODOnTtwcHAo1qZTp06K/QwePBjR0dFIS0uDubk52rZti2+++UZRaJVFRkYGTExMkJ6ezktvIsnKy8KuuF0Y3mJ4iT8XRERE/1We87eoRZI6Y5EkLkEQMHDHQOy4sgPvu7+PJT2WQFdLV+xYRESk4spz/uaYJFJbPg19oCHRwOpzq+G30Q9Pcp6IHYmIiGoQFkmkliQSCaZ4T8G+oftgpGOEqLtR6LC2AxLSE8SORkRENQSLJFJr3Rt3x7FRx2BtZI3LqZfhvdob55PPix2LiIhqABZJpPbcLN0Q+34sXMxd8ODZA/Tf3h8FRQVixyIiIjXHIolqBFsTWxwfdRw9mvTApn6boKUh6uwWRERUA/BMQjWGqZ4p9g3dp7QuLTsNZvpmIiUiIiJ1xp4kqrH+TfwXjX9pjGX/LhM7ChERqSEWSVRj7b22F+nydIzfPx4/xPwgdhwiIlIzLJKoxvqmyzf4ov0XAIBPwz/F3Oi5IiciIiJ1wiKJaiyJRIJ578zDvLfnAQC+PPolFpxYIHIqIiJSFyySqMb7osMXikLp84jP8fM/P4uciIiI1AGLJKoVvujwBWZ1nAUA2H9zPwqLCkVOREREqo5TAFCtMafzHDiaOmJQ80HQ1NAUOw4REak49iRRrSGRSBDUMghSLSkAQBAE3Hh0Q+RURESkqlgkUa1UJBThk8OfwHWZK6LuRIkdh4iIVBCLJKqVBEFA/NN4yAvl6L21Ny4+vCh2JCIiUjEskqhW0tTQxOZ+m9HBtgPS5enovqk77j69K3YsIiJSISySqNbS09bD3sF70cy8GR48ewD/Tf5Iz00XOxYREakIFklUq5nqmeLg8INoYNQAcWlxGLhjIAqKCsSORUREKoBFEtV6DY0b4o8hf0BfWx/ht8JxNP6o2JGIiEgFcJ4kIgAeVh7Y3G8zBAjo1qib2HGIiEgFsEgi+n+9nXuLHYGIiFQIL7cRleDO0zvw3eCLO0/viB2FiIhEwiKJqATj9o1D+O1w9N/eHzn5OWLHISIiEbBIIirBindXwEzfDGeTzmL8/vEQBEHsSEREVM1YJBGVwNbEFlv7b4WGRANhsjCsOLNC7EhERFTNWCQRleIdx3cw/535AICJBybizIMzIiciIqLqxCKJ6BWm+UxDX+e+yC/Kx6Adg5AhzxA7EhERVRMWSUSvIJFIsLrXatia2MJEasLHlhAR1SKcJ4noNUz1TBERGAFbE1voaumKHYeIiKoJiySiMmhSr4nScl5hHnQ0dURKQ0RE1YGX24jKoaCoALOOzoLPah/kFeaJHYeIiKoQiySicniU/QhL/12KM0ln8HXU12LHISKiKsQiiagcLAwtsPzd5QCA0OOh+Of+PyInIiKiqiJqkRQdHY2AgABYW1tDIpFgz549r2yflJSEoUOHwsnJCRoaGpg8eXKxNpcvX0b//v1hb28PiUSCRYsWlbivpUuXwsHBAVKpFJ6enjh27NibHxDVCgNcBmCY6zAUCUUYsXsEsvKyxI5ERERVQNQiKSsrC25ubli8eHGZ2svlcpibm2PGjBlwc3MrsU12djYcHR0xf/58WFpalthm27ZtmDx5MmbMmIFz586hQ4cO8Pf3R0JCQoWPhWqXX/1/RQOjBrjx+AY+j/hc7DhERFQFJIKKPJRKIpFg9+7d6NOnT5nad+7cGS1btiy1pwgA7O3tMXny5GI9Tl5eXvDw8MCyZcsU65o2bYo+ffogNDS0xH3J5XLI5XLFckZGBmxsbJCeng5jY+MyZaaaJfxWOHw3+gIAIgIj8I7jOyInIiKi18nIyICJiUmZzt+1bkxSXl4ezpw5A19fX6X1vr6+iImJKXW70NBQmJiYKF42NjZVHZVUXLdG3TC+1Xjoa+vjwbMHYschIqJKVuuKpLS0NBQWFsLCwkJpvYWFBZKTk0vdLiQkBOnp6YrXvXv3qjoqqYH5Xefj8vjLCHQLFDsKERFVslo7maREIlFaFgSh2LqX6erqQleXsy2TMiNdIxjpGokdg4iIqkCt60kyMzODpqZmsV6jlJSUYr1LROUReScS/bf3R35hvthRiIioEtS6IklHRweenp4IDw9XWh8eHg4fHx+RUpG6y8rLwnu/v4ddcbvwfcz3YschIqJKIGqRlJmZCZlMBplMBgCIj4+HTCZT3IofEhKCESNGKG3zon1mZiZSU1Mhk8lw5coVxft5eXmKNnl5eUhMTIRMJsPNmzcVbaZOnYrffvsNa9asQVxcHKZMmYKEhAQEBwdX/UFTjWSgY4CFfgsBAN9Ef4PbT26LnIiIiN6UqFMAREZGokuXLsXWBwUFISwsDCNHjsSdO3cQGRmpeK+kcUN2dna4c+cOAODOnTtwcHAo1qZTp05K+1m6dCkWLFiApKQkNG/eHAsXLkTHjh3LnL08txBS7SAIArpt6IYj8Ufw7lvv4s8hf4odiYiI/qM852+VmSdJ3bBIopJcTbuKFstaIL8oH3sH70Uvp15iRyIiop
dwniQikTibOeMT708AABMPTER2frbIiYiIqKJYJBFVspkdZ8LWxBZ30+9iy8UtYschIqIKqrXzJBFVFQMdAyzvuRzP8p7hPZf3xI5DREQVxCKJqAr4N/EXOwIREb0hXm4jqmJPc58iLjVO7BhERFROLJKIqtDxhONo8msTDNwxEAVFBWLHISKicmCRRFSFmpk3Q5FQhEspl/Db2d/EjkNEROXAIomoCpnqmWJOpzkAgFlHZyE9N13cQEREVGYskoiqWHCrYDibOSM1OxXzjs0TOw4REZURiySiKqatqY0ffX8EACz6ZxHin8SLnIiIiMqCRRJRNfBv7I9ujt2QX5SPWZGzxI5DRERlwCKJqBpIJBLM7zofmhJNaGlooUgoEjsSERG9BieTJKomHlYeiJ8UDxsTG7GjEBFRGbAniagasUAiIlIfLJKIRHD90XXM/HsmBEEQOwoREZWCl9uIqllWXhbarGqDdHk6PK080bdpX7EjERFRCdiTRFTNDHQM8HGbjwEAM4/ORGFRociJiIioJCySiEQwzWca6kjr4ErqFfx+5Xex4xARUQlYJBGJoI60Dqa0nQIA+Drqa/YmERGpIBZJRCKZ5DUJdaR1EJcWh+2Xt4sdh4iI/oNFEpFITKQm+MT7EwDA19HsTSIiUjUskohENNFrIhrXbYzhrsORX5QvdhwiInoJpwAgEpGxrjGufXwNGhL+vkJEpGr4PzORyFggERGpJv7vTKQCBEHAvuv7MGjHIBQUFYgdh4iIwCKJSCVk52cjaE8Qtl/ezjvdiIhUBIskIhVgoGOgmDdp/vH5fKYbEZEKYJFEpCI+avMRjHSMcDHlIvbd2Cd2HCKiWo9FEpGKqCOtg3GtxgEAQo+HsjeJiEhkLJKIVMgU7ynQ1dRFzL0YHEs4JnYcIqJajUUSkQqxNLTEqJajADzvTSIiIvGwSCJSMZ+2+xSeVp6KYomIiMTBGbeJVIyjqSNOf3Ba7BhERLWeqD1J0dHRCAgIgLW1NSQSCfbs2fPK9klJSRg6dCicnJygoaGByZMnl9hu586dcHFxga6uLlxcXLB7926l9+fMmQOJRKL0srS0rKSjIiIioppA1CIpKysLbm5uWLx4cZnay+VymJubY8aMGXBzcyuxTWxsLAYNGoTAwECcP38egYGBGDhwIE6ePKnUrlmzZkhKSlK8Ll68+MbHQ1SZnsmf4ceYHzHz75liRyEiqpUkgorcZyyRSLB792706dOnTO07d+6Mli1bYtGiRUrrBw0ahIyMDBw4cECxrnv37jA1NcWWLVsAPO9J2rNnD2QyWYXzZmRkwMTEBOnp6TA2Nq7wfohKcyLhBNqvbQ9dTV3cnXwXFoYWYkciIlJ75Tl/17iB27GxsfD19VVa5+fnh5iYGKV1N27cgLW1NRwcHDB48GDcvn37lfuVy+XIyMhQehFVJR8bH3g18IK8UI5lp5eJHYeIqNapcUVScnIyLCyUf+O2sLBAcnKyYtnLywvr16/HoUOHsGrVKiQnJ8PHxwePHj0qdb+hoaEwMTFRvGxsbKrsGIiA572rU72nAgCW/rsUOfk5IiciIqpdalyRBDw/ubxMEASldf7+/ujfvz9cXV3RtWtX7Nv3/BEQ69atK3WfISEhSE9PV7zu3btXNeGJXtKvaT/YmdghNTsVmy5uEjsOEVGtUuOKJEtLS6VeIwBISUkp1rv0MgMDA7i6uuLGjRulttHV1YWxsbHSi6iqaWloYZLXJADAT7E/oUgoEjkREVHtUeOKJG9vb4SHhyutO3z4MHx8fErdRi6XIy4uDlZWVlUdj6jc3vd4H0Y6RohLi8Ohm4fEjkNEVGuIOplkZmYmbt68qViOj4+HTCZD3bp1YWtri5CQECQmJmL9+vWKNi/uSMvMzERqaipkMhl0dHTg4uICAJg0aRI6duyI7777Dr1798bevXsRERGB48ePK/Yxbdo0BAQEwNbWFikpKZg7dy4yMjIQFBRUPQdOVA7GusYY12ocEjISYGPCsXBERNVF1CkAIiMj0aVLl2Lrg4KCEBYWhpEjR+LOnTuIjIxUvPff8UYAYGdnhzt37iiWd+zYgZkzZ+L27dto1KgR5s2bh379+ineHzx4MKKjo5GWlgZzc3O0bdsW33zzjaLQKgtOAUDV6b/j6oiIqGLKc/5WmXmS1A2LJCIiIvVTq+dJIqrJrqVdw/h945H0LEnsKERENR6LJCI1MubPMVh2ehlWnFkhdhQiohqPRRKRGvm49ccAgBVnViCvME/kNERENRuLJCI10q9pP1gZWiE5Mxk7ruwQOw4RUY3GIolIjWhraiO4VTAA4NdTv4qchoioZmORRKRmPvD8ANoa2vjn/j84/eC02HGIiGosFklEasbS0BIDmw0EwN4kIqKqxCKJSA1NaDMBZvpmcKzjKHYUIqIaS9THkhBRxXg19ML9Kfehq6UrdhQiohqLPUlEaooFEhFR1WKRRKTGioQiHLx5EMcTjr++MRERlQuLJCI19mPMj/Df5I+Zf88UOwoRUY3DIolIjQ1xHQINiQai7kYhLjVO7DhERDUKiyQiNdbQuCEC3goAAD7PjYiokrFIIlJzL2bgXnd+HbLzs0VOQ0RUc7BIIlJzvo18YV/HHk9zn2L75e1ixyEiqjFYJBGpOQ2JBj70/BAAsPz0cpHTEBHVHCySiGqAUS1HQVtDG1n5WXia+1TsOERENQJn3CaqASwMLXBx3EW8Ve8tSCQSseMQEdUILJKIaggnMyexIxAR1Sjluty2YMEC5OTkKJajo6Mhl8sVy8+ePcP48eMrLx0RlVtmXibnTCIiqgQSQRCEsjbW1NREUlIS6tevDwAwNjaGTCaDo+PzJ5E/fPgQ1tbWKCwsrJq0KiQjIwMmJiZIT0+HsbGx2HGIAABH44+iz7Y+cKjjgHMfnuOlNyKi/yjP+btcPUn/rafKUV8RUTVoadkSeYV5OP/wPE4mnhQ7DhGRWuPdbUQ1iKmeKQY3HwyA0wEQEb0pFklENUyw5/MZuLdd3obHOY9FTkNEpL7KfXfbb7/9BkNDQwBAQUEBwsLCYGZmBuD5wG0iElebBm3gZuGG8w/PY+OFjZjoNVHsSEREaqlcA7ft7e3LNBA0Pj7+jUKpAw7cJlW25NQSfHzgYzSv3xwXgi9wADcR0f8rz/m7XEUS/Q+LJFJlT3KewPona+QW5OLK+Ctoat5U7EhERCqhPOdvTiZJVAOZ6pliQ98N8LTyhIOpg9hxiIjUUrkGbp88eRIHDhxQWrd+/Xo4ODigfv36+OCDD5QmlyQi8QxwGcACiYjoDZSrSJozZw4uXLigWL548SLef/99dO3aFdOnT8eff/6J0NDQSg9JRG+moKhA7AhERGqnXEWSTCbDO++8o1jeunUrvLy8sGrVKkydOhW//PILtm/fXukhiahibjy6gV5beuHtdW+LHYWISO2Ua0zSkydPYGFhoViOiopC9+7dFcutW7fGvXv3Ki8dEb0RI10jHLh5AAVFBbicchnN6jcTOxIRkdooV0+ShYWF4vb+vLw8nD17Ft7e3or3n
z17Bm1t7cpNSEQVZmloiYC3AgAAv539TeQ0RETqpVxFUvfu3TF9+nQcO3YMISEh0NfXR4cOHRTvX7hwAY0aNSrz/qKjoxEQEABra2tIJBLs2bPnle2TkpIwdOhQODk5QUNDA5MnTy6x3c6dO+Hi4gJdXV24uLhg9+7dxdosXboUDg4OkEql8PT0xLFjx8qcm0idjPEYAwBYf2E95AW8sYKIqKzKVSTNnTsXmpqa6NSpE1atWoWVK1dCR0dH8f6aNWvg6+tb5v1lZWXBzc0NixcvLlN7uVwOc3NzzJgxA25ubiW2iY2NxaBBgxAYGIjz588jMDAQAwcOxMmT/3vY57Zt2zB58mTMmDED586dQ4cOHeDv74+EhIQyZydSF36N/NDAqAEe5zzGnqt7xI5DRKQ2KjSZZHp6OgwNDaGpqam0/vHjxzAyMqrQJTeJRILdu3ejT58+ZWrfuXNntGzZEosWLVJaP2jQIGRkZChNVdC9e3eYmppiy5YtAAAvLy94eHhg2bJlijZNmzZFnz59ynx3HieTJHUy6+gsfBP9Dbo6dkV4YLjYcYiIRFNlk0mOHj26TO3WrFlTnt1WqtjYWEyZMkVpnZ+fn6KYysvLw5kzZzB9+nSlNr6+voiJiSl1v3K5XGkOqIyMjMoLTVTFRruPxtzouYi4HYHbT27D0dRR7EhERCqvXEVSWFgY7Ozs4O7uDlV9mklycrLSHXjA8wHnycnJAIC0tDQUFha+sk1JQkND8dVXX1V+YKJqYF/HHhPaTICzmTPM9c3FjkNEpBbKVSQFBwdj69atuH37NkaPHo3hw4ejbt26VZWtwv77ME9BEIqtK0ubl4WEhGDq1KmK5YyMDNjY2FRCWqLq8bP/z2JHICJSK+UauL106VIkJSXh888/x59//gkbGxsMHDgQhw4dUpmeJUtLy2I9QikpKYqeIzMzM2hqar6yTUl0dXVhbGys9CIiIqKaq1xFEvC8WBgyZAjCw8Nx5coVNGvWDOPHj4ednR0yMzOrImO5eHt7IzxceWDq4cOH4ePjAwDQ0dGBp6dnsTbh4eGKNkQ1VYY8A8tPL8eMIzPEjkJEpPLKdbntvyQSCSQSCQRBQFFRUbm3z8zMxM2bNxXL8fHxkMlkqFu3LmxtbRESEoLExESsX79e0UYmkym2TU1NhUwmg46ODlxcXAAAkyZNQseOHfHdd9+hd+/e2Lt3LyIiInD8+HHFPqZOnYrAwEC0atUK3t7eWLlyJRISEhAcHFzBb4JIPSSkJ2DcvnHQ0tDCpLaTUN+gvtiRiIhUl1BOubm5wubNm4WuXbsKUqlUGDBggLBv3z6hsLCwvLsSjh49KgAo9goKChIEQRCCgoKETp06KW1TUns7OzulNr///rvg5OQkaGtrC87OzsLOnTuLffaSJUsEOzs7QUdHR/Dw8BCioqLKlT09PV0AIKSnp5drOyKxtV7ZWsAcCD/F/CR2FCKialee83e55kkaP348tm7dCltbW4waNQrDhw9HvXr1KrtuUwucJ4nU1fLTyzFu3zg0r98cF4IvvPKGBSKimqY85+9yFUkaGhqwtbWFu7v7K/9j3bVrV9nTqikWSaSunuY+hdWPVsgtyMW/Y/9FK+tWYkciIqo2VTaZ5IgRI/hbJ5GaqyOtg35N+2Hzxc1Yc24NiyQiolJU6LEkxJ4kUm9Hbh9B1w1dUUdaBw+mPoCetp7YkYiIqkV5zt/lngKAiNRfF4cucDR1RDubdkjLThM7DhGRSnqjKQCISD1pSDRwefxlSLWkYkchIlJZ7EkiqqVYIBERvRqLJKJa7u7Tu4i+Gy12DCIilcMiiagWC78VDoefHTByz0gUCeWfNZ+IqCZjkURUi7WzbQcjXSPEP41H1J0oseMQEakUFklEtZi+tj4GNxsMAFgrWytyGiIi1cIiiaiWG+0+GgCw48oOpOemi5yGiEh1sEgiquXaNGiDpmZNkVOQg22Xt4kdh4hIZbBIIqrlJBKJojeJl9yIiP6HRRIRYXiL4dCUaOJSyiU8zHwodhwiIpXAGbeJCJaGljgw7AC8bbxhqGModhwiIpXAIomIAADdGnUTOwIRkUrh5TYiUiIIAjLzMsWOQUQkOhZJRKQQfTcabsvdMGL3CLGjEBGJjpfbiEihrl5dXEy5iLi0OKRkpaC+QX2xIxERiYY9SUSk0Lx+c7Rp0AYFRQXYdGGT2HGIiETFIomIlIxqOQoAsPrcagiCIHIaIiLxsEgiIiWDmw+GVEuKy6mXcfrBabHjEBGJhkUSESmpI62Dfk37AeAM3ERUu7FIIqJiRrd8/piSzRc3Iyc/R+Q0RETi4N1tRFRMF4cu+NDzQ/R26g0dTR2x4xARiYJFEhEVoyHRwPJ3l4sdg4hIVLzcRkRERFQCFklEVKrbT24jJCIES04tETsKEVG1Y5FERKWKuReD+Sfm48fYH1EkFIkdh4ioWrFIIqJS9WvaD8a6xoh/Go/IO5FixyEiqlYskoioVPra+hjSfAiA5zNwExHVJiySiOiVxniMAQDsvLITT3KeiJyGiKj6sEgiolfytPJEC4sWkBfKsfniZrHjEBFVGxZJRPRKEokE77u/DwD47dxvIqchIqo+ohZJ0dHRCAgIgLW1NSQSCfbs2fPabaKiouDp6QmpVApHR0csX6484V1+fj6+/vprNGrUCFKpFG5ubjh48KBSmzlz5kAikSi9LC0tK/PQiGqUYa7DYGloiXY27SAvkIsdh4ioWog643ZWVhbc3NwwatQo9O/f/7Xt4+Pj0aNHD4wdOxYbN27EiRMnMH78eJibmyu2nzlzJjZu3IhVq1bB2dkZhw4dQt++fRETEwN3d3fFvpo1a4aIiAjFsqamZuUfIFENUU+/Hu5PuQ9NDf47IaLaQ9Qiyd/fH/7+/mVuv3z5ctja2mLRokUAgKZNm+L06dP44YcfFEXShg0bMGPGDPTo0QMAMG7cOBw6dAg//vgjNm7cqNiXlpZWuXqP5HI55PL//QadkZFR5m2JagIWSERU26jVmKTY2Fj4+voqrfPz88Pp06eRn58P4HkxI5VKldro6enh+PHjSutu3LgBa2trODg4YPDgwbh9+/YrPzs0NBQmJiaKl42NTSUcEZF6EQQBx+4eQ8y9GLGjEBFVObUqkpKTk2FhYaG0zsLCAgUFBUhLSwPwvGj66aefcOPGDRQVFSE8PBx79+5FUlKSYhsvLy+sX78ehw4dwqpVq5CcnAwfHx88evSo1M8OCQlBenq64nXv3r2qOUgiFfbLyV/QMawjZvw9Q+woRERVTq2KJOD5nTYvEwRBaf3PP/+MJk2awNnZGTo6Ovj4448xatQopTFH/v7+6N+/P1xdXdG1a1fs27cPALBu3bpSP1dXVxfGxsZKL6Lapl/TfpBAgsg7kbj5+KbYcYiIqpRaFUmWlpZITk5WWpeSkgItLS3Uq1cPAGBubo49e/YgKysLd+/exdWrV2FoaAgHB4dS92tgYABXV1fcuHGjSvMTqTsbExv4NfYDAKw5t0bkNEREVUutiiRvb2+Eh4crrTt8+DBatWoF
bW1tpfVSqRQNGjRAQUEBdu7cid69e5e6X7lcjri4OFhZWVVJbqKa5MWcSWGyMBQUFYichoio6ohaJGVmZkImk0EmkwF4fou/TCZDQkICgOfjgEaMGKFoHxwcjLt372Lq1KmIi4vDmjVrsHr1akybNk3R5uTJk9i1axdu376NY8eOoXv37igqKsJnn32maDNt2jRERUUhPj4eJ0+exIABA5CRkYGgoKDqOXAiNdbLqRfM9M2QlJmEgzcPvn4DIiI1JWqRdPr0abi7uyvmL5o6dSrc3d0xa9YsAEBSUpKiYAIABwcH7N+/H5GRkWjZsiW++eYb/PLLL0pzLOXm5mLmzJlwcXFB37590aBBAxw/fhx16tRRtLl//z6GDBkCJycn9OvXDzo6Ovjnn39gZ2dXPQdOpMZ0NHUwosXzX15+O8sZuImo5pIIL0Y+U7lkZGTAxMQE6enpHMRNtc6V1CtotrQZ3qr3Fi6OuwgdTR2xIxERlUl5zt+iTiZJROrJxdwFJ8ecRCvrVtCQqNXQRiKiMmORREQV0qZBG7EjEBFVKf4KSERvRF4gR2pWqtgxiIgqHYskIqqwbZe2ocFPDTAtfNrrGxMRqRkWSURUYTYmNniU8wi/X/4d6bnpYschIqpULJKIqMK8G3qjqVlT5BTkYPPFzWLHISKqVCySiKjCJBIJPvD8AACw4swKcEYRIqpJWCQR0RsZ4TYCupq6OP/wPE4lnhI7DhFRpWGRRERvpK5eXQxqPggAsPzMcpHTEBFVHhZJRPTGgj2DATy/2y0zL1PkNERElYOTSRLRG2vbsC2+fftb9HHuA0MdQ7HjEBFVChZJRPTGJBIJQjqEiB2DiKhS8XIbEVU63uVGRDUBiyQiqjRXUq9gyM4heP+P98WOQkT0xlgkEVGlycnPwdZLW7Hp4iakZaeJHYeI6I2wSCKiSuNp7QlPK0/kFeYhTBYmdhwiojfCIomIKlVwq+fTAaw4swJFQpHIaYiIKo5FEhFVqsHNB8NIxwg3H9/E0fijYschIqowFklEVKkMdQwR2CIQALDs9DKR0xARVRyLJCKqdC8uue25ugf3M+6LnIaIqGI4mSQRVTpXC1cEuQWhhUULGOsaix2HiKhCWCQRUZUI6xMmdgQiojfCy21EREREJWCRRERVRl4gx8YLG/HBnx+IHYWIqNxYJBFRlUmXp+P9P97HqrOrcCrxlNhxiIjKhUUSEVWZ+gb1MajZIADA4lOLRU5DRFQ+LJKIqEp93OZjAMC2y9uQkpUichoiorJjkUREVapNgzZo06AN8grzsOrMKrHjEBGVGacAIKIq93HrjzEicQSWnV6Gz9p9Bm1NbbEjEVWJvMI8JKQnoLCoEE5mTgCAIqEI4/eNR3Z+NvKL8lFQVIDCokJItaTQ09KDs5kzPm33qWIf0XejoaelBysjK1gZWkFTQ1Osw6n1JIIgCGKHUEcZGRkwMTFBeno6jI05WR7Rq8gL5LBZaIPU7FRsG7ANA5sNFDsS0Rt7mPkQ/z74F2eTzuJs0lmcf3geCekJKBKK8O5b7+LPIX8q2hqFGiEzL7PE/XS064iokVGKZasfrZCcmQwA0NHUgX0dezQybYRGpo3gYeWBUe6jqvbAarjynL/Zk0REVU5XSxfjWo1DdEI0rAytxI5DVCGCIEAikQAAcgtyYbfIDvJCebF2Ui0pNCTKo1m+6fIN8gvzoa2pDW0NbWhINJBbkIucghw0MGqg9Bm2JrbQlGjiYdZD5BXm4fqj67j+6DoAoINtB6Ui6f2976O+QX20btAaXg280MC4AajysCepgtiTRFQ+hUWFvGxAaic9Nx1/Xf8LO+J2ID03HX8H/a147+11byMlKwUeVh7wsPJAS8uWcKrnBEtDS0Ux9SYKiwpxL+Mebj2+hdtPbuPm45twNHXEh60+BABk5mXCONQYAv53Gm9g1ABeDb3g1cALbzu8jVbWrd44R01TnvM3i6QKYpFERFQzCYKAYwnHsPLMSuy4skOpt+j+lPuK3poXPUNiyczLxIbzG3A26SxOPTiFSymXUCQUKd4PcgtSPB6oSCjC4VuH4WPjU+ufp1ie87eod7dFR0cjICAA1tbWkEgk2LNnz2u3iYqKgqenJ6RSKRwdHbF8+XKl9/Pz8/H111+jUaNGkEqlcHNzw8GDB4vtZ+nSpXBwcIBUKoWnpyeOHTtWWYdFRK+QkpWCryK/wu0nt8WOQlTMtkvb0HRJU3QK64RNFzdBXihHU7OmmNVxFi4EX4C1kbWirdg3IBjqGGJc63FY1WsVzgefR/r0dEQGRWJB1wXo37Q/fBv5KtpeSrkE/03+MP3OFJ4rPTHl4BTsjtuNtOw0EY9A9Yk6JikrKwtubm4YNWoU+vfv/9r28fHx6NGjB8aOHYuNGzfixIkTGD9+PMzNzRXbz5w5Exs3bsSqVavg7OyMQ4cOoW/fvoiJiYG7uzsAYNu2bZg8eTKWLl2Kdu3aYcWKFfD398eVK1dga2tbpcdMVNuN+WMM/rz+Jx7nPMbP/j+LHYdISUFRAa49ugZDHUMMbT4UYz3HwtPKs1Iun1U1Qx1DdLLvhE72nYq9l5qVikamjXDryS3FQPNFJxcBAJqZN0PoO6EIcAqo5sSqT2Uut0kkEuzevRt9+vQptc3nn3+OP/74A3FxcYp1wcHBOH/+PGJjYwEA1tbWmDFjBj766CNFmz59+sDQ0BAbN24EAHh5ecHDwwPLli1TtGnatCn69OmD0NDQMuXl5Taiigm/FQ7fjb4w0DbAvSn3YKpnKnYkqqWe5j7FghMLYGdipxjnU1BUgPXn1+M9l/dgpGskcsLKl5iRiGMJxxB9NxrRd6NxOfUyAODw8MPo1qgbACDqThS2Xtr6vOCy6wQro5p1s0WNvbstNjYWvr6+Suv8/PywevVq5OfnQ1tbG3K5HFKpVKmNnp4ejh8/DgDIy8vDmTNnMH36dKU2vr6+iImJKfWz5XI55PL/XZfOyMh408MhqpW6OnZF8/rNcSnlEladXYXP2n0mdiSqZeQFciz9dynmHpuLxzmP0dC4IUa5j4KOpg60NLQw2n202BGrTAPjBhjcfDAGNx8M4HkP07GEY/Cx8VG0+fP6n1h+ZjmWn3k+nOWtem+hs11nRdFUm+6gU6sZt5OTk2FhYaG0zsLCAgUFBUhLe35d1c/PDz/99BNu3LiBoqIihIeHY+/evUhKSgIApKWlobCwsMT9JCcnl/rZoaGhMDExUbxsbGwq+eiIageJRIKpbacCAH45+QvyCvNETkS1yZ/X/oTzEmdMPTwVj3Meo6lZUyz2Xwxtjdo5wam5gTn6Ne0HAx0DxbqAtwIw2Wsy3C3dIYEE1x9dx8qzKzFs1zA0XNgQtx7fUrTNLcgVI3a1UasiCUCx68Ivrha+WP/zzz+jSZMmcHZ2ho6ODj7++GOMGjUKmpqar93Pq645h4SEID09XfG6d+9eZRwOUa001HUoLAwskPgsEb9f/l3sOFQL3H16F7239kavrb1w5+kdWBtZ47eA33Bh3AX0du6tFmOOqksn+05Y2H0hzn5
4Fo8+e4Q/Bv+BT7w/gaeVJxoaN4SjqaOibeDuQDj+7IhRe0dhnWwd7jy9AxUZxVMp1Opym6WlZbHenpSUFGhpaaFevXoAAHNzc+zZswe5ubl49OgRrK2tMX36dDg4OAAAzMzMoKmpWeJ+/tu79DJdXV3o6upW8hER1U66Wrr4uM3H+PLol/g+5nsMdR3KkxRVqfsZ9/HHtT+gpaGFad7TMLPjTKXeEyqZqZ4pApwCFIO65QVyxb9VQRAQcy8GD549QLwsHmGyMACAtZE12jZsi052nTDRa6JY0SuFWvUkeXt7Izw8XGnd4cOH0apVK2hrK3eVSqVSNGjQAAUFBdi5cyd69+4NANDR0YGnp2ex/YSHh8PHxwdEVD3GtRqHunp14dXACzkFOWLHoRooJ/9/P1ftbNvhR98fcT74PEK7hrJAqiBdrf91FkgkElz96CoODDuA6e2mo23DttDS0MKDZw+wK24Xtl7aqrTt3Oi5WH9+Pa6lXVOaz0mViXp3W2ZmJm7evAkAcHd3x08//YQuXbqgbt26sLW1RUhICBITE7F+/XoAz6cAaN68OT788EOMHTsWsbGxCA4OxpYtWxRTAJw8eRKJiYlo2bIlEhMTMWfOHMTHx+Ps2bOoU6cOgOdTAAQGBmL58uXw9vbGypUrsWrVKly+fBl2dnZlys6724jeXHZ+NvS19cWOQTWMIAhYfno5vor6CjHvxyhdHqKqlZ2fjTMPzuCf+//A3MAcI1uOBPB84kuT+SaK4shIxwhulm5wt3SHu6U7vG284WzmXC0Zy3X+FkR09OhRAUCxV1BQkCAIghAUFCR06tRJaZvIyEjB3d1d0NHREezt7YVly5YVe79p06aCrq6uUK9ePSEwMFBITEws9tlLliwR7OzsBB0dHcHDw0OIiooqV/b09HQBgJCenl6u7YiIqOo8yn4k9NrSS8AcCJgD4ZNDn4gdiQRBSM1KFT459InQfk17QTpXqvj7efEasXuEoq28QC4sjF0o/HPvnyrJUp7zt8rMk6Ru2JNEVHnOJZ3DwZsHEdIhROwopMZi78Vi8M7BSEhPgI6mDhZ0XYAJXhOKPWyWxFVQVICraVchS5bhXNI5nEs+hyHNh2Cs51gAz2cHd13mioHNBmLbgG2V/vk1dp4kIqp5kp4lofWq1igUCuHX2A8eVh5iRyI1IwgCfor9CdOPTEdBUQGa1G2C7e9tR0vLlmJHoxJoaWihef3maF6/OYa3GF7sfUEQ0Ne5Lzrbda7+cP/B8pqIRGVlZIVBzQcBAOYfny9yGlJHq8+txrTwaSgoKsDg5oNx+oPTLJDUmKuFK3YN2oVxrceJHYVFEhGJb3q75zPg77iyA9fSromchtRNYItAdLTriCU9lmBzv821/in3VHlYJBGR6FwtXNHLqRcECPjuxHdixyE1cPHhRRQWFQJ4flv60aCjGN96POfbokrFIomIVEJI++eDtjdc2ICE9ASR05Aq23hhI1qtaoUvjnyhWMfB2VQV+FNFRCqhbcO26GLfBQVFBQg9Fip2HFJBgiBg/vH5CNwdiLzCPNx8clPRm0RUFVgkEZHKmNN5DiwNLdHCooXYUUjFFBYV4qP9HyHkyPMex2ne0/D7e79DU0PzNVsSVRynACAildHRriPuTLqj9OgDouz8bAzdORR7r+2FBBIs6r5I7Z8JRuqBRRIRqRQWSPSyIqEIPTb1QNTdKOhq6mJTv03o79Jf7FhUS/ByGxGpnCKhCNsubcOcyDliRyGRaUg0MK7VOJjpmyFiRAQLJKpWfCxJBfGxJERVR5Ysg/sKd2hINBD3URzeqveW2JFIZBnyDM5/RJWiPOdv9iQRkcppadkSAW8FoEgowldRX4kdh6pZ/JN4vLP+HdzPuK9YxwKJxMAiiYhU0lednxdHWy5uwfnk8yKnoepyNe0qOqztgL/j/0bwX8Fix6FajkUSEakkdyt3DGo2CAIEfB7xudhxqBpcfHgRHdd2ROKzRLiYu2BVwCqxI1EtxyKJiFTWvLfnQVtDG4duHcKR20fEjkNV6ErqFbyz/h2kZqfCw8oDUSOjYGVkJXYsquVYJBGRympUtxGCWz2/5PJZxGfgfSY10/VH15UKpIjACJjpm4kdi4jzJBGRavuy45e4mHIRMzvM5MNLa6jx+8YjOTMZLSxa4PDwwzDVMxU7EhEATgFQYZwCgIiociQ9S8KEAxOwtOdS1DeoL3YcquHKc/5mTxIRqRV5gZyzctcAeYV50NHUAQBYGVlhx8AdIiciKo5jkohILeQV5mFe9Dw0+qURUrNSxY5DbyA5MxnuK9yx4fwGsaMQvRKLJCJSC5oSTeyM24nEZ4mY+fdMseNQBaVmpeKd9e/gSuoVzIqchZz8HLEjEZWKRRIRqQVNDU384v8LAGDV2VU4m3RW5ERUXo9zHqPbhm64knoFDYwaICIwAnraemLHIioViyQiUhvtbdtjSPMhECBg4oGJnBJAjTzNfQrfDb44//A8LAwscGTEETSq20jsWESvxCKJiNTKgm4LoK+tjxP3TmDLpS1ix6EyeCZ/Bv9N/jiTdAZm+mY4MuIInMycxI5F9FoskohIrTQ0bogv2n8BAPgs/DNk5mWKnIheZ/359fjn/j8wlZoiIjACzeo3EzsSUZmwSCIitfOJzydwNHVEanYqYu7FiB2HXmN86/GY3Wk2wgPD4WbpJnYcojLjZJIVxMkkicQVey8WdfXq8rKNipIXyAGAc1qRyuFkkkRU43nbeIsdgUqRV5iHAb8PQJFQhJ0Dd0KqJRU7ElGF8HIbEam9k/dPcmJCFZFfmI8hO4fgr+t/4e/4v3Hx4UWxIxFVGHuSiEitnbx/Et6rvSHVkqK9bXs4mDqIHanWKiwqxIg9I7Arbhd0NHWwZ9AetG7QWuxYRBXGniQiUmttGrRBZ/vOyCnIwdg/x6JIKBI7Uq1UJBRh9B+jsfXSVmhraGPnwJ3wa+wndiyiN8IiiYjUmkQiwYp3V0BPSw9H4o9g6b9LxY5U6xQJRfjgzw+w/vx6aEo0sW3ANrz71rtixyJ6YyySiEjtNanXBAu6LQDwfO6ka2nXRE5Uu9x6fAvbL2+HhkQDm/ptQt+mfcWORFQpWCQRUY0wvvV4dHPshpyCHATuDkR+Yb7YkWqNJvWa4HDgYWzsuxGDmg8SOw5RpRG1SIqOjkZAQACsra0hkUiwZ8+e124TFRUFT09PSKVSODo6Yvny5cXaLFq0CE5OTtDT04ONjQ2mTJmC3Nxcxftz5syBRCJRellaWlbmoRFRNdOQaGBN7zWoI62Dfx/8i22Xt4kdqUYTBAH30u8plts2bIshrkNETERU+US9uy0rKwtubm4YNWoU+vfv/9r28fHx6NGjB8aOHYuNGzfixIkTGD9+PMzNzRXbb9q0CdOnT8eaNWvg4+OD69evY+TIkQCAhQsXKvbVrFkzREREKJY1NTUr9+CIqNo1NG6IFe+uwNPcpxjmOkzsODWWIAj4NPxTrDm3BhEjIuBh5SF2JKIqIWqR5O/vD39//zK3X758OWxtbbFo0SIAQNOmTX
H69Gn88MMPiiIpNjYW7dq1w9ChQwEA9vb2GDJkCE6dOqW0Ly0trXL1HsnlcsjlcsVyRkZGmbclouozsNlAsSPUaIIg4IsjX+DH2B8BALJkGYskqrHUakxSbGwsfH19ldb5+fnh9OnTyM9/Pv6gffv2OHPmjKIoun37Nvbv34+ePXsqbXfjxg1YW1vDwcEBgwcPxu3bt1/52aGhoTAxMVG8bGxsKvHIiKgqPM19iu+Of8dpASqJIAiYdXQW5p+YDwBY7L8Yo91Hi5yKqOqo1WSSycnJsLCwUFpnYWGBgoICpKWlwcrKCoMHD0Zqairat28PQRBQUFCAcePGYfr06YptvLy8sH79erz11lt4+PAh5s6dCx8fH1y+fBn16tUr8bNDQkIwdepUxXJGRgYLJSIVVlBUgPZr2uNy6mVIJBJ81u4zsSOpNUEQ8HnE5/g+5nsAwCK/RfiozUcipyKqWmrVkwQ8nxPlZS+ez/tifWRkJObNm4elS5fi7Nmz2LVrF/766y988803im38/f3Rv39/uLq6omvXrti3bx8AYN26daV+rq6uLoyNjZVeRKS6tDS0MNFrIgDgiyNf4NjdYyInUl9FQhEmHJigKJB+7v4zJrWdJHIqoqqnVkWSpaUlkpOTldalpKRAS0tL0QP05ZdfIjAwEGPGjIGrqyv69u2Lb7/9FqGhoSgqKrnL3cDAAK6urrhx40aVHwMRVZ+xHmMxzHUYCoVCDNoxCPcz7osdSS3lF+bj2qNrkOD5xJ0vik+imk6tiiRvb2+Eh4crrTt8+DBatWoFbW1tAEB2djY0NJQPS1NTE4IgKHqd/ksulyMuLg5WVlZVE5yIRCGRSLD83eVoZt4MSZlJ6LWlF7LyssSOpXZ0tXSxZ9Ae/DX0L3zg+YHYcYiqjahFUmZmJmQyGWQyGYDnt/jLZDIkJCQAeD4OaMSIEYr2wcHBuHv3LqZOnYq4uDisWbMGq1evxrRp0xRtAgICsGzZMmzduhXx8fEIDw/Hl19+iV69eilu8582bRqioqIQHx+PkydPYsCAAcjIyEBQUFD1HTwRVQtDHUP8OeRPmOmb4VzyOQzfPZwDucsgrzAPGy9sVPxyaaBjgB5Neoiciqh6iTpw+/Tp0+jSpYti+cXA6KCgIISFhSEpKUlRMAGAg4MD9u/fjylTpmDJkiWwtrbGL7/8ojTH0syZMyGRSDBz5kwkJibC3NwcAQEBmDdvnqLN/fv3MWTIEKSlpcHc3Bxt27bFP//8Azs7u2o4aiKqbg6mDtgzaA/eXv82ziWdQ9KzJDQwbiB2LJWVIc9A/+39EXE7AvFP4vFlpy/FjkQkColQ2jUoeqWMjAyYmJggPT2dg7iJ1MT+G/vRyroV6hvUFzuKykrOTEaPTT1wLvkcDLQNsHvQbnRr1E3sWESVpjznb7WaAoCI6E3893JRalYqzA3MRUqjeq4/uo7uG7sj/mk86hvUx76h+9DKupXYsYhEo1YDt4mIKsvGCxth/7M9Im5HvL5xLXAq8RTarWmH+KfxaGTaCDGjY1ggUa3HIomIah1BEPDHtT+QnZ+NPlv74HjCcbEjiepxzmN0Xd8Vadlp8LTyRMz7MWhUt5HYsYhExyKJiGodiUSCDX03wLeRL7Lys9B9Y3dE3okUO5Zo6urVxfyu89GjSQ9EjozkmC2i/8eB2xXEgdtE6i8nPwd9tvXB4VuHoaelhz+G/IGujl3FjlUtsvKykJqdCvs69op1RUIRNCT83ZlqtvKcv/mvgYhqLT1tPewdvBc9mvRATkEOem7uie2Xt4sdq8pdSb2CNr+1ge8GXzzJeaJYzwKJSBn/RRBRrSbVkmLXwF3o17Qf8grzIEuWiR2pSm26sAmtV7XGldQryMzLxN30u2JHIlJZnAKAiGo9XS1dbB+wHZsvbsbwFsPFjlMlsvKyMPXQVKw8uxIA8I7DO9jUbxMsDC1ETkakutiTREQEQFNDE4FugZBIJACA7PxsfPjnh0jOTH7Nlqov5l4MWq5oiZVnV0ICCWZ1nIVDww+xQCJ6DRZJREQleNHr4rHCQ+3vfPs+5nvcfHwTDY0b4nDgYXzV5StoamiKHYtI5bFIIiIqwZS2U+Bi7oKkzCS8ve5tfBb+GeQFcrFjlVleYZ7iz8t7Lsf4VuNxcdzFWnP3HlFl4BQAFcQpAIhqvsy8TEw9NBWrzq4CALSwaIEV765A24ZtRU5Wunvp9/DJ4U8AANvfq/l36hGVF6cAICKqBIY6hlgZsBJ7B++Fmb4ZLjy8AO/V3lh7bq3Y0Yp5nPMYXxz5As5LnPH7ld+xK24Xbjy6IXYsIrXGIomI6DV6OfXClfFXMLLlSNSR1in2oFwxpeemY07kHDj87IDQ46HIzs9GO5t2OPPBGTSp10TseERqjZfbKoiX24hqp9SsVJgbmCuWh+0ahqZmTfGh54dK66vD6Qen8fa6t/Es7xmA55cDv+78NXo59VLcpUdEyspz/uY8SURE5fByIXTs7jFsvrgZADA3ei6GuA7BiBYj0NGuY5XcPZYhz8DNxzfhYeUBAHCt7wp9bX3YmNhgTqc56O/Sn7NmE1Ui9iRVEHuSiCi/MB/bL2/HopOLcPrBacV6ayNr9HPuhw9bfYjm9ZtXeP9FQhGuP7qO8Fvh2H9zPyLvPH/4bPykeEUxFP8kHnZ17FgcEZVRec7fLJIqiEUSEb0gCAJi78di7bm12BG3A09znwIADg0/BN9GvgCA4wnHcTzhOGxNbGFtZI16evWgrakNbQ1t5Bbk4q16b0FbUxsAsPbcWmy+tBn/Jv6LdHm60mc51XPCkRFH0MC4QbUeI1FNwcttRETVSCKRwMfGBz42PljcYzEO3TqEQzcPob1te0WbPVf34MfYH0vdx+2Jt+Fg6gDg+QNoI25HAHj+bDnvht7o0aQH/Bv7w8XcheONiKoJiyQiokqkq6WLXk690Mupl9J6TytPDHMdhgfPHuDBswd4kvsE+YX5yCvMg1RLipyCHEXbfk37oUm9JmjToA2amTdT9DARUfXi5bYK4uU2IiIi9cPJJImIiIjeEIskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohJoiR1AXQmCAADIyMgQOQkRERGV1Yvz9ovz+KuwSKqgZ8+eAQBsbGxETkJERETl9ezZM5iYmLyyjUQoSylFxRQVFeHBgwcwMjKCRCKp1H1nZGTAxsYG9+7dg7GxcaXuu6bhd1V2/K7Kjt9V2fG7Kjt+V+VTVd+XIAh49uwZrK2toaHx6lFH7EmqIA0NDTRs2LBKP8PY2Jj/kMqI31XZ8bsqO35XZcfvquz4XZVPVXxfr+tBeoEDt4mIiIhKwCKJiIiIqAQsklSQrq4uZs+eDV1dXbGjqDx+V2XH76rs+F2VHb+rsuN3VT6q8H1x4DYRERFRCdiTRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRpCbkc
jlatmwJiUQCmUwmdhyV1KtXL9ja2kIqlcLKygqBgYF48OCB2LFUzp07d/D+++/DwcEBenp6aNSoEWbPno28vDyxo6mkefPmwcfHB/r6+qhTp47YcVTO0qVL4eDgAKlUCk9PTxw7dkzsSCopOjoaAQEBsLa2hkQiwZ49e8SOpJJCQ0PRunVrGBkZoX79+ujTpw+uXbsmWh4WSWris88+g7W1tdgxVFqXLl2wfft2XLt2DTt37sStW7cwYMAAsWOpnKtXr6KoqAgrVqzA5cuXsXDhQixfvhxffPGF2NFUUl5eHt577z2MGzdO7CgqZ9u2bZg8eTJmzJiBc+fOoUOHDvD390dCQoLY0VROVlYW3NzcsHjxYrGjqLSoqCh89NFH+OeffxAeHo6CggL4+voiKytLnEACqbz9+/cLzs7OwuXLlwUAwrlz58SOpBb27t0rSCQSIS8vT+woKm/BggWCg4OD2DFU2tq1awUTExOxY6iUNm3aCMHBwUrrnJ2dhenTp4uUSD0AEHbv3i12DLWQkpIiABCioqJE+Xz2JKm4hw8fYuzYsdiwYQP09fXFjqM2Hj9+jE2bNsHHxwfa2tpix1F56enpqFu3rtgxSI3k5eXhzJkz8PX1VVrv6+uLmJgYkVJRTZOeng4Aov3/xCJJhQmCgJEjRyI4OBitWrUSO45a+Pzzz2FgYIB69eohISEBe/fuFTuSyrt16xZ+/fVXBAcHix2F1EhaWhoKCwthYWGhtN7CwgLJyckipaKaRBAETJ06Fe3bt0fz5s1FycAiSQRz5syBRCJ55ev06dP49ddfkZGRgZCQELEji6as39ULn376Kc6dO4fDhw9DU1MTI0aMgFBLJpUv73cFAA8ePED37t3x3nvvYcyYMSIlr34V+a6oZBKJRGlZEIRi64gq4uOPP8aFCxewZcsW0TLwsSQiSEtLQ1pa2ivb2NvbY/Dgwfjzzz+V/sMpLCyEpqYmhg0bhnXr1lV1VNGV9buSSqXF1t+/fx82NjaIiYmBt7d3VUVUGeX9rh48eIAuXbrAy8sLYWFh0NCoPb8zVeTnKiwsDJMnT8bTp0+rOJ16yMvLg76+Pn7//Xf07dtXsX7SpEmQyWSIiooSMZ1qk0gk2L17N/r06SN2FJU1YcIE7NmzB9HR0XBwcBAth5Zon1yLmZmZwczM7LXtfvnlF8ydO1ex/ODBA/j5+WHbtm3w8vKqyogqo6zfVUle1P9yubwyI6ms8nxXiYmJ6NKlCzw9PbF27dpaVSABb/ZzRc/p6OjA09MT4eHhSkVSeHg4evfuLWIyUmeCIGDChAnYvXs3IiMjRS2QABZJKs3W1lZp2dDQEADQqFEjNGzYUIxIKuvUqVM4deoU2rdvD1NTU9y+fRuzZs1Co0aNakUvUnk8ePAAnTt3hq2tLX744QekpqYq3rO0tBQxmWpKSEjA48ePkZCQgMLCQsU8ZY0bN1b8m6ytpk6disDAQLRq1Qre3t5YuXIlEhISOL6tBJmZmbh586ZiOT4+HjKZDHXr1i32f31t9tFHH2Hz5s3Yu3cvjIyMFOPbTExMoKenV/2BRLmnjiokPj6eUwCU4sKFC0KXLl2EunXrCrq6uoK9vb0QHBws3L9/X+xoKmft2rUCgBJfVFxQUFCJ39XRo0fFjqYSlixZItjZ2Qk6OjqCh4eHaLdqq7qjR4+W+HMUFBQkdjSVUtr/TWvXrhUlD8ckEREREZWgdg1EICIiIiojFklEREREJWCRRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRRESVqnPnzpg8ebLYMUr06NEj1K9fH3fu3AEAREZGQiKRVPlDayv6OWFhYahTp065tmndujV27dpVrm2IqGQskohIpSUlJWHo0KFwcnKChoZGqQXYzp074eLiAl1dXbi4uGD37t3F2oSGhiIgIAD29vZVG1pEX375JaZPn46ioiKxoxCpPRZJRKTS5HI5zM3NMWPGDLi5uZXYJjY2FoMGDUJgYCDOnz+PwMBADBw4ECdPnlS0ycnJwerVqzFmzJjqii6Knj17Ij09HYcOHRI7CpHaY5FERFXmyZMnGDFiBExNTaGvrw9/f3/cuHFDqc2qVatgY2MDfX199O3bFz/99JPSJSZ7e3v8/PPPGDFiBExMTEr8nEWLFqFbt24ICQmBs7MzQkJC8M4772DRokWKNgcOHICWlha8vb1Lzfvo0SMMGTIEDRs2hL6+PlxdXbFlyxalNp07d8aECRMwefJkmJqawsLCAitXrkRWVhZGjRoFIyMjNGrUCAcOHCi2/xMnTsDNzQ1SqRReXl64ePGi0vthYWGwtbVVfBePHj1Sev/WrVvo3bs3LCwsYGhoiNatWyMiIkKpjaamJnr06FEsNxGVH4skIqoyI0eOxOnTp/HHH38gNjYWgiCgR48eyM/PB/C8aAgODsakSZMgk8nQrVs3zJs3r9yfExsbC19fX6V1fn5+iImJUSxHR0ejVatWr9xPbm4uPD098ddff+HSpUv44IMPEBgYqNQjBQDr1q2DmZkZTp06hQkTJmDcuHF477334OPjg7Nnz8LPzw+BgYHIzs5W2u7TTz/FDz/8gH///Rf169dHr169FN/FyZMnMXr0aIwfPx4ymQxdunTB3LlzlbbPzMxEjx49EBERgXPnzsHPzw8BAQFISEhQatemTRscO3asbF8eEZVOICKqRJ06dRImTZokXL9+XQAgnDhxQvFeWlqaoKenJ2zfvl0QBEEYNGiQ0LNnT6Xthw0bJpiYmLxy3/+lra0tbNq0SWndpk2bBB0dHcVy7969hdGjRyu1OXr0qABAePLkSanH06NHD+GTTz5RytC+fXvFckFBgWBgYCAEBgYq1iUlJQkAhNjYWKXP2bp1q6LNo0ePBD09PWHbtm2CIAjCkCFDhO7duyt99qBBg0r9Ll5wcXERfv31V6V1e/fuFTQ0NITCwsJXbktEr8aeJCKqEnFxcdDS0oKXl5diXb169eDk5IS4uDgAwLVr19CmTRul7f67XFYSiURpWRAEpXU5OTmQSqWv3EdhYSHmzZuHFi1aoF69ejA0NMThw4eL9dS0aNFC8WdNTU3Uq1cPrq6uinUWFhYAgJSUFKXtXr7UV7duXaXvIi4urtilwP8uZ2Vl4bPPPoOLiwvq1KkDQ0NDXL16tVg+PT09FBUVQS6Xv/J4iejVtMQOQEQ1kyAIpa5/Ubz8t5B51XavYmlpieTkZKV1KSkpimIFAMzMzPDkyZNX7ufHH3/EwoULsWjRIri6usLAwACTJ09GXl6eUjttbW2lZYlEorTuxTGV5Q6zl7+L1/n0009x6NAh/PDDD2jcuDH09PQwYMCAYvkeP34MfX196OnpvXafRFQ69iQRUZVwcXFBQUGB0nieR48e4fr162jatCkAwNnZGadOnVLa7vTp0+X+LG9vb4SHhyutO3z4MHx8fBTL7u7uuHLlyiv3c+zYMfTu3RvDhw+Hm5sbHB0diw00fxP//POP4s9PnjzB9evX4ezsDOD59/Xy+/9t
/yLfyJEj0bdvX7i6usLS0lIx59PLLl26BA8Pj0rLTVRbsUgioirRpEkT9O7dG2PHjsXx48dx/vx5DB8+HA0aNEDv3r0BABMmTMD+/fvx008/4caNG1ixYgUOHDhQrHdJJpNBJpMhMzMTqampkMlkSgXPpEmTcPjwYXz33Xe4evUqvvvuO0RERCjNqeTn54fLly+/sjepcePGCA8PR0xMDOLi4vDhhx8W66F6E19//TWOHDmCS5cuYeTIkTAzM0OfPn0AABMnTsTBgwexYMECXL9+HYsXL8bBgweL5du1axdkMhnOnz+PoUOHlthbdezYsWID2Ymo/FgkEVGVWbt2LTw9PfHuu+/C29sbgiBg//79iktT7dq1w/Lly/HTTz/Bzc0NBw8exJQpU4qNHXJ3d4e7uzvOnDmDzZs3w93dHT169FC87+Pjg61bt2Lt2rVo0aIFwsLCsG3bNqXxUK6urmjVqhW2b99eat4vv/wSHh4e8PPzQ+fOnWFpaakoYirD/PnzMWnSJHh6eiIpKQl//PEHdHR0AABt27bFb7/9hl9//RUtW7bE4cOHMXPmTKXtFy5cCFNTU/j4+CAgIAB+fn7FeowSExMRExODUaNGVVpuotpKIlRkAAARURUZO3Ysrl69WiW3sO/fvx/Tpk3DpUuXoKFRM39H/PTTT5Geno6VK1eKHYVI7XHgNhGJ6ocffkC3bt1gYGCAAwcOYN26dVi6dGmVfFaPHj1w48YNJCYmwsbGpko+Q2z169fHtGnTxI5BVCOwJ4mIRDVw4EBERkbi2bNncHR0xIQJExAcHCx2LCIiFklEREREJamZF+WJiIiI3hCLJCIiIqISsEgiIiIiKgGLJCIiIqISsEgiIiIiKgGLJCIiIqISsEgiIiIiKgGLJCIiIqIS/B8Nj5nHHBDgHgAAAABJRU5ErkJggg==", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "cell_type": "markdown", + "id": "354b2ab3", + "metadata": { + "editable": true + }, "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn import linear_model\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", + "## Identifying Terms\n", "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "# Decide which values of lambda to use\n", - "nlambdas = 500\n", - "MSERidgePredict = np.zeros(nlambdas)\n", - "lambdas = np.logspace(-4, 2, nlambdas)\n", - "for i in range(nlambdas):\n", - " lmb = lambdas[i]\n", - " RegRidge = linear_model.Ridge(lmb)\n", - " RegRidge.fit(X_train,y_train)\n", - " ypredictRidge = RegRidge.predict(X_test)\n", - " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", - "\n", - "# Now plot the results\n", - "plt.figure()\n", - "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", - "plt.xlabel('log10(lambda)')\n", - "plt.ylabel('MSE')\n", - "plt.legend()\n", - "plt.show()" + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" ] }, { "cell_type": "markdown", - "id": "70944449", - "metadata": {}, + "id": "2ee3d80b", + "metadata": { + "editable": true + }, "source": [ - "Here we have performed a rather data greedy calculation as function of the regularization parameter $\\lambda$. There is no resampling here. The latter can easily be added by employing the function **RidgeCV** instead of just calling the **Ridge** function. For **RidgeCV** we need to pass the array of $\\lambda$ values.\n", - "By inspecting the figure we can in turn determine which is the optimal regularization parameter.\n", - "This becomes however less functional in the long run." + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1538973c", - "metadata": {}, + "id": "5c6f424d", + "metadata": { + "editable": true + }, "source": [ - "## Grid Search\n", - "\n", - "An alternative is to use the so-called grid search functionality\n", - "included with the library **Scikit-Learn**, as demonstrated for the same\n", - "example here." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "1c1fdba0", - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.model_selection import GridSearchCV\n", - "\n", - "def R2(y_data, y_model):\n", - " return 1 - np.sum((y_data - y_model) ** 2) / np.sum((y_data - np.mean(y_data)) ** 2)\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", - "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "# Decide which values of lambda to use\n", - "nlambdas = 10\n", - "lambdas = np.logspace(-4, 2, nlambdas)\n", - "# create and fit a ridge regression model, testing each alpha\n", - "model = Ridge()\n", - "gridsearch = GridSearchCV(estimator=model, param_grid=dict(alpha=lambdas))\n", - "gridsearch.fit(X_train, y_train)\n", - "print(gridsearch)\n", - "ypredictRidge = gridsearch.predict(X_test)\n", - "# summarize the results of the grid search\n", - "print(f\"Best estimated lambda-value: {gridsearch.best_estimator_.alpha}\")\n", - "print(f\"MSE score: {MSE(y_test,ypredictRidge)}\")\n", - "print(f\"R2 score: {R2(y_test,ypredictRidge)}\")" - ] - }, - { - "cell_type": "markdown", - "id": "dd6f78eb", - "metadata": {}, - "source": [ - "By default the grid search function includes cross validation with\n", - "five folds. The [Scikit-Learn\n", - "documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV)\n", - "contains more information on how to set the different parameters.\n", - "\n", - "If we take out the random noise, running the above codes results in $\\lambda=0$ yielding the best fit." - ] - }, - { - "cell_type": "markdown", - "id": "20b7afcb", - "metadata": {}, - "source": [ - "## Randomized Grid Search\n", - "\n", - "An alternative to the above manual grid set up, is to use a random\n", - "search where the parameters are tuned from a random distribution\n", - "(uniform below) for a fixed number of iterations. A model is\n", - "constructed and evaluated for each combination of chosen parameters.\n", - "We repeat the previous example but now with a random search. Note\n", - "that values of $\\lambda$ are now limited to be within $x\\in\n", - "[0,1]$. This domain may not be the most relevant one for the specific\n", - "case under study." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "4810b670", - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.model_selection import GridSearchCV\n", - "from scipy.stats import uniform as randuniform\n", - "from sklearn.model_selection import RandomizedSearchCV\n", - "\n", - "\n", - "def R2(y_data, y_model):\n", - " return 1 - np.sum((y_data - y_model) ** 2) / np.sum((y_data - np.mean(y_data)) ** 2)\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", - "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "param_grid = {'alpha': randuniform()}\n", - "# create and fit a ridge regression model, testing each alpha\n", - "model = Ridge()\n", - "gridsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)\n", - "gridsearch.fit(X_train, y_train)\n", - "print(gridsearch)\n", - "ypredictRidge = gridsearch.predict(X_test)\n", - "# summarize the results of the grid search\n", - "print(f\"Best estimated lambda-value: {gridsearch.best_estimator_.alpha}\")\n", - "print(f\"MSE score: {MSE(y_test,ypredictRidge)}\")\n", - "print(f\"R2 score: {R2(y_test,ypredictRidge)}\")" - ] - }, - { - "cell_type": "markdown", - "id": "0696dfc9", - "metadata": {}, - "source": [ - "## Wisconsin Cancer Data\n", - "\n", - "We show here how we can use a simple regression case on the breast\n", - "cancer data using Logistic regression as our algorithm for\n", - "classification." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "c55d1159", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.94\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" - ] - }, - { - "cell_type": "markdown", - "id": "b83cd520", - "metadata": {}, - "source": [ - "## Using the correlation matrix\n", - "\n", - "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", - "We use **Pandas** to compute the correlation matrix." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "5497a1d8", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAA9wAAAfFCAYAAABjxsRdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeZyN9f//8eeZfTEzGMbMmDEGZexC1qxlKaT0zZa1PqVPCJFWobKEFC2Uj1Ap9SmpJFuMFELRIqVkjaFkZ5jl/fvDb87HMcNs55qzzON+u83tNue63ue6Xu/3dc55nde1HZsxxggAAAAAADiVj6sDAAAAAADAG1FwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFKLgBAAAAALAABTcAAAAAABag4AYAAAAAwAIU3AAAAAAAWICCGwAAAAAAC1BwA7BUxYoV1b9/f/vj5ORk2Ww2JScnuywmAADcyYQJE7R48WLL17N+/XqNHTtWx48ft3xdAC6i4AZQpOrVq6cNGzaoXr16rg4FAAC3UJQF97hx4yi4gSJEwQ0gG2OMzp07Z8myw8PD1bhxY4WHh1uyfAAAULTOnTsnY4yrwwDcEgU3kE9jx46VzWbTDz/8oDvvvFMREREqXbq0HnroIaWnp+vXX39Vhw4dFBYWpooVK2ry5MnZlnHy5EmNHDlSiYmJCggIUPny5TVs2DCdOXPGod0rr7yiFi1aKCoqSqGhoapVq5YmT56stLQ0h3atWrVSzZo1tXnzZjVv3lwhISGqVKmSJk2apMzMzFz7ZLPZNHjwYM2aNUvVqlVTYGCg5s+fL0kaN26cGjVqpNKlSys8PFz16tXTnDlzsiXWtLQ0jRo1StHR0QoJCdENN9ygTZs2ZVtXTqeUt2rVSq1atcrWtn///qpYsaLDtJkzZ6pOnToqUaKEwsLClJSUpMcffzzXPgIAvIO35WGbzaYzZ85o/vz5stlsstlsDjkxJSVFAwcOVFxcnAICApSYmKhx48YpPT1d0sWd5LfccosiIyO1b98++/POnj2rGjVqqFq1ajpz5ozGjh2rhx9+WJKUmJhoX1dWPrbZbBo7dmy2+C6/NGzevHmy2WxasWKF7r77bpUtW1YhISE6f/68JOm9995TkyZNFBoaqhIlSqh9+/baunXrVccA8GZ+rg4A8FTdunVT7969NXDgQK1cudKegFetWqUHHnhAI0eO1DvvvKNHHnlEVapUUdeuXSVdTIAtW7bUgQMH9Pjjj6t27dravn27nnrqKf34449atWqVbDabJGnXrl3q1auX/QvB999/r/Hjx+uXX37RG2+84RBPSkqK7rrrLo0YMUJjxozRRx99pMcee0yxsbHq27dvrv1ZvHix1q1bp6eeekrR0dGKioqSJO3Zs0cDBw5UhQoVJEkbN27UkCFD9Oeff+qpp56yP//ee+/Vm2++qZEjR6pt27b66aef1LVrV506dcop4y1JCxcu1AMPPKAhQ4Zo6tSp8vHx0e+//66ff/7ZaesAAHgGb8nDGzZsUJs2bdS6dWuNHj1akuxngaWkpKhhw4by8fHRU089pcqVK2vDhg169tlntWfPHs2dO1c2m01vvfWW6tatq27dumndunXy9/fXAw88oN27d+ubb75RaGio/vWvf+mff/7RSy+9pEWLFikmJkaSVL169QKN/913362OHTvqrbfe0pkzZ+Tv768JEyboySef1IABA/Tkk0/qwoULmjJlipo3b65NmzYVeF2ARzMA8mXMmDFGknn++ecdptetW9dIMosWLbJPS0tLM2XLljVdu3a1T5s4caLx8fExmzdvdnj+Bx98YCSZpUuX5rjejIwMk5aWZt58803j6+tr/vnnH/u8li1bGknmm2++cXhO9erVTfv27XPtkyQTERHhsMyrxfD000+byMhIk5mZaYwxZseOHUaSGT58uEP7BQsWGEmmX79+9mlr1qwxksyaNWsc4m/ZsmW29fXr188kJCTYHw8
ePNiULFky1/4AALyXN+bh0NBQh1yZZeDAgaZEiRJm7969DtOnTp1qJJnt27fbp3311VfGz8/PDBs2zLzxxhtGkvnPf/7j8LwpU6YYSWb37t3Z1iXJjBkzJtv0hIQEh9jmzp1rJJm+ffs6tNu3b5/x8/MzQ4YMcZh+6tQpEx0dbbp163aF3gPejVPKgQLq1KmTw+Nq1arJZrPp5ptvtk/z8/NTlSpVtHfvXvu0JUuWqGbNmqpbt67S09Ptf+3bt892qvXWrVt16623KjIyUr6+vvL391ffvn2VkZGhnTt3Oqw/OjpaDRs2dJhWu3Zth3VfTZs2bVSqVKls01evXq2bbrpJERER9hieeuopHT16VEeOHJEkrVmzRpJ01113OTy3W7du8vNz3ok0DRs21PHjx9WzZ099/PHH+vvvv522bACAZ/G2PJyTJUuWqHXr1oqNjXWINauPa9eutbdt1qyZxo8frxdffFH//ve/1bt3b91zzz0FXndu7rjjDofHy5cvV3p6uvr27esQa1BQkFq2bMmvk6DY4pRyoIBKly7t8DggIEAhISEKCgrKNv3kyZP2x4cPH9bvv/8uf3//HJebVUTu27dPzZs3V9WqVTV9+nRVrFhRQUFB2rRpkwYNGpTtpmaRkZHZlhUYGJjnm59lnVp2qU2bNqldu3Zq1aqVZs+ebb9+bPHixRo/frx92UePHpV08cvGpfz8/HKMq6D69Omj9PR0zZ49W3fccYcyMzN1/fXX69lnn1Xbtm2dth4AgPvztjyck8OHD+vTTz/NNdYsd911l0aPHq3z58/br9e2yuXfGw4fPixJuv7663Ns7+PDcT4UTxTcQBErU6aMgoODs137del86eI11WfOnNGiRYuUkJBgn79t2zZL4sq6Xu1SCxculL+/v5YsWeLwBebyny7J+pKRkpKi8uXL26enp6fbi/GrCQoK0okTJ7JNz+kI9oABAzRgwACdOXNGX375pcaMGaNOnTpp586dDuMEAEBO3DUPXymW2rVra/z48TnOj42Ntf+fkZGhu+66S6VKlVJgYKDuueceff311woICMjTugIDA+03PrvUlfL45d8bssbtgw8+IB8Dl6DgBopYp06dNGHCBEVGRioxMfGK7bISWWBgoH2aMUazZ8+2PMZLY/Dz85Ovr6992rlz5/TWW285tMu6m+qCBQtUv359+/T333/ffhfVq6lYsaL++9//6vz58/b+Hj16VOvXr7/iz4eFhobq5ptv1oULF3Tbbbdp+/btJHgAQK7cMQ9f6Uh4p06dtHTpUlWuXDnHy74uNWbMGK1bt04rVqxQaGioWrRooYcffljTp093WI+kHNdVsWJF/fDDDw7TVq9erdOnT+epD+3bt5efn5927dqV7XRzoDij4AaK2LBhw/Thhx+qRYsWGj58uGrXrq3MzEzt27dPK1as0IgRI9SoUSO1bdtWAQEB6tmzp0aNGqXU1FTNnDlTx44dK7JYO3bsqGnTpqlXr1667777dPToUU2dOtXhy4d08bq53r1768UXX5S/v79uuukm/fTTT5o6dWqefm+7T58+eu2119S7d2/de++9Onr0qCZPnpztuffee6+Cg4PVrFkzxcTEKCUlRRMnTlRERMQVT2EDAOBS7piHa9WqpeTkZH366aeKiYlRWFiYqlatqqefflorV65U06ZN9eCDD6pq1apKTU3Vnj17tHTpUs2aNUtxcXFauXKlJk6cqNGjR+vGG2+UJE2cOFEjR45Uq1atdPvtt9vXI0nTp09Xv3795O/vr6pVqyosLEx9+vTR6NGj9dRTT6lly5b6+eef9fLLLysiIiJPfahYsaKefvppPfHEE/rjjz/UoUMHlSpVSocPH9amTZsUGhqqcePGOX3sAHdHwQ0UsdDQUK1bt06TJk3S66+/rt27dys4OFgVKlTQTTfdZP/d6aSkJH344Yd68skn1bVrV0VGRqpXr1566KGHHG4IY6U2bdrojTfe0HPPPafOnTurfPnyuvfeexUVFZXtRixz5sxRuXLlNG/ePM2YMUN169bVhx9+qB49euS6nmbNmmn+/PmaNGmSunTpokqVKmnMmDFaunSpw01Wmjdvrnnz5un999/XsWPHVKZMGd1www168803VbZsWWd3HwDghdwxD0+fPl2DBg1Sjx497D9blpycrJiYGG3ZskXPPPOMpkyZogMHDigsLEyJiYn2gvbQoUPq3bu3WrVq5fBznQ899JDWrl2ru+++W9ddd50qVqyoVq1a6bHHHtP8+fM1e/ZsZWZmas2aNWrVqpUefvhhnTx5UvPmzdPUqVPVsGFDvf/+++rSpUue+/HYY4+pevXqmj59ut59912dP39e0dHRuv7663X//fc7dcwAT2EzxhhXBwEAAAAAgLfhdoEAAAAAAFiAghsAAAAAAAsUqODevXu3s+MAAAAAAMCrFKjgrlKlilq3bq23335bqampzo4JAAAAAACPV6CC+/vvv9d1112nESNGKDo6WgMHDtSmTZucHRsAAAAAAB6rUHcpT09P16effqp58+bp888/1zXXXKN77rlHffr0uepP9GRmZurgwYMKCwuTzWYr6OoBAPB4xhidOnVKsbGx8vFxza1VyMsAAFzk7LzslJ8FO3/+vF599VU99thjunDhgvz9/dW9e3c999xziomJydb+wIEDio+PL+xqAQDwGvv371dcXJxL1k1eBgDAkbPycqEK7i1btuiNN97QwoULFRoaqn79+umee+7RwYMH9dRTT+nUqVM5nmp+4sQJlSxZUvv371d4eHihOgAA3igzM1Pp6emuDgNO4Ofnd9U95CdPnlR8fLyOHz+uiIiIIozsf8jLAABc5Oy87FeQJ02bNk1z587Vr7/+qltuuUVvvvmmbrnlFvsXisTERL322mtKSkrK8flZp6uFh4eT2AHgEsYYpaSk6Pjx464OBU5UsmRJRUdHX/V0bVeeyk1eBgDAkbPycoEK7pkzZ+ruu+/WgAEDFB0dnWObChUqaM6cOYUKDi6yZuLV57d+rGjiAIqhrGI7KipKISEhXE/r4YwxOnv2rI4cOSJJOV5mBeSKvAwAHqtABfdvv/2Wa5uAgAD169evIIuHu8st8eeGLwZAjjIyMuzFdmRkpKvDgZMEBwdLko4cOaKoqCj5+vq6OCIAAFBUClRwz507VyVKlNCdd97pMP2///2vzp49S6ENAAWQlpYmSQoJCXFxJHC2rG2alpZGwQ3XKOxRco6yA0CBFOg+55MmTVKZMmWyTY+KitKECRMKHRQAFGecRu592KYAABRPBSq49+7dq8TExGzTExIStG/fvkIHBQAAAACApytQwR0VFaUffvgh2/Tvv/+e6w4BAAAAAFABr+Hu0aOHHnzwQYWFhalFixaSpLVr12ro0KHq0aOHUwMEAEgvrNxZZOsa3vbaIltXTvbs2aPExERt3bpVdevWVXJyslq3bq1jx46pZMmSLo0NAAAgPwpUcD/77LPau3evbrzxRvn5XVxEZmam+vbtyzXcyF1e7nLOzVcAj9K/f3/Nnz9fAwcO1K
xZsxzmPfDAA5o5c6b69eunefPm5XvZTZs21aFDhxQREeGkaJ1n3rx5GjZsGL+bDpDbASBHBTqlPCAgQO+9955++eUXLViwQIsWLdKuXbv0xhtvKCAgwNkxAgA8QHx8vBYuXKhz587Zp6Wmpurdd99VhQoVCrzcgIAARUdHc+MxAADgcQpUcGe59tprdeedd6pTp05KSEhwVkwAAA9Ur149VahQQYsWLbJPW7RokeLj43XdddfZpy1btkw33HCDSpYsqcjISHXq1Em7du264nKTk5Nls9kcjiLPnj1b8fHxCgkJ0e23365p06Y5nG4+duxY1a1bV2+99ZYqVqyoiIgI9ejRQ6dOncpzHHv27JHNZtOiRYvUunVrhYSEqE6dOtqwYYM9rgEDBujEiROy2Wyy2WwaO3ZsIUYQXmvNxKv/AQC8VoEK7oyMDM2ZM0e9evXSTTfdpDZt2jj8AQCKpwEDBmju3Ln2x2+88YbuvvtuhzZnzpzRQw89pM2bN+uLL76Qj4+Pbr/9dmVmZuZpHV9//bXuv/9+DR06VNu2bVPbtm01fvz4bO127dqlxYsXa8mSJVqyZInWrl2rSZMm5TuOJ554QiNHjtS2bdt07bXXqmfPnkpPT1fTpk314osvKjw8XIcOHdKhQ4c0cuTI/AwXAADwcgW6hnvo0KGaN2+eOnbsqJo1a3KaH5wvtz3+XAcGuKU+ffrosccesx8d/vrrr7Vw4UIlJyfb29xxxx0Oz5kzZ46ioqL0888/q2bNmrmu46WXXtLNN99sL26vvfZarV+/XkuWLHFol5mZqXnz5iksLMwe2xdffGEvzvMax8iRI9WxY0dJ0rhx41SjRg39/vvvSkpKUkREhGw2m6Kjo/M4QgAAoDgpUMG9cOFCvf/++7rlllucHQ8AwIOVKVNGHTt21Pz582WMUceOHVWmTBmHNrt27dLo0aO1ceNG/f333/Yjyvv27ctTwf3rr7/q9ttvd5jWsGHDbAV3xYoV7cW2JMXExOjIkSP5jqN27doOy5CkI0eOKCkpKddYAQBA8VaggjsgIEBVqlRxdiwAAC9w9913a/DgwZKkV155Jdv8zp07Kz4+XrNnz1ZsbKwyMzNVs2ZNXbhwIU/LN8ZkO7PKGJOtnb+/v8Njm83mcLp4XuO4dDlZ683r6e8AAKB4K1DBPWLECE2fPl0vv/wyp5PDNTjlHHBbHTp0sBet7du3d5h39OhR7dixQ6+99pqaN28uSfrqq6/ytfykpCRt2rTJYdqWLVvytQxnxCFd3AGdkZGR7+cBboebtwGAJQpUcH/11Vdas2aNPv/8c9WoUSPbUYRL71ALAChefH19tWPHDvv/lypVqpQiIyP1+uuvKyYmRvv27dOjjz6ar+UPGTJELVq00LRp09S5c2etXr1an3/+eb52ADsjDuniaeunT5/WF198oTp16igkJEQhISH5Xg4AAPBOBSq4S5Ysme36OQCAdYa3vdbVIeRLeHh4jtN9fHy0cOFCPfjgg6pZs6aqVq2qGTNmqFWrVnledrNmzTRr1iyNGzdOTz75pNq3b6/hw4fr5ZdfzvMynBGHJDVt2lT333+/unfvrqNHj2rMmDH8NBjgxl5YufOq8z3tsxaA+7OZnC58s9jJkycVERGhEydOXPFLGVzIG04r45RyeKDU1FTt3r1biYmJCgoKcnU4HuXee+/VL7/8onXr1rk6lBxdbdu6Q050hxi8WmEvg3LGZVTukNvdIDdTcAPIjbNzYoGOcEtSenq6kpOTtWvXLvXq1UthYWE6ePCgwsPDVaJEiUIHBgDAlUydOlVt27ZVaGioPv/8c82fP1+vvvqqq8OCJ+KeIG6DYhiANypQwb1371516NBB+/bt0/nz59W2bVuFhYVp8uTJSk1N1axZs5wdJwAAdps2bdLkyZN16tQpVapUSTNmzNC//vUvV4cFAADgoEAF99ChQ9WgQQN9//33ioyMtE+//fbb+cIDALDc+++/7+oQAOdxh9O9AQCWKPBdyr/++msFBAQ4TE9ISNCff/7plMBgEZI6AAAAABSJAhXcmZmZOf7u6IEDBxQWFlbooAAAADyCO+zIdocYAAA58inIk9q2basXX3zR/thms+n06dMaM2aMbrnlFmfFBgAAAACAxyrQEe4XXnhBrVu3VvXq1ZWamqpevXrpt99+U5kyZfTuu+86O0YAAAAAADxOgQru2NhYbdu2Te+++66+++47ZWZm6p577tFdd92l4OBgZ8cIuAY/FQMAgPPkeur7HUUSBgAUpQL/DndwcLDuvvtu3X333c6MBwAAAAAAr1CggvvNN9+86vy+ffsWKBiIo6rOwjjC2xTlTZHc+P1RsWJFDRs2TMOGDXN1KADczAsrd1q+jOFtry30OgAULwX+He5LpaWl6ezZswoICFBISAgFNwAUM/3799f8+fPtj0uXLq3rr79ekydPVu3atZ22ns2bNys0NNRpywMAALBSgQruY8eOZZv222+/6d///rcefvjhQgcFAPA8HTp00Ny5cyVJKSkpevLJJ9WpUyft27fPaesoW7as05YFwL003vf6Vee/sPK+IooEAJynQD8LlpNrrrlGkyZNynb0GwBQPAQGBio6OlrR0dGqW7euHnnkEe3fv19//fWXJOnPP/9U9+7dVapUKUVGRqpLly7as2eP/fn9+/fXbbfdpqlTpyomJkaRkZEaNGiQ0tLS7G0qVqzo8LOUv/zyi2644QYFBQWpevXqWrVqlWw2mxYvXixJ2rNnj2w2mxYtWqTWrVsrJCREderU0YYNG4piSAAAQDFX4Jum5cTX11cHDx505iIBaxTl9bBAMXT69GktWLBAVapUUWRkpM6ePavWrVurefPm+vLLL+Xn56dnn31WHTp00A8//KCAgABJ0po1axQTE6M1a9bo999/V/fu3VW3bl3de++92daRmZmp2267TRUqVNA333yjU6dOacSIETnG88QTT2jq1Km65ppr9MQTT6hnz576/fff5efn1DQIb0S+yJMNfxx1dQi5HiGXpI0VCneUnGu8AeRXgb5pfPLJJw6PjTE6dOiQXn75ZTVr1swpgQEAPMuSJUtUokQJSdKZM2cUExOjJUuWyMfHRwsXLpSPj4/+85//yGazSZLmzp2rkiVLKjk5We3atZMklSpVSi+//LJ8fX2VlJSkjh076osvvsix4F6xYoV27dql5ORkRUdHS5LGjx+vtm3bZms7cuRIdezYUZI0btw41ahRQ7///ruSkpIsGQsAAACpgAX3bbfd5vDYZrOpbNmyatOmjZ5//nlnxIWCYk+822AvOIqb1q1ba+bMmZKkf/75R6+++qpuvvlmbdq0Sd9++61+//13hYWFOTwnNTVVu3btsj+uUaOGfH197Y9jYmL0448/5ri+X3/9VfHx8fZiW5IaNmyYY9tLb9wWExMjSTpy5AgFNwAAsFSBCu7MzExnx4G8oqAGsmHnhnsIDQ1VlSpV7I/r16+viIgIzZ49W5mZmapfv74WLFiQ7XmX3gjN39/fYZ7NZrtizjHG2I+W5+bS5WY9h
1wGeJa8nDJe2GUU9pRzALgcF68BBeSM3/sEvJnNZpOPj4/OnTunevXq6b333lNUVJTCw8OdsvykpCTt27dPhw8fVrly5SRd/NkwAAAAd1Gggvuhhx7Kc9tp06YVZBUAAA9z/vx5paSkSLr485Evv/yyTp8+rc6dO6thw4aaMmWKunTpoqefflpxcXHat2+fFi1apIcfflhxcXH5Xl/btm1VuXJl9evXT5MnT9apU6f0xBNPSFKej3wDQFFyxs56ztoCPEuBCu6tW7fqu+++U3p6uqpWrSpJ2rlzp3x9fVWvXj17O77wAFeWl6Rb2KTqjFOtC7uMwn654IvF/9f6MVdHkKtly5bZr48OCwtTUlKS/vvf/6pVq1aSpC+//FKPPPKIunbtqlOnTql8+fK68cYbC3zE29fXV4sXL9a//vUvXX/99apUqZKmTJmizp07KygoyFndAgAAKLACFdydO3dWWFiY5s+fr1KlSkm6eDRjwIABat68+RV/lqXY4/prt5GXny9pUimyCCIBvMO8efM0b968q7aJjo7W/Pnzr7qMy136m9uSHH63W7p4WvlXX31lf/z1119Lkv1a8ooVK8oY4/CckiVLZpsGeLvc8l5uOc8dfvarKBSHa7yLYoc/gP8pUMH9/PPPa8WKFfZiW7r4Uy7PPvus2rVrR8ENACgSH330kUqUKKFrrrlGv//+u4YOHapmzZqpcuXKrg4NAACgYAX3yZMndfjwYdWoUcNh+pEjR3Tq1CmnBAa4Wq578ysUTRzuzhNuHsddzL3XqVOnNGrUKO3fv19lypTRTTfdxM9TAgVQXI5gW60ociI5DfAsBSq4b7/9dg0YMEDPP/+8GjduLEnauHGjHn74YXXt2tWpAXoUThkvMoU9NQ6Ad+jbt6/69u3r6jAAAAByVKCCe9asWRo5cqR69+6ttLS0iwvy89M999yjKVOmODVAoCCKy576wu5J94aj0wBwVewMhxvJy2+J53adeO7LmJqPiABYrUAFd0hIiF599VVNmTJFu3btkjFGVapUUWhoqLPjA4BiJzMz09UhwMnYpgCKldx2dHnAL28AzlKggjvLoUOHdOjQIbVo0ULBwcEyxvBTYABQQAEBAfLx8dHBgwdVtmxZBQQE8Jnq4YwxunDhgv766y/5+PgoICDA1SEBAIAiVKCC++jRo+rWrZvWrFkjm82m3377TZUqVdK//vUvlSxZ0ntvWMNpaUWmuJwSnhtOp3YPRfUTKj4+PkpMTNShQ4d08ODBQi8P7iMkJEQVKlSQj4+Pq0OBGyLnAYD3KlDBPXz4cPn7+2vfvn2qVq2afXr37t01fPhw7y24AcBiAQEBqlChgtLT05WRkeHqcOAEvr6+8vPz42wFAACKoQIV3CtWrNDy5csVFxfnMP2aa67R3r17nRIY4O5yu2lJbjc9gftwtzMJbDab/P395e/vb59WVEfZgSJV2DPHuA4UXigvN1az2oY5Iwv1fKf8WgzXgcNLFKjgPnPmjEJCQrJN//vvvxUYGFjooCyRl6TOGzdPOPUNAAAAAHJXoIK7RYsWevPNN/XMM89Iung0JjMzU1OmTFHr1q2dGiAAAECO3ODeKnnZCe2Uo31wC5zdlje5vS+a5KFccMYyAHdQoIJ7ypQpatWqlbZs2aILFy5o1KhR2r59u/755x99/fXXzo6x6LhB4rYaXwyKTlGcEkZiBwAAANyXzRhjCvLElJQUzZw5U99++60yMzNVr149DRo0SDExMbk+98SJEypZsqT279+v8PDwgqw+uy+5UVtebNrzj6tDQBHaHDfgqvOvPzC30MuAcwxqU+Wq819Z/XsRRXJlucUoFT7OvKzD6hhy44wYL3Xy5EnFx8fr+PHjioiIcOqy88qSvCzlmptzy0kNK5Yu1PPzsozCckYM5GY4kzPydl6+H7haYd9XeRmnwubm3J6/6c0nco2hYd/xubYpDgo71vnh7Lyc74I7LS1N7dq102uvvaZrry3YDXoOHDig+Pj4Aj0XAABvtH///mw3Iy0q5GUAABw5Ky/n+5Ryf39//fTTT4X6eZPY2Fjt379fYWFh/ExKPmXtcXH6UQjkC9vB9dgG7oHtUHjGGJ06dUqxsbEui6Eo8jKvlewYk5wxLjljXLJjTHLGuOQsr+Pi7LxcoGu4+/btqzlz5mjSpEkFWqmPj4/L9uJ7i/DwcN5AboDt4HpsA/fAdigcV51KnqUo8zKvlewYk5wxLjljXLJjTHLGuOQsL+PizLxcoIL7woUL+s9//qOVK1eqQYMGCg0NdZg/bdo0pwQHAAAAAICnylfB/ccff6hixYr66aefVK9ePUnSzp07HdpwijgAAAAAAPksuK+55hodOnRIa9askSR1795dM2bMULly5SwJDtkFBgZqzJgxCgwMdHUoxRrbwfXYBu6B7YC84rWSHWOSM8YlZ4xLdoxJzhiXnLlqXPJ1l3IfHx+lpKQoKipK0sXz37dt26ZKlSpZFiAAAAAAAJ7IpzBPLuBPeAMAAAAA4PXyVXDbbLZs12hzzTYAAAAAANnl6xpuY4z69+9vP+89NTVV999/f7a7lC9atMh5EQIAAAAA4IHyVXD369fP4XHv3r2dGgwAAAAAAN4iXzdNAwAAAAAAeVOom6bBOl9++aU6d+6s2NhY2Ww2LV682GG+MUZjx45VbGysgoOD1apVK23fvt01wXqp3LZB//797fc1yPpr3Lixa4L1UhMnTtT111+vsLAwRUVF6bbbbtOvv/7q0Ib3gvXysh14P0Aid10J+SQ7Pt9zxudtzmbOnKnatWsrPDxc4eHhatKkiT7//HP7/OL4WsltTIrj6yQnEydOlM1m07Bhw+zTivr1QsHtps6cOaM6dero5ZdfznH+5MmTNW3aNL388svavHmzoqOj1bZtW506daqII/VeuW0DSerQoYMOHTpk/1u6dGkRRuj91q5dq0GDBmnjxo1auXKl0tPT1a5dO505c8behveC9fKyHSTeDyB3XQn5JDs+33PG523O4uLiNGnSJG3ZskVbtmxRmzZt1KVLF3uRVBxfK7mNiVT8XieX27x5s15//XXVrl3bYXqRv14M3J4k89FHH9kfZ2ZmmujoaDNp0iT7tNTUVBMREWFmzZrlggi93+XbwBhj+vXrZ7p06eKSeIqrI0eOGElm7dq1xhjeC65y+XYwhvcDsiN35Yx8kjM+33PG5+2VlSpVyvznP//htXKJrDExhtfJqVOnzDXXXGNWrlxpWrZsaYYOHWqMcc1nC0e4PdDu3buVkpKidu3a2acFBgaqZcuWWr9+vQsjK36Sk5MVFRWla6+9Vvfee6+OHDni6pC82okTJyRJpUuXlsR7wVUu3w5ZeD/gani/Xl1xf//w+Z4zPm+zy8jI0MKFC3XmzBk1adKE14qyj0mW4vw6GTRokDp27KibbrrJYborXi/5uks53ENKSookqVy5cg7Ty5Urp71797oipGLp5ptv1p133qmEhATt3r1bo0ePVps2bfTtt9/afzoPzmOM0UMPPaQbbrhBNWvWlMR7wRVy2g4S7wfk
jvfrlRX39w+f7znj89bRjz/+qCZNmig1NVUlSpTQRx99pOrVq9uLpOL4WrnSmEjF93UiSQsXLtR3332nzZs3Z5vnis8WCm4PZrPZHB4bY7JNg3W6d+9u/79mzZpq0KCBEhIS9Nlnn6lr164ujMw7DR48WD/88IO++uqrbPN4LxSdK20H3g/IK96v2RX39w+f7znj89ZR1apVtW3bNh0/flwffvih+vXrp7Vr19rnF8fXypXGpHr16sX2dbJ//34NHTpUK1asUFBQ0BXbFeXrhVPKPVB0dLSk/+2hyXLkyJFse2tQdGJiYpSQkKDffvvN1aF4nSFDhuiTTz7RmjVrFBcXZ5/Oe6FoXWk75IT3Ay7H+zXvitP7h8/3nPF5m11AQICqVKmiBg0aaOLEiapTp46mT59erF8rVxqTnBSX18m3336rI0eOqH79+vLz85Ofn5/Wrl2rGTNmyM/Pz/6aKMrXCwW3B0pMTFR0dLRWrlxpn3bhwgWtXbtWTZs2dWFkxdvRo0e1f/9+xcTEuDoUr2GM0eDBg7Vo0SKtXr1aiYmJDvN5LxSN3LZDTng/4HK8X/OuOLx/+HzPGZ+3eWeM0fnz54vtayUnWWOSk+LyOrnxxhv1448/atu2bfa/Bg0a6K677tK2bdtUqVKlIn+9cEq5mzp9+rR+//13++Pdu3dr27ZtKl26tCpUqKBhw4ZpwoQJuuaaa3TNNddowoQJCgkJUa9evVwYtXe52jYoXbq0xo4dqzvuuEMxMTHas2ePHn/8cZUpU0a33367C6P2LoMGDdI777yjjz/+WGFhYfa9kREREQoODrb/riLvBWvlth1Onz7N+wGSyF1XQj7Jjs/3nPF5m7PHH39cN998s+Lj43Xq1CktXLhQycnJWrZsWbF9rVxtTIrr60SSwsLCHO55IEmhoaGKjIy0Ty/y14sl9z5Hoa1Zs8ZIyvbXr18/Y8zFW9qPGTPGREdHm8DAQNOiRQvz448/ujZoL3O1bXD27FnTrl07U7ZsWePv728qVKhg+vXrZ/bt2+fqsL1KTuMvycydO9fehveC9XLbDrwfkIXclTPySXZ8vueMz9uc3X333SYhIcEEBASYsmXLmhtvvNGsWLHCPr84vlauNibF9XVyJZf+LJgxRf96sRljjDWlPAAAAAAAxRfXcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcANwGZvNpsWLF1uy7IoVK+rFF1+0ZNkAAABAXlBwo9jq37+/bDZbtr/ff//dKcufN2+eSpYs6ZRleatDhw7p5ptvliTt2bNHNptN27Ztc21QAAAAgJP4uToAwJU6dOiguXPnOkwrW7asi6K5srS0NPn7+7s6DKeLjo52dQgAAACAZTjCjWItMDBQ0dHRDn++vr6SpE8//VT169dXUFCQKlWqpHHjxik9Pd3+3GnTpqlWrVoKDQ1VfHy8HnjgAZ0+fVqSlJycrAEDBujEiRP2I+djx46VlPNp1CVLltS8efMk/e9I7/vvv69WrVopKChIb7/9tiRp7ty5qlatmoKCgpSUlKRXX331qv1r1aqVhgwZomHDhqlUqVIqV66cXn/9dZ05c0YDBgxQWFiYKleurM8//9z+nIyMDN1zzz1KTExUcHCwqlatqunTpzssNz09XQ8++KBKliypyMhIPfLII+rXr59uu+02h3U/+OCDGjVqlEqXLq3o6Gj7GGS5dCwSExMlSdddd51sNptatWplX86wYcMcnnfbbbepf//+9sdHjhxR586dFRwcrMTERC1YsCDbWJw4cUL33XefoqKiFB4erjZt2uj777+/6vgBAAAAhUHBDeRg+fLl6t27tx588EH9/PPPeu211zRv3jyNHz/e3sbHx0czZszQTz/9pPnz52v16tUaNWqUJKlp06Z68cUXFR4erkOHDunQoUMaOXJkvmJ45JFH9OCDD2rHjh1q3769Zs+erSeeeELjx4/Xjh07NGHCBI0ePVrz58+/6nLmz5+vMmXKaNOmTRoyZIj+/e9/684771TTpk313XffqX379urTp4/Onj0rScrMzFRcXJzef/99/fzzz3rqqaf0+OOP6/3337cv87nnntOCBQs0d+5cff311zp58mSO12LPnz9foaGh+uabbzR58mQ9/fTTWrlyZY5xbtq0SZK0atUqHTp0SIsWLcrzWPXv31979uzR6tWr9cEHH+jVV1/VkSNH7PONMerYsaNSUlK0dOlSffvtt6pXr55uvPFG/fPPP3leDwAAAJAvBiim+vXrZ3x9fU1oaKj97//+7/+MMcY0b97cTJgwwaH9W2+9ZWJiYq64vPfff99ERkbaH8+dO9dERERkayfJfPTRRw7TIiIizNy5c40xxuzevdtIMi+++KJDm/j4ePPOO+84THvmmWdMkyZNrhhTy5YtzQ033GB/nJ6ebkJDQ02fPn3s0w4dOmQkmQ0bNlxxOQ888IC544477I/LlStnpkyZ4rDcChUqmC5dulxx3cYYc/3115tHHnnE/vjSscjq99atW7P1YejQoQ7TunTpYvr162eMMebXX381kszGjRvt83fs2GEkmRdeeMEYY8wXX3xhwsPDTWpqqsNyKleubF577bUr9hsAAAAoDK7hRrHWunVrzZw50/44NDRUkvTtt99q8+bNDke0MzIylJqaqrNnzyokJERr1qzRhAkT9PPPP+vkyZNKT09Xamqqzpw5Y19OYTRo0MD+/19//aX9+/frnnvu0b333mufnp6eroiIiKsup3bt2vb/fX19FRkZqVq1atmnlStXTpIcjgjPmjVL//nPf7R3716dO3dOFy5cUN26dSVdPDX78OHDatiwocNy69evr8zMzCuuW5JiYmIc1uMMO3bskJ+fn8N4JSUlOdyw7ttvv9Xp06cVGRnp8Nxz585p165dTo0HAAAAyELBjWItNDRUVapUyTY9MzNT48aNU9euXbPNCwoK0t69e3XLLbfo/vvv1zPPPKPSpUvrq6++0j333KO0tLSrrtNms8kY4zAtp+dcWrRnFbKzZ89Wo0aNHNplXXN+JZffbM1mszlMs9lsDut4//33NXz4cD3//PNq0qSJwsLCNGXKFH3zzTfZlnOpy/t0pXVfXpTnxsfH56rjlTXv8ngulZmZqZiYGCUnJ2ebx53kAQAAYBUKbiAH9erV06+//ppjMS5JW7ZsUXp6up5//nn5+Fy8FcKl1zhLUkBAgDIyMrI9t2zZsjp06JD98W+//Wa/fvpKypUrp/Lly+uPP/7QXXfdld/u5Mu6devUtGlTPfDAA/Zplx4FjoiIULly5bRp0yY1b95c0sWj/1u3brUfBS+IgIAA+7Iudfl4ZWRk6KefflLr1q0lSdWqVVN6erq2bNliP+r+66+/6vjx4/bn1KtXTykpKfLz81PFihULHCMAAACQHxTcQA6eeuopderUSfHx8brzzjvl4+OjH374QT/++KOeffZZVa5cWenp6XrppZfUuXNnff3115o1a5b
DMipWrKjTp0/riy++UJ06dRQSEqKQkBC1adNGL7/8sho3bqzMzEw98sgjefrJr7Fjx+rBBx9UeHi4br75Zp0/f15btmzRsWPH9NBDDzmt71WqVNGbb76p5cuXKzExUW+99ZY2b95sv4u4JA0ZMkQTJ05UlSpVlJSUpJdeeknHjh276lHm3ERFRSk4OFjLli1TXFycgoKCFBERoTZt2uihhx7SZ599psqVK+uFF15wKKarVq2qDh066N5779Xrr78uPz8/DRs2TMHBwfY2N910k5o0aaLbbrtNzz33nKpWraqDBw9q6dKluu222xxORwcAAACchbuUAzlo3769lixZopUrV+r6669X48aNNW3aNCUkJEiS6tatq2nTpum5555TzZo1tWDBAk2cONFhGU2bNtX999+v7t27q2zZspo8ebIk6fnnn1d8fLxatGihXr16aeTIkQoJCck1pn/961/6z3/+o3nz5qlWrVpq2bKl5s2b51AIO8P999+vrl27qnv37mrUqJGOHj3qcLRbungH9Z49e6pv375q0qSJSpQoofbt2ysoKKjA6/Xz89OMGTP02muvKTY2Vl26dJEk3X333erXr5/69u2rli1bKjEx0X50O8vcuXMVHx+vli1bqmvXrvaf/8pis9m0dOlStWjRQnfffbeuvfZa9ejRQ3v27LFfww4AAAA4m83kdOElAORDZmamqlWrpm7duumZZ55xdTgAAACAW+CUcgD5tnfvXq1YsUItW7bU+fPn9fLLL2v37t3q1auXq0MDAAAA3AanlAPINx8fH82bN0/XX3+9mjVrph9//FGrVq1StWrVXB0aAAAA4DY4pRwAAAAAAAtwhBsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGkG979uyRzWbTvHnzXLL+CRMmaPHixS5ZNwAAAJBXNmOMcXUQADzL+fPntXXrVlWuXFlly5Yt8vWXKFFC//d//+eygh8AAADICz9XBwDAc2RkZCg9PV2BgYFq3Lixq8Nxqkv7BgCAtzp79qxCQkJcHQZQbHBKOYqlsWPHymaz6YcfftCdd96piIgIlS5dWg899JDS09P166+/qkOHDgoLC1PFihU1efLkbMs4efKkRo4cqcTERAUEBKh8+fIaNmyYzpw549DulVdeUYsWLRQVFaXQ0FDVqlVLkydPVlpamkO7Vq1aqWbNmtq8ebOaN2+ukJAQVapUSZMmTVJmZmaufbLZbBo8eLBee+01XXvttQoMDFT16tW1cOHCbG1TUlI0cOBAxcXFKSAgQImJiRo3bpzS09PtbbJOG588ebKeffZZJSYmKjAwUGvWrMnxlPKiGlObzaYzZ85o/vz5stlsstlsatWqldP6BgDwLt6Y8/O7ni+//FJNmzZVSEiI7r77bkv6BCBnHOFGsdatWzf17t1bAwcO1MqVK+0JZNWqVXrggQc0cuRIvfPOO3rkkUdUpUoVde3aVdLFvcMtW7bUgQMH9Pjjj6t27dravn27nnrqKf34449atWqVbDabJGnXrl3q1auXPaF9//33Gj9+vH755Re98cYbDvGkpKTorrvu0ogRIzRmzBh99NFHeuyxxxQbG6u+ffvm2p9PPvlEa9as0dNPP63Q0FC9+uqr6tmzp/z8/PR///d/9nU0bNhQPj4+euqpp1S5cmVt2LBBzz77rPbs2aO5c+c6LHPGjBm69tprNXXqVIWHh+uaa65x6Zhu2LBBbdq0UevWrTV69GhJUnh4eJH0DQDgubwp5+dnPYcOHVLv3r01atQoTZgwQT4+Ppb1CUAODFAMjRkzxkgyzz//vMP0unXrGklm0aJF9mlpaWmmbNmypmvXrvZpEydOND4+Pmbz5s0Oz//ggw+MJLN06dIc15uRkWHS0tLMm2++aXx9fc0///xjn9eyZUsjyXzzzTcOz6levbpp3759rn2SZIKDg01KSop9Wnp6uklKSjJVqlSxTxs4cKApUaKE2bt3r8Pzp06daiSZ7du3G2OM2b17t5FkKleubC5cuODQNmve3Llz7dOKckxDQ0NNv379so2BM/oGAPAu3pjz87ueL774wuE5VvQJQM44pRzFWqdOnRweV6tWTTabTTfffLN9mp+fn6pUqaK9e/fapy1ZskQ1a9ZU3bp1lZ6ebv9r3769bDabkpOT7W23bt2qW2+9VZGRkfL19ZW/v7/69u2rjIwM7dy502H90dHRatiwocO02rVrO6z7am688UaVK1fO/tjX11fdu3fX77//rgMHDthjb926tWJjYx1iz+rz2rVrHZZ56623yt/fP0/rl4pmTK/E6r4BADyXN+X8/KynVKlSatOmjcM0q/oEIDtOKUexVrp0aYfHAQEBCgkJUVBQULbpJ0+etD8+fPiwfv/99ysWa3///bckad++fWrevLmqVq2q6dOnq2LFigoKCtKmTZs0aNAgnTt3zuF5kZGR2ZYVGBiYrd2VREdHX3Ha0aNHFRcXp8OHD+vTTz/NNfYsMTExeVp3FqvH9Gqs7hsAwHN5S87P73pyynVW9QlAdhTcQAGUKVNGwcHBV7x2qUyZMpKkxYsX68yZM1q0aJESEhLs87dt22ZJXCkpKVeclpXYy5Qpo9q1a2v8+PE5LiM2NtbhcdY1XFbL65jmtgx37BsAwHO5W87P73pyynXu1ifAm1FwAwXQqVMnTZgwQZGRkUpMTLxiu6wkd+lPTRljNHv2bEvi+uKLL3T48GH7aeUZGRl67733VLlyZcXFxdljX7p0qSpXrqxSpUpZEkdB5HVMpSsfAXDXvgEAPJe75XxnrMfd+gR4MwpuoACGDRumDz/8UC1atNDw4cNVu3ZtZWZmat++fVqxYoVGjBihRo0aqW3btgoICFDPnj01atQopaamaubMmTp27JglcZUpU0Zt2rTR6NGj7Xcp/+WXXxx+Guzpp5/WypUr1bRpUz344IOqWrWqUlNTtWfPHi1dulSzZs2yF+dFKa9jKkm1atVScnKyPv30U8XExCgsLExVq1Z1274BADyXu+V8Z6zH3foEeDMKbqAAQkNDtW7dOk2aNEmvv/66du/ereDgYFWoUEE33XSTKlasKElKSkrShx9+qCeffFJdu3ZVZGSkevXqpYceesjhJi3Ocuutt6pGjRp68skntW/fPlWuXFkLFixQ9+7d7W1iYmK0ZcsWPfPMM5oyZYoOHDigsLAwJSYmqkOHDi47MpzXMZWk6dOna9CgQerRo4f9p02Sk5Pdtm8AAM/lbjnfGetxtz4B3sxmjDGuDgJA4dlsNg0aNEgvv/yyq0MBAAAAIImfBQMAAAAAwAIU3AAAAAAAWIBruAEvwdUhAAAAgHvhCDcAAAAAABag4AYAAAAAwAIuOaU8MzNTBw8eVFhYmGw2mytCAADALRhjdOrUKcXGxsrHxzX7wcnLAABc5Oy87J
KC++DBg4qPj3fFqgEAcEv79+9XXFycS9ZNXgYAwJGz8rJLCu6wsDBJFzsRHh7uihAAAHALJ0+eVHx8vD03ugJ5GQCAi5ydl11ScGedrhYeHk5iBwBAcump3ORlAAAcOSsv87NgyG7NxKvPb/1Y0cQBAIDVyHkAAAtRcAMAAFwJBTkAoBD4WTAAAAAAACxAwQ0AAAAAgAU4pRz5V9jT63J7fl6WAQAAAABujoIbAACgoNiJDAC4Ck4pBwAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABbpoGz1TYO6UDAAAAgMUouOF8ebljKwAAAAB4OU4pBwAAAADAAhTcAAAAAABYgIIbAAAAAAALcA13ccP11QAAAABQJDjCDQAAAACABSi4AQAAAACwAAU3AAAAAAAW4BpuuCeuNQcAAADg4TjCDQAAAACABTjCDQAAYKXcztpq/VjRxAEAKHIc4QYAAAAAwAIc4QYAAN6Le4IAAFyIghsAAHgmbymmOeUcALwWp5QDAAAAAGABCm4AAAAAACxAwQ0AAAAAgAW4htvTcJ0XAAAAAHgEjnADAAAAAGABjnAXpbzcTZUj1AAAAADgFSi4UTwVwc6PF1buvOr84W2vLdTyAQAAALg3Cm534y2/KQoAAAAAxRzXcAMAAAAAYAGOcHsbjpA7D3eELxKceg/gishpF5GPAMBjUXADAAB4Mm7KCgBui4IbcGMc/c0bxgkAAADuiGu4AQAAAACwAEe4USxt+ONorm2aVIosgkisxZFfAAAAwHUouIEryK0ob9K6iAJxcxT1AOD++KwGANeg4IZX2jBnpKtD8Ai5fQFzB54QIwAURq47eL3gjCsAKK4ouAEAAIq5ojgCzlF2AMURBTfgIhy5BQA4Q17uS6IKhVtHcSnI3SE3s+MB8C4U3PmR2+9c8huXRSZPXy4s5g5J2R1iyI0nxAjARfLy+9Fwi5znCcg3ANwRBTcAAICFKJidg4IagCei4IZb4ssJnMkdTlME4J3IVwCAq/FxdQAAAAAAAHgjjnCjyHE0AO7GGacpFvYoOUfhAXgyTvcGgJwVn4I7LzdmKexNz7j5Cy7ReN/rhV7Gxgr3OSESAHAR8iKQb4XdeZHbDtq8LN/qnbzuEANQVIpPwY084wg0kH/F4Qh1cfmCVBy2JfLOW3JiYXcCswMYAAqGghsAAHit3ArmJpUiiygSz5ZbwU5BjqJW2J2jxWUnsrfw5J3hLim4jTGSpJMnTzpvoV8+X/hlLBlT+GW4uU17/nF1CF6j1q8vXXX+mSJYx+a4AVedf/2BuYV6Ppwnt8+71DOnrzp/4uLvCh3DoDZVCvX83GKUcu/nK6t/L1QMeelDbuvIbRm59dOpueuS5WXlRlewJC9L0plU5y6vICGcO3/V+au2HyyiSLxbbvlKImcVFWfki9yW4Q75pLCf1c6IITeFzUdFIS952R3iLMrc7Oy8bDMuyPAHDhxQfHx8Ua8WAAC3tX//fsXFxblk3eRlAAAcOSsvu6TgzszM1MGDBxUWFiabzZZjm5MnTyo+Pl779+9XeHh4EUdYNOijd6CP3oE+egdP7KMxRqdOnVJsbKx8fFzza515ycvIG098DXoKxtZajK91GFtrOXt8nZ2XXXJKuY+PT573FoSHh3v9C5M+egf66B3oo3fwtD5GRES4dP35ycvIG097DXoSxtZajK91GFtrOXN8nZmXXbMrHQAAAAAAL0fBDQAAAACABdy24A4MDNSYMWMUGBjo6lAsQx+9A330DvTROxSHPsK98Rq0DmNrLcbXOoyttdx9fF1y0zQAAAAAALyd2x7hBgAAAADAk1FwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFKLgBAAAAALCAWxXcEydOlM1m07Bhw+zTjDEaO3asYmNjFRwcrFatWmn79u2uC7IA/vzzT/Xu3VuRkZEKCQlR3bp19e2339rne3of09PT9eSTTyoxMVHBwcGqVKmSnn76aWVmZtrbeGIfv/zyS3Xu3FmxsbGy2WxavHixw/y89On8+fMaMmSIypQpo9DQUN166606cOBAEfbiyq7Wv7S0ND3yyCOqVauWQkNDFRsbq759++rgwYMOy3Dn/km5b8NLDRw4UDabTS+++KLDdG/o444dO3TrrbcqIiJCYWFhaty4sfbt22ef7+l9PH36tAYPHqy4uDgFBwerWrVqmjlzpkMbd+8j3NvYsWNls9kc/qKjo+3zPT0fFKWiyq3Hjh1Tnz59FBERoYiICPXp00fHjx+3uHeul9v49u/fP9truXHjxg5tGN/sJk6cqOuvv15hYWGKiorSbbfdpl9//dWhDa/dgsvL+Hrya9dtCu7Nmzfr9ddfV+3atR2mT548WdOmTdPLL7+szZs3Kzo6Wm3bttWpU6dcFGn+HDt2TM2aNZO/v78+//xz/fzzz3r++edVsmRJextP7+Nzzz2nWbNm6eWXX9aOHTs0efJkTZkyRS+99JK9jSf28cyZM6pTp45efvnlHOfnpU/Dhg3TRx99pIULF+qrr77S6dOn1alTJ2VkZBRVN67oav07e/asvvvuO40ePVrfffedFi1apJ07d+rWW291aOfO/ZNy34ZZFi9erG+++UaxsbHZ5nl6H3ft2qUbbrhBSUlJSk5O1vfff6/Ro0crKCjI3sbT+zh8+HAtW7ZMb7/9tnbs2KHhw4dryJAh+vjjj+1t3L2PcH81atTQoUOH7H8//vijfZ6n54OiVFS5tVevXtq2bZuWLVumZcuWadu2berTp4/l/XO1vOS9Dh06OLyWly5d6jCf8c1u7dq1GjRokDZu3KiVK1cqPT1d7dq105kzZ+xteO0WXF7GV/Lg165xA6dOnTLXXHONWblypWnZsqUZOnSoMcaYzMxMEx0dbSZNmmRvm5qaaiIiIsysWbNcFG3+PPLII+aGG2644nxv6GPHjh3N3Xff7TCta9eupnfv3sYY7+ijJPPRRx/ZH+elT8ePHzf+/v5m4cKF9jZ//vmn8fHxMcuWLSuy2PPi8v7lZNOmTUaS2bt3rzHGs/pnzJX7eODAAVO+fHnz008/mYSEBPPCCy/Y53lDH7t3725/L+bEG/pYo0YN8/TTTztMq1evnnnyySeNMZ7XR7ifMWPGmDp16uQ4z9vyQVGyKrf+/PPPRpLZuHGjvc2GDRuMJPPLL79Y3Cv3kdPnZb9+/UyXLl2u+BzGN2+OHDliJJm1a9caY3jtOtvl42uMZ7923eII96BBg9SxY0fddNNNDtN3796tlJQUtWvXzj4tMDBQLVu21Pr164s6zAL55JNP1KBBA915552KiorSddddp9mzZ9vne0Mfb7jhBn3xxRfauXOnJOn777/XV199pVtuuUWSd/Txcnnp07fffqu0tDSHNrGxsapZs6ZH9vvEiROy2Wz2szO8oX+ZmZnq06ePHn74YdWoUSPbfE/vY2Zmpj777DNde+21at++vaKiotSoUSOHUww9vY/Sxc+gTz75RH/++aeMMVqzZo127typ9u3bS/KOPsL1fvvtN8XGxioxMVE9evTQH
3/8Ial45gOrOGssN2zYoIiICDVq1MjepnHjxoqIiGC8JSUnJysqKkrXXnut7r33Xh05csQ+j/HNmxMnTkiSSpcuLYnXrrNdPr5ZPPW16/KCe+HChfruu+80ceLEbPNSUlIkSeXKlXOYXq5cOfs8d/fHH39o5syZuuaaa7R8+XLdf//9evDBB/Xmm29K8o4+PvLII+rZs6eSkpLk7++v6667TsOGDVPPnj0leUcfL5eXPqWkpCggIEClSpW6YhtPkZqaqkcffVS9evVSeHi4JO/o33PPPSc/Pz89+OCDOc739D4eOXJEp0+f1qRJk9ShQwetWLFCt99+u7p27aq1a9dK8vw+StKMGTNUvXp1xcXFKSAgQB06dNCrr76qG264QZJ39BGu1ahRI7355ptavny5Zs+erZSUFDVt2lRHjx4tdvnASs4ay5SUFEVFRWVbflRUVLEf75tvvlkLFizQ6tWr9fzzz2vz5s1q06aNzp8/L4nxzQtjjB566CHdcMMNqlmzpiReu86U0/hKnv3a9bNsyXmwf/9+DR06VCtWrHC4nvByNpvN4bExJts0d5WZmakGDRpowoQJkqTrrrtO27dv18yZM9W3b197O0/u43vvvae3335b77zzjmrUqKFt27Zp2LBhio2NVb9+/eztPLmPV1KQPnlav9PS0tSjRw9lZmbq1VdfzbW9p/Tv22+/1fTp0/Xdd9/lO15P6WPWjQu7dOmi4cOHS5Lq1q2r9evXa9asWWrZsuUVn+spfZQuFtwbN27UJ598ooSEBH355Zd64IEHFBMTk+3MqUt5Uh/hWjfffLP9/1q1aqlJkyaqXLmy5s+fb79pT3HIB0XFGWOZU3vGW+revbv9/5o1a6pBgwZKSEjQZ599pq5du17xeYzv/wwePFg//PCDvvrqq2zzeO0W3pXG15Nfuy49wv3tt9/qyJEjql+/vvz8/OTn56e1a9dqxowZ8vPzs+8lunyPw5EjR7LtQXJXMTExql69usO0atWq2e8QnHWXU0/u48MPP6xHH31UPXr0UK1atdSnTx8NHz7cftaCN/TxcnnpU3R0tC5cuKBjx45dsY27S0tLU7du3bR7926tXLnSfnRb8vz+rVu3TkeOHFGFChXsnz979+7ViBEjVLFiRUme38cyZcrIz88v188gT+7juXPn9Pjjj2vatGnq3LmzateurcGDB6t79+6aOnWqJM/vI9xPaGioatWqpd9++63Y5IOi4KyxjI6O1uHDh7Mt/6+//mK8LxMTE6OEhAT99ttvkhjf3AwZMkSffPKJ1qxZo7i4OPt0XrvOcaXxzYknvXZdWnDfeOON+vHHH7Vt2zb7X4MGDXTXXXdp27ZtqlSpkqKjo7Vy5Ur7cy5cuKC1a9eqadOmLow875o1a5bttvY7d+5UQkKCJCkxMdHj+3j27Fn5+Di+lHx9fe1H17yhj5fLS5/q168vf39/hzaHDh3STz/95BH9ziq2f/vtN61atUqRkZEO8z29f3369NEPP/zg8PkTGxurhx9+WMuXL5fk+X0MCAjQ9ddff9XPIE/vY1pamtLS0q76GeTpfYT7OX/+vHbs2KGYmJhikQ+KirPGskmTJjpx4oQ2bdpkb/PNN9/oxIkTjPdljh49qv379ysmJkYS43slxhgNHjxYixYt0urVq5WYmOgwn9du4eQ2vjnxqNeuZbdjK6BL71JujDGTJk0yERERZtGiRebHH380PXv2NDExMebkyZOuCzIfNm3aZPz8/Mz48ePNb7/9ZhYsWGBCQkLM22+/bW/j6X3s16+fKV++vFmyZInZvXu3WbRokSlTpowZNWqUvY0n9vHUqVNm69atZuvWrUaSmTZtmtm6dav9Lt156dP9999v4uLizKpVq8x3331n2rRpY+rUqWPS09Nd1S27q/UvLS3N3HrrrSYuLs5s27bNHDp0yP53/vx5+zLcuX/G5L4NL3f5XcqN8fw+Llq0yPj7+5vXX3/d/Pbbb+all14yvr6+Zt26dfZleHofW7ZsaWrUqGHWrFlj/vjjDzN37lwTFBRkXn31Vfsy3L2PcG8jRowwycnJ5o8//jAbN240nTp1MmFhYWbPnj3GGM/PB0WpqHJrhw4dTO3atc2GDRvMhg0bTK1atUynTp2KvL9F7Wrje+rUKTNixAizfv16s3v3brNmzRrTpEkTU758ecY3F//+979NRESESU5OdvhOdPbsWXsbXrsFl9v4evpr1+0L7szMTDNmzBgTHR1tAgMDTYsWLcyPP/7ougAL4NNPPzU1a9Y0gYGBJikpybz++usO8z29jydPnjRDhw41FSpUMEFBQaZSpUrmiSeecCjMPLGPa9asMZKy/fXr188Yk7c+nTt3zgwePNiULl3aBAcHm06dOpl9+/a5oDfZXa1/u3fvznGeJLNmzRr7Mty5f8bkvg0vl1PB7Q19nDNnjqlSpYoJCgoyderUMYsXL3ZYhqf38dChQ6Z///4mNjbWBAUFmapVq5rnn3/eZGZm2pfh7n2Ee+vevbuJiYkx/v7+JjY21nTt2tVs377dPt/T80FRKqrcevToUXPXXXeZsLAwExYWZu666y5z7NixIuql61xtfM+ePWvatWtnypYta/z9/U2FChVMv379so0d45vdlb4TzZ07196G127B5Ta+nv7atf3/TgIAAAAAACdy+c+CAQAAAADgjSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwALPHOO+/oxRdfzDZ9z549stlsmjp1atEHBQAA8m39+vUaO3asjh8/7upQAI9DwQ3AElcquAEAgGdZv369xo0bR8ENFAAFNwAAAAAAFqDghtcZO3asbDabfvjhB915552KiIhQ6dKl9dBDDyk9PV2//vqrOnTooLCwMFWsWFGTJ0/OtoyTJ09q5MiRSkxMVEBAgMqXL69hw4bpzJkzDu1eeeUVtWjRQlFRUQoNDVWtWrU0efJkpaWlObRr1aqVatasqc2bN6t58+YKCQlRpUqVNGnSJGVmZubap//+979q1KiRIiIi7M+9++677fOTk5Nls9n0zjvv6JFHHlFMTIxKlCihzp076/Dhwzp16pTuu+8+lSlTRmXKlNGAAQN0+vRph3Wkpqbqsccec+jzoEGDsu3NzszM1OTJk5WUlKTAwEBFRUWpb9++OnDggEN/P/vsM+3du1c2
m83+d7lp06YpMTFRJUqUUJMmTbRx40aH+f3791eJEiX0+++/65ZbblGJEiUUHx+vESNG6Pz58w5tL1y4oGeffdYeV9myZTVgwAD99ddfDu1Wr16tVq1aKTIyUsHBwapQoYLuuOMOnT171t5m5syZqlOnjkqUKKGwsDAlJSXp8ccfz3U7AUBx5405ODMzUy+99JLq1q2r4OBglSxZUo0bN9Ynn3zi0Ca33HhpLBs2bFDTpk0VHBysihUrau7cuZKkzz77TPXq1VNISIhq1aqlZcuW5Ti+W7duVdeuXRUeHq6IiAj17t07W75777331K5dO8XExCg4OFjVqlXTo48+mm0cJembb75R586dFRkZqaCgIFWuXFnDhg2zr/Phhx+WJCUmJtpzenJysiSpYsWK6tSpk5YtW6Z69eopODhYSUlJeuONN7KtJyUlRQMHDlRcXJwCAgKUmJiocePGKT093aFdbnn47Nmz9tdIUFCQSpcurQYNGujdd9+92qYEXMMAXmbMmDFGkqlatap55plnzMqVK82oUaOMJDN48GCTlJRkZsyYYVauXGkGDBhgJJkPP/zQ/vwzZ86YunXrmjJlyphp06aZVatWmenTp5uIiAjTpk0bk5mZaW87fPhwM3PmTLNs2TKzevVq88ILL5gyZcqYAQMGOMTUsmVLExkZaa655hoza9Yss3LlSvPAAw8YSWb+/PlX7c/69euNzWYzPXr0MEuXLjWrV682c+fONX369LG3WbNmjZFkEhISTP/+/c2yZcvMrFmzTIkSJUzr1q1N27ZtzciRI82KFSvMc889Z3x9fc2QIUPsz8/MzDTt27c3fn5+ZvTo0WbFihVm6tSpJjQ01Fx33XUmNTXV3va+++6zj2XWesqWLWvi4+PNX3/9ZYwxZvv27aZZs2YmOjrabNiwwf5njDG7d+82kkzFihVNhw4dzOLFi83ixYtNrVq1TKlSpczx48ft6+rXr58JCAgw1apVM1OnTjWrVq0yTz31lLHZbGbcuHH2dhkZGaZDhw4mNDTUjBs3zqxcudL85z//MeXLlzfVq1c3Z8+eta87KCjItG3b1ixevNgkJyebBQsWmD59+phjx44ZY4x59913jSQzZMgQs2LFCrNq1Soza9Ys8+CDD179hQcA8LocbIwxffr0MTabzfzrX/8yH3/8sfn888/N+PHjzfTp0+1t8pIbL42latWqZs6cOWb58uWmU6dORpIZN26cqVWrlnn33XfN0qVLTePGjU1gYKD5888/s41vQkKCefjhh83y5cvNtGnT7Pn6woUL9rbPPPOMeeGFF8xnn31mkpOTzaxZs0xiYqJp3bq1Q/+WLVtm/P39Te3atc28efPM6tWrzRtvvGF69OhhjDFm//79ZsiQIUaSWbRokT2nnzhxwhhjTEJCgomLizPVq1c3b775plm+fLm58847jSSzdu1a+3oOHTpk4uPjTUJCgnnttdfMqlWrzDPPPGMCAwNN//797e3ykocHDhxoQkJCzLRp08yaNWvMkiVLzKRJk8xLL72U6/YEihoFN7xOVjJ6/vnnHabXrVvXniyypKWlmbJly5quXbvap02cONH4+PiYzZs3Ozz/gw8+MJLM0qVLc1xvRkaGSUtLM2+++abx9fU1//zzj31ey5YtjSTzzTffODynevXqpn379lftz9SpU40kh0L0clkFd+fOnR2mDxs2zEjKVizedtttpnTp0vbHy5YtM5LM5MmTHdq99957RpJ5/fXXjTHG7Nixw0gyDzzwgEO7b775xkgyjz/+uH1ax44dTUJCQrZYswruWrVqmfT0dPv0TZs2GUnm3XfftU/r16+fkWTef/99h2XccsstpmrVqvbHWcn50i9txhizefNmI8m8+uqrxpj/bcNt27ZliyvL4MGDTcmSJa84HwBwZd6Wg7/88ksjyTzxxBNXbJOf3JgVy5YtW+zTjh49anx9fU1wcLBDcb1t2zYjycyYMcM+LWt8hw8f7rCuBQsWGEnm7bffzjHGzMxMk5aWZtauXWskme+//94+r3LlyqZy5crm3LlzV+zjlClTjCSze/fubPMSEhJMUFCQ2bt3r33auXPnTOnSpc3AgQPt0wYOHGhKlCjh0M6Y/33P2b59uzEmb3m4Zs2a5rbbbrtqG8BdcEo5vFanTp0cHlerVk02m00333yzfZqfn5+qVKmivXv32qctWbJENWvWVN26dZWenm7/a9++vcMpVJK0detW3XrrrYqMjJSvr6/8/f3Vt29fZWRkaOfOnQ7rj46OVsOGDR2m1a5d22HdObn++uslSd26ddP777+vP//8M199lqSOHTtmm/7PP//YTytfvXq1pIuncF/qzjvvVGhoqL744gtJ0po1a3Js17BhQ1WrVs3eLi86duwoX19f++PatWtLUrbxsNls6ty5s8O0y8dtyZIlKlmypDp37uywzerWravo6Gj7Nqtbt64CAgJ03333af78+frjjz+yxdWwYUMdP35cPXv21Mcff6y///47z30CAFzkLTn4888/lyQNGjToim3ymxtjYmJUv359++PSpUsrKipKdevWVWxsrH16Vg7PKca77rrL4XG3bt3k5+dnj0WS/vjjD/Xq1UvR0dH28WnZsqUkaceOHZKknTt3ateuXbrnnnsUFBR0xT7mpm7duqpQoYL9cVBQkK699tps27Z169aKjY112LZZr4m1a9dKylsebtiwoT7//HM9+uijSk5O1rlz5wocO2A1Cm54rdKlSzs8DggIUEhISLaEEhAQoNTUVPvjw4cP64cffpC/v7/DX1hYmIwx9g/+ffv2qXnz5vrzzz81ffp0rVu3Tps3b9Yrr7wiSdk+/CMjI7PFGBgYmGuSaNGihRYvXqz09HT17dtXcXFxqlmzZo7XKeXU56tNz+r30aNH5efnp7Jlyzq0s9lsio6O1tGjR+3tpItfFi4XGxtrn58Xl49HYGCgpOzjltM2CwwMzLbNjh8/roCAgGzbLSUlxb7NKleurFWrVikqKkqDBg1S5cqVVblyZU2fPt2+rD59+uiNN97Q3r17dccddygqKkqNGjXSypUr89w3ACjuvCUH//XXX/L19VV0dPQV2+Q3N14+NtLFccgtV1/q8nj8/PwUGRlpX9fp06fVvHlzffPNN3r22WeVnJyszZs3a9GiRZL+Nz5Z133HxcVdsX95kZfxPXz4sD799NNs27ZGjRqSZN+2ecnDM2bM0COPPKLFixerdevWKl26tG677Tb99ttvheoHYAU/VwcAuJsyZcooODg4x5t9ZM2XpMWLF+vMmTNatGiREhIS7PO3bdvm9Ji6dOmiLl266Pz589q4caMmTpyoXr16qWLFimrSpEmhlx8ZGan09HT99ddfDkW3MUYpKSn2o+xZCfXQoUPZkvPBgwftY1PUypQpo8jIyGw3l8kSFhZm/7958+Zq3ry5MjIytGXLFr300ksaNmyYypUrpx49ekiSBgwYoAEDBujMmTP68ssvNWbMGHXq1Ek7d+502NYAAOdytxxctmxZZWRkKCUlJceCWnJNbkxJSVH
58uXtj9PT03X06FF7LKtXr9bBgweVnJxsP6otKduNULNy/uU3d7NCmTJlVLt2bY0fPz7H+Zce3c8tD4eGhmrcuHEaN26cDh8+bD/a3blzZ/3yyy+W9wXID45wA5fp1KmTdu3apcjISDVo0CDbX8WKFSXJftftrCOz0sUCdfbs2ZbFFhgYqJYtW+q5556TdPF0Ome48cYbJUlvv/22w/QPP/xQZ86csc9v06ZNju02b96sHTt22NtlxVpUp3h16tRJR48eVUZGRo7brGrVqtme4+vrq0aNGtmPhnz33XfZ2oSGhurmm2/WE088oQsXLmj79u2W9wUAijN3y8FZpzvPnDnzim3ykxudZcGCBQ6P33//faWnp6tVq1aSch4fSXrttdccHl977bWqXLmy3njjjWy//nGpK52Flh+dOnXSTz/9pMqVK+e4bS8tuLPkJQ+XK1dO/fv3V8+ePfXrr786/OoI4A44wg1cZtiwYfrwww/VokULDR8+XLVr11ZmZqb27dunFStWaMSIEWrUqJHatm2rgIAA9ezZU6NGjVJqaqpmzpypY8eOOTWep556SgcOHNCNN96ouLg4HT9+XNOnT3e4Fquw2rZtq/bt2+uRRx7RyZMn1axZM/3www8aM2aMrrvuOvXp00eSVLVqVd1333166aWX5OPjo5tvvll79uzR6NGjFR8fr+HDh9uXWatWLS1atEgzZ85U/fr15ePjowYNGjgl3sv16NFDCxYs0C233KKhQ4eqYcOG8vf314EDB7RmzRp16dJFt99+u2bNmqXVq1erY8eOqlChglJTU+1HUW666SZJ0r333qvg4GA1a9ZMMTExSklJ0cSJExUREWE/0g8AsIa75eDmzZurT58+evbZZ3X48GF16tRJgYGB2rp1q0JCQjRkyJB85UZnWbRokfz8/NS2bVtt375do0ePVp06ddStWzdJUtOmTVWqVCndf//9GjNmjPz9/bVgwQJ9//332Zb1yiuvqHPnzmrcuLGGDx+uChUqaN++fVq+fLm9sK9Vq5Ykafr06erXr5/8/f1VtWpVhzPIcvP0009r5cqVatq0qR588EFVrVpVqamp2rNnj5YuXapZs2YpLi4uT3m4UaNG6tSpk2rXrq1SpUppx44deuutt9SkSROFhIQUdngBp6LgBi4TGhqqdevWadKkSXr99de1e/du++8133TTTfa960lJSfrwww/15JNPqmvXroqMjFSvXr300EMPOdwUprAaNWqkLVu26JFHHtFff/2lkiVLqkGDBlq9erX9uqfCstlsWrx4scaOHau5c+dq/PjxKlOmjPr06aMJEyY47CGfOXOmKleurDlz5uiVV15RRESEOnTooIkTJzpcwzV06FBt375djz/+uE6cOCFz8VcRnBLv5Xx9ffXJJ59o+vTpeuuttzRx4kT5+fkpLi5OLVu2tH9RqFu3rlasWKExY8YoJSVFJUqUUM2aNfXJJ5+oXbt2ki5+uZo3b57ef/99HTt2TGXKlNENN9ygN998M9s17gAA53K3HCxJ8+bNU7169TRnzhzNmzdPwcHBql69usPvQuc1NzrLokWLNHbsWM2cOdN+c9EXX3zRft13ZGSkPvvsM40YMUK9e/dWaGiounTpovfee0/16tVzWFb79u315Zdf6umnn9aDDz6o1NRUxcXF6dZbb7W3adWqlR577DHNnz9fs2fPVmZmptasWWM/op4XMTEx2rJli5555hlNmTJFBw4cUFhYmBITE9WhQweVKlVKUt7ycJs2bfTJJ5/ohRde0NmzZ1W+fHn17dtXTzzxRCFHFnA+m7HqGzAAAAAApxk7dqzGjRunv/76y2X3TQGQP1zDDQAAAACABSi4AQAAAACwAKeUAwAAAABgAY5wAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFXPI73JmZmTp48KDCwsJks9lcEQIAAG7BGKNTp04pNjZWPj6u2Q9OXgYA4CJn52WXFNwHDx5UfHy8K1YNAIBb2r9/v+Li4lyybvIyAACOnJWXXVJwh4WFSbrYifDwcFeEAACAWzh58qTi4+PtudEVyMsAAFzk7LzskoI763S18PBwEjsKZs3Eq89v/VjRxAEATuLKU7nJyxYjZwGAx3FWXuamaQAAAAAAWMAlR7jh5diTDwAAAAAU3AAAoBhjJzEAwEKcUg4AAAAAgAUouAEAAAAAsAAFNwAAAAAAFuAabrin3K6pAwAAAAA3xxFuAAAAAAAswBFuAACAK+Eu5gCAQuAINwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgGu4AQAAXInrxAHAa1FwA66Sl58+40sWAAAA4LEouAEAAAoqLztPAQDFFtdwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFuIYbxZMzbliWyzI2/HH0qvObVIrMPQYAAAAAHosj3AAAAAAAWIAj3PBOzrhrLHeeBQAAAFAIFNwAAMB7sfMUAOBCnFIOAAAAAIAFOMKNouchRxu84qZnuY11bjeGAwC4njt8lrtDDADggSi4AQAAPJkzfnkDAGAJCm4AAOCePOSMKAAAroSCG/BgG+aMvOp8jzjtHQAAAPBS3DQNAAAAAAALcIQbxVJuN0RzlxiatLZ2HYVdPgAAzvLCyp1XnT+87bVFFAkAOA9HuAEAAAAAsABHuAE3lts12gAAAADcFwV3ccNPh+ASuZ2+J3EKHwDAOfKScwDA21BwAwAA4Kq4JwgAFAwFN/KP30X1Go33vZ6HVlMLtQ5uggMAAIDiioIbAADA2+W2s5zLyQDAEhTcAK7K6iPUHAEHAACAt6LgRnZecMq4O/zONgAA7iLXvPhH4X4Vwx1uiFYUNwJlJzGA/OJ3uAEAAAAAsABHuD0N12DBzbjDUQ0AgOfL/UaehbuJp8QRagBFjyPcAAAAAABYgCPccEu5/t5npcgiigS5ydtPi13Zxgr3OSkSAAAAwL1QcKPIecsNzbylHwAA9+YJO6HzsvOVHawAiiMKbgAAAFiusGdEFQV3uC8J15kD3oWCGx6Jo8vIF242CMAieclH7nAEGkWDYhnA5bhpGgAAAAAAFuAINwCPl+sRBT7pABRQUZxRxVlbxUdRnLLujKPsHKkHnIevoQAAwDVyu9wDuIQzitXcriP3hBu7ucN15gDyjoIbAADAg3GE/KKiuCkbBbtz1lFcjrI7Y6w9oZ+4OgruopSXPfmFvXkTRwvgYXL78vLCyty/vOS2jA25PH9juvVJ3Ru+OAAAACB/KLgBAIA13GAnsDsc/XWHGABnym0nsjN2phc2BnZkw114T8HtDj/744wvFhZ/OSHpI79c/buprl6/JG2YMzLXNpafRpiHz4YX0u+wNIRi8+XFHfIJgGzc4ZRxd1h+bvmmKE57d4fcXFic7n1RXsbBHfrpyTtYXFJwG2MkSSdPnnTeQs+kXn2+M9dV0BjcwJlz510dAuB2Us+cvur8vLxvcltGbnL9PMzD50tqeuFiyI1TP7PdWRHnk6xxzcqNrmBJXpbcIi+S91CUnJFPrF5HYfNVXtaRG2fEkJvcPs+cEY
M7rKOw8hKjO+T/3OJ0ZozOzss244IMf+DAAcXHxxf1agEAcFv79+9XXFycS9ZNXgYAwJGz8rJLCu7MzEwdPHhQYWFhstlsRb36Qjt58qTi4+O1f/9+hYeHuzoct8U45Q3jlHeMVd4wTnnnDmNljNGpU6cUGxsrHx8fl8Tg6Xk5P9xhm7tSce4/fafv9L14KWj/nZ2XXXJKuY+Pj8v24jtTeHh4sXzx5hfjlDeMU94xVnnDOOWdq8cqIiLCZeuWvCcv54ert7mrFef+03f6XtwU575LBeu/M/Oya3alAwAAAADg5Si4AQAAAACwAAV3AQQGBmrMmDEKDAx0dShujXHKG8Yp7xirvGGc8o6xKn6K+zYvzv2n7/S9uCnOfZfcp/8uuWkaAAAAAADejiPcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwS3p1VdfVWJiooKCglS/fn2tW7fuqu3Xrl2r+vXrKygoSJUqVdKsWbOytTl+/LgGDRqkmJgYBQUFqVq1alq6dKlVXSgSVozTiy++qKpVqyo4OFjx8fEaPny4UlNTrepCkcnPWB06dEi9evVS1apV5ePjo2HDhuXY7sMPP1T16tUVGBio6tWr66OPPrIo+qLj7HGaPXu2mjdvrlKlSqlUqVK66aabtGnTJgt7UHSseE1lWbhwoWw2m2677TbnBu0CVoyTN36eextn56d58+bJZrNl+3PH/FSc842z++6t233RokVq27atypYtq/DwcDVp0kTLly/P1s5Ttrvk/P5767b/6quv1KxZM0VGRio4OFhJSUl64YUXsrXzlG3v7L4X2XY3xdzChQuNv7+/mT17tvn555/N0KFDTWhoqNm7d2+O7f/44w8TEhJihg4dan7++Wcze/Zs4+/vbz744AN7m/Pnz5sGDRqYW265xXz11Vdmz549Zt26dWbbtm1F1S2ns2Kc3n77bRMYGGgWLFhgdu/ebZYvX25iYmLMsGHDiqpblsjvWO3evds8+OCDZv78+aZu3bpm6NCh2dqsX7/e+Pr6mgkTJpgdO3aYCRMmGD8/P7Nx40aLe2MdK8apV69e5pVXXjFbt241O3bsMAMGDDARERHmwIEDFvfGWlaMVZY9e/aY8uXLm+bNm5suXbpY04EiYsU4eePnubexIj/NnTvXhIeHm0OHDjn8uZvinG+s6Lu3bvehQ4ea5557zmzatMns3LnTPPbYY8bf399899139jaest2Nsab/3rrtv/vuO/POO++Yn376yezevdu89dZbJiQkxLz22mv2Np6y7a3oe1Ft92JfcDds2NDcf//9DtOSkpLMo48+mmP7UaNGmaSkJIdpAwcONI0bN7Y/njlzpqlUqZK5cOGC8wN2ESvGadCgQaZNmzYObR566CFzww03OClq18jvWF2qZcuWOX4J6Natm+nQoYPDtPbt25sePXoUKlZXsmKcLpeenm7CwsLM/PnzCxqmW7BqrNLT002zZs3Mf/7zH9OvXz+PL7itGCdv/Dz3Nlbkp7lz55qIiAinx+psxTnfWNH34rDds1SvXt2MGzfO/thTtrsx1vS/OG3722+/3fTu3dv+2FO2vRV9L6rtXqxPKb9w4YK+/fZbtWvXzmF6u3bttH79+hyfs2HDhmzt27dvry1btigtLU2S9Mknn6hJkyYaNGiQypUrp5o1a2rChAnKyMiwpiMWs2qcbrjhBn377bf2U37/+OMPLV26VB07drSgF0WjIGOVF1caz8Is05WsGqfLnT17VmlpaSpdurTTllnUrByrp59+WmXLltU999xTqOW4A6vGyds+z72NVflJkk6fPq2EhATFxcWpU6dO2rp1q/M7UAjFOd9Y+blYHLZ7ZmamTp065ZAbPWG7S9b1Xyoe237r1q1av369WrZsaZ/mCdveqr5LRbPdi3XB/ffffysjI0PlypVzmF6uXDmlpKTk+JyUlJQc26enp+vvv/+WdLFw/OCDD5SRkaGlS5fqySef1PPPP6/x48db0xGLWTVOPXr00DPPPKMbbrhB/v7+qly5slq3bq1HH33Umo4UgYKMVV5caTwLs0xXsmqcLvfoo4+qfPnyuummm5y2zKJm1Vh9/fXXmjNnjmbPnl3YEN2CVePkbZ/n3saq/JSUlKR58+bpk08+0bvvvqugoCA1a9ZMv/32mzUdKYDinG+s6ntx2e7PP/+8zpw5o27dutmnecJ2l6zrv7dv+7i4OAUGBqpBgwYaNGiQ/vWvf9nnecK2t6rvRbXd/Zy6NA9ls9kcHhtjsk3Lrf2l0zMzMxUVFaXXX39dvr6+ql+/vg4ePKgpU6boqaeecnL0RcfZ45ScnKzx48fr1VdfVaNGjfT7779r6NChiomJ0ejRo50cfdHK71i5apmuZmWfJk+erHfffVfJyckKCgpyyjJdyZljderUKfXu3VuzZ89WmTJlnBGe23D2a8pbP8+9jbPzU+PGjdW4cWP7/GbNmqlevXp66aWXNGPGDGeF7RTFOd84O87isN3fffddjR07Vh9//LGioqKcskxXcHb/vX3br1u3TqdPn9bGjRv16KOPqkqVKurZs2ehlukKzu57UW33Yl1wlylTRr6+vtn2jBw5ciTbHpQs0dHRObb38/NTZGSkJCkmJkb+/v7y9fW1t6lWrZpSUlJ04cIFBQQEOLkn1rJqnEaPHq0+ffrY9zTVqlVLZ86c0X333acnnnhCPj6edwJGQcYqL640noVZpitZNU5Zpk6dqgkTJmjVqlWqXbt2oZfnSlaM1a5du7Rnzx517tzZPi0zM1OS5Ofnp19//VWVK1cueNAuYNVryts+z72NVfnpcj4+Prr++uvd6mhXcc43VueQLN623d977z3dc889+u9//5vtzC9P2O6Sdf2/nLdt+8TEREkXv2sfPnxYY8eOtRednrDtrer75aza7p5X0ThRQECA6tevr5UrVzpMX7lypZo2bZrjc5o0aZKt/YoVK9SgQQP5+/tLurh35Pfff7d/gZWknTt3KiYmxiO/nFk1TmfPns1WVPv6+spcvJmfE3tQdAoyVnlxpfEszDJdyapxkqQpU6bomWee0bJly9SgQYNCLcsdWDFWSUlJ+vHHH7Vt2zb736233qrWrVtr27Ztio+Pd0boRcqq15S3fZ57G6vy0+WMMdq2bZtiYmKcE7gTFOd8Y2UOuZQ3bfd3331X/fv31zvvvJPjvXI8YbtL1vX/ct607S9njNH58+ftjz1h21vV95zmW7LdLb8tm5vLusX8nDlzzM8//2yGDRtmQkNDzZ49e4wxxjz66KOmT58+9vZZPycyfPhw8/PPP5s5c+Zk+zmRffv2mRIlSpjBgwebX3/91SxZssRERUWZZ599tsj75yxWjNOYMWNMWFiYeffdd80ff/xhVqxYYSpXrmy6detW5P1zpvyOlTHGbN261WzdutXUr1/f9OrVy2zdutVs377dPv/rr782vr6+ZtKkSWbHjh1m0qRJbvmTDflhxTg999xzJiAgwHzwwQcOP+9w6tSpIu2bs1kxVpfzhruUWzFO3vh57m2sy
E9jx441y5YtM7t27TJbt241AwYMMH5+fuabb74p8v5dTXHON1b03Vu3+zvvvGP8/PzMK6+84pAbjx8/bm/jKdvdGGv6763b/uWXXzaffPKJ2blzp9m5c6d54403THh4uHniiSfsbTxl21vR96La7sW+4DbGmFdeecUkJCSYgIAAU69ePbN27Vr7vH79+pmWLVs6tE9OTjbXXXedCQgIMBUrVjQzZ87Mtsz169ebRo0amcDAQFOpUiUzfvx4k56ebnVXLOXscUpLSzNjx441lStXNkFBQSY+Pt488MAD5tixY0XQG2vld6wkZftLSEhwaPPf//7XVK1a1fj7+5ukpCTz4YcfFkFPrOXscUpISMixzZgxY4qmQxay4jV1KW8ouI2xZpy88fPc2zg7Pw0bNsxUqFDBBAQEmLJly5p27dqZ9evXF0VX8q045xtn991bt3vLli1z7Hu/fv0clukp290Y5/ffW7f9jBkzTI0aNUxISIgJDw831113nXn11VdNRkaGwzI9Zds7u+9Ftd1txnjoubsAAAAAALixYn0NNwAAAAAAVqHgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AHqlixYrq37+//fHBgwc1duxYbdu2zWUxAQCA/Onfv78qVqxYoOeuX79eY8eO1fHjx50aE+BMNmOMcXUQAJBfW7duVXh4uCpXrixJ2rJli66//nrNnTvXoRAHAADua9euXTp58qSuu+66fD936tSpevjhh7V79+4CF+2A1fxcHQAAFERBEjMAAHAvWTvOAW/FKeUotsaOHSubzaYffvhBd955pyIiIlS6dGk99NBDSk9P16+//qoOHTooLCxMFStW1OTJk7Mt4+TJkxo5cqQSExMVEBCg8uXLa9iwYTpz5oxDu1deeUUtWrRQVFSUQkNDVatWLU2ePFlpaWkO7Vq1aqWaNWtq8+bNat68uUJCQlSpUiVNmjRJmZmZufYpMzNTL730kurWravg4GCVLFlSjRs31ieffOLQZvLkyUpKSlJgYKCioqLUt29fHThwoMCxHD9+XCNGjFClSpXsy7zlllv0yy+/2NuMGzdOjRo1UunSpRUeHq569eppzpw5uvQkm9tuu00JCQk59rVRo0aqV6+e/fGlp5QnJyfr+uuvlyQNGDBANptNNptNY8eO1VtvvSWbzaYNGzZkW+bTTz8tf39/HTx4MNexBQBcRP707PxpjNGrr75q72upUqX0f//3f/rjjz9yHaesbb9161Z17dpV4eHhioiIUO/evfXXX39lG9O8jFdOp5TbbDYNHjxYb731lqpVq6aQkBDVqVNHS5YscYjl4YcfliQlJibac39ycrIkafXq1WrVqpUiIyMVHBysChUq6I477tDZs2dz7SfgVAYopsaMGWMkmapVq5pnnnnGrFy50owaNcpIMoMHDzZJSUlmxowZZuXKlWbAgAFGkvnwww/tzz9z5oypW7euKVOmjJk2bZpZtWqVmT59uomIiDBt2rQxmZmZ9rbDhw83M2fONMuWLTOrV682L7zwgilTpowZMGCAQ0wtW7Y0kZGR5pprrjGzZs0yK1euNA888ICRZObPn59rn/r06WNsNpv517/+ZT7++GPz+eefm/Hjx5vp06fb29x33332Pi5btszMmjXLlC1b1sTHx5u//vor37GcPHnS1KhRw4SGhpqnn37aLF++3Hz44Ydm6NChZvXq1fZ2/fv3N3PmzDErV640K1euNM8884wJDg4248aNs7f5+OOPjSSzcuVKh37t2LHDSDIzZsywT0tISDD9+vUzxhhz4sQJM3fuXCPJPPnkk2bDhg1mw4YNZv/+/eb8+fMmOjra3HXXXQ7LTEtLM7GxsebOO+/MdVwBAP9D/vTs/Hnvvfcaf39/M2LECLNs2TLzzjvvmKSkJFOuXDmTkpJy1XHK2vYJCQnm4YcfNsuXLzfTpk0zoaGh5rrrrjMXLlzI93j169fPJCQkOKxHkqlYsaJp2LChef/9983SpUtNq1atjJ+fn9m1a5cxxpj9+/ebIUOGGElm0aJF9tx/4sQJs3v3bhMUFGTatm1rFi9ebJKTk82CBQtMnz59zLFjx67aR8DZKLhRbGUljeeff95het26de0f3lnS0tJM2bJlTdeuXe3TJk6caHx8fMzmzZsdnv/BBx8YSWbp0qU5rjcjI8OkpaWZN9980/j6+pp//vnHPq9ly5ZGkvnmm28cnlO9enXTvn37q/bnyy+/NJLME088ccU2WYn3gQcecJj+zTffGEnm8ccfz3csTz/9dI5J/mqyxuDpp582kZGR9i9XaWlpply5cqZXr14O7UeNGmUCAgLM33//bZ92acFtjDGbN282kszcuXOzrW/MmDEmICDAHD582D7tvffeM5LM2rVr8xw3AID8eSlPy58bNmzIcdvt37/fBAcHm1GjRl11/Vnbfvjw4Q7TFyxYYCSZt99+2xiTv/G6UsFdrlw5c/LkSfu0lJQU4+PjYyZOnGifNmXKFCPJ7N692+H5Wa+lbdu2XbU/QFHglHIUe506dXJ4XK1aNdlsNt188832aX5+fqpSpYr27t1rn7ZkyRLVrFlTdevWVXp6uv2vffv2Dqc0SRdv8HXrrbcqMjJSvr6+8vf3V9++fZWRkaGdO3c6rD86OloNGzZ0mFa7dm2Hdefk888/lyQNGjToim3WrFkjSdluKtawYUNVq1ZNX3zxRb5j+fzzz3Xttdfqpptuump8q1ev1k033aSIiAj7GDz11FM6evSojhw5IuniOPfu3VuLFi3SiRMnJEkZGRl666231KVLF0VGRl51HVfy73//W5I0e/Zs+7SXX35ZtWrVUosWLQq0TAAo7sifnpc/lyxZIpvNpt69ezuMfXR0tOrUqeMw9ldz1113OTzu1q2b/Pz87OOU3/HKSevWrRUWFmZ/XK5cOUVFReW6PSWpbt26CggI0H333af58+fn6XR5wCoU3Cj2Spcu7fA4ICBAISEhCgoKyjY9NTXV/vjw4cP64Ycf5O/v7/AXFhYmY4z+/vtvSdK+ffvUvHlz/fnnn5o+fbrWrVunzZs365VXXpEknTt3zmE9ORWVgYGB2dpd7q+//pKvr6+io6Ov2Obo0aOSpJiYmGzzYmNj7fPzE8tff/2luLi4q8a2adMmtWvXTtLFovfrr7/W
5s2b9cQTT0hyHIO7775bqampWrhwoSRp+fLlOnTokAYMGHDVdVxNuXLl1L17d7322mvKyMjQDz/8oHXr1mnw4MEFXiYAFHfkz4s8KX8ePnxYxhiVK1cu2/hv3LjRPva5uXys/Pz8FBkZaR+H/I5XTgq6PaWLN2JbtWqVoqKiNGjQIFWuXFmVK1fW9OnTc30u4GzcpRwooDJlyig4OFhvvPHGFedL0uLFi3XmzBktWrRICQkJ9vnO/r3osmXLKiMjQykpKTkmOOl/yevQoUPZkvzBgwftMed3vZffAOVyCxculL+/v5YsWeLwRWzx4sXZ2lavXl0NGzbU3LlzNXDgQM2dO1exsbH2LxwFNXToUL311lv6+OOPtWzZMpUsWTLbHnoAgPXIn/9bb1HnzzJlyshms2ndunUKDAzMtoycpuUkJSVF5cuXtz9OT0/X0aNH7eNkxXjlV/PmzdW8eXNlZGRoy5YteumllzRs2DCVK1dOPXr0sHz9QBaOcAMF1KlTJ+3atUuRkZFq0KBBtr+sO27abDZJjknMGONwerMzZJ3CN3PmzCu2adOmjSTp7bffdpi+efNm7dixQzfeeGOB1rtz506tXr36im1sNpv8/Pzk6+trn3bu3Dm99dZbObYfMGCAvvnmG3311Vf69NNP1a9fP4fn5iRrfK+057t+/fpq2rSpnnvuOS1YsED9+/dXaGhobt0DADgZ+fN/6y3q/NmpUycZY/Tnn3/mOPa1atXKU+wLFixwePz+++8rPT1drVq1kmTNeOUkt9wvSb6+vmrUqJH9zIjvvvvOKesG8ooj3EABDRs2TB9++KFatGih4cOHq3bt2srMzNS+ffu0YsUKjRgxQo0aNVLbtm0VEBCgnj17atSoUUpNTdXMmTN17Ngxp8bTvHlz9enTR88++6wOHz6sTp06KTAwUFu3blVISIiGDBmiqlWr6r777tNLL70kHx8f3XzzzdqzZ49Gjx6t+Ph4DR8+vEDj8N5776lLly569NFH1bBhQ507d05r165Vp06d1Lp1a3Xs2FHTpk1Tr169dN999+no0aOaOnXqFfek9+zZUw899JB69uyp8+fPZ7sGLCeVK1dWcHCwFixYoGrVqqlEiRKKjY1VbGysvc3QoUPVvXt32Ww2PfDAA/nuKwCg8Mif/xuHos6fzZo103333acBAwZoy5YtatGihUJDQ3Xo0CF99dVXqlWrlv2+J1ezaNEi+fn5qW3bttq+fbtGjx6tOnXqqFu3bpJkyXjlJGsHwfTp09WvXz/5+/uratWqWrBggVavXq2OHTuqQoUKSk1NtZ9Rkds184DTufKObYArZd1p89KfpjDm4t0yQ0NDs7Vv2bKlqVGjhsO006dPmyeffNJUrVrVBAQEmIiICFOrVi0zfPhwh5/W+PTTT02dOnVMUFCQKV++vHn44YfN559/biSZNWvWXHUdWTFdfgfPnGRkZJgXXnjB1KxZ0x5PkyZNzKeffurQ5rnnnjPXXnut8ff3N2XKlDG9e/c2+/fvz7W/V4rl2LFjZujQoaZChQrG39/fREVFmY4dO5pffvnF3uaNN94wVatWNYGBgaZSpUpm4sSJZs6cOTneXdQYY3r16mUkmWbNmuXY18vvUm6MMe+++65JSkoy/v7+RpIZM2aMw/zz58+bwMBA06FDhxyXCQDIHfnTs/Nn1jIbNWpkQkNDTXBwsKlcubLp27ev2bJly1XHKWvbf/vtt6Zz586mRIkSJiwszPTs2dPhl0DyM15Xukv5oEGDsq0/p9z/2GOPmdjYWOPj42N/XWzYsMHcfvvtJiEhwQQGBprIyEjTsmVL88knn1y1f4AVbMYY45pSHwCK1qeffqpbb71Vn332mW655RZXhwMAgEcZO3asxo0bp7/++qtIrsMGvAGnlAPwej///LP27t2rESNGqG7dug4/WQMAAABYhZumAfB6DzzwgG699VaVKlVK7777rv1GPAAAAICVOKUcAAAAAAALcIQbAAAAAAALUHADAAAAAGABl9w0LTMzUwcPHlRYWBjXUgIAijVjjE6dOqXY2Fj5+LhmPzh5GQCAi5ydl11ScB88eFDx8fGuWDUAAG5p//79iouLc8m6ycsAADhyVl52ScEdFhYm6WInwsPDXRECAABu4eTJk4qPj7fnRlcgLwMAcJGz87JLCu6s09XCw8NJ7AAASC49lZu8DACAI2flZZcU3G5rzcTCPb/1Y86JAwAAFI3ccj+5HQBQCNylHAAAAAAAC3CE25nYSw4AAAAA+P84wg0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFuAabgAA4L0K+wskAAAUAke4AQAAAACwAAU3AAAAAAAW8J5TyvlJLgAAAACAG+EINwAAAAAAFqDgBgAAAADAAt5zSjkAAICzOeMu51zWBgDFFke4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFig+Nw0zRk3PQEAAAAAII+KT8HtDvJS9HMnUwAAAADwChTcAADAM3H2GgDAzXENNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALcNM0d5PbDWC4izkAAAAAeASOcAMAAAAAYAEKbgAAAAAALMAp5QAAwD3xO9sAAA/HEW4AAAAAACzAEW5Pw03VAAAAAMAjUHADAABrsJMYHuaFlTuvOn9422uLKBIA3oKCGwAAuEZxuUabHQ8AUGxRcAMFxF5wAAAAAFfDTdMAAAAAALAABTcAAAAAABbglHIAAABX4hpvAPBaFNwocrld+yxZf/2zO8QAAAAAwLtRcBc3ebkjLHvSAQCAk3nDzUa9oQ8AihYFNwAAQDFHIQkA1qDg9jbF5TdNAQAoLrjGGwA8FgU3ssslsb+QfsdV57MXHAAAAAAouFEAjfe9ftX5L6y8r9Dr4NQ2AADyqAjuz+IJedkTbojqCTECcC4KbsCLecIXJACA6+W2M31jhcLvTC8O8lJQAyheKLiRzYY/jro6hFyR0AAAQH7x/QFAUaPgBq7A6qTsjOVzhBoAUFxQLAPwRD6uDgAAAAAAAG/EEe5ixh1OF8/tOjGJa8XyqrB7+4vi5i1cRw4AbsDinw3l6LPzFDZvukPedYcYAHdBwe1mciuIm1SKLNTzi0JeCmp4DpImABROYXN7cUHRnjeME+BZKLjzgYSJ/CAh5g0F/UWMA+CZ3GFHd14UNk5n3MWcO6F7D2/4jsO9dFBUKLjhkUja7sMbki4AeDJ3KPqdcXYbub1oeEveZkc1PAUFtxO5Q8LDReyJx6WKS1L2huv+3AHjULx4Qu72hBjdAfeI8RxFcQ8Zd+At+cRb+uEqLim4jTGSpJMnTzpvoWdSC72ITXv+cUIg1lq1/aCrQygSqWdOX3X+mXPnLV1+XtaRl2VY7foDc686f3PcAI9Yx9VMXPydpct31jpyG6eGFUtfdX5qeperzs/L52Vur8nclpHb850xToPaVCn0Mgrty+evOtsZ2yI/spaXlRtdwZK8LDklN1utsPnEGU7mMk7uEKM7qPXrS1edfyYPy8jtc87VOQ//4w4
5KzeFXYen5FWrxzovfXhl9e+FXkZeOTsv24wLMvyBAwcUHx9f1KsFAMBt7d+/X3FxcS5ZN3kZAABHzsrLLim4MzMzdfDgQYWFhclmsxV6eSdPnlR8fLz279+v8PBwJ0SI/GIbuBbj73psA9fy5PE3xujUqVOKjY2Vj4+PS2Jwdl6WPHubXI6+uCdv6ovkXf2hL+6JvuSNs/OyS04p9/HxsWQvfnh4uMe/eDwd28C1GH/XYxu4lqeOf0REhEvXb1Veljx3m+SEvrgnb+qL5F39oS/uib7kzpl52TW70gEAAAAA8HIU3AAAAAAAWMArCu7AwECNGTNGgYGBrg6l2GIbuBbj73psA9di/N2PN20T+uKevKkvknf1h764J/riGi65aRoAAAAAAN7OK45wAwAAAADgbii4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFjAYwruV199VYmJiQoKClL9+vW1bt26q7Zfu3at6tevr6CgIFWqVEmzZs0qoki9V362waFDh9SrVy9VrVpVPj4+GjZsWNEF6qXyM/6LFi1S27ZtVbZsWYWHh6tJkyZavnx5EUbrnfKzDb766is1a9ZMkZGRCg4OVlJSkl544YUijNb75DcPZPn666/l5+enunXrWhugl7Ei73744YeqXr26AgMDVb16dX300UeFXq8r+jJ79mw1b95cpUqVUqlSpXTTTTdp06ZNDm3Gjh0rm83m8BcdHe12fZk3b162OG02m1JTUwu1Xlf0pVWrVjn2pWPHjvY27rBd8vodyVXvFyv64ynvmbz0xVPeM3npi6e8Z/L6vdaV75mrMh5g4cKFxt/f38yePdv8/PPPZujQoSY0NNTs3bs3x/Z//PGHCQkJMUOHDjU///yzmT17tvH39zcffPBBEUfuPfK7DXbv3m0efPBBM3/+fFO3bl0zdOjQog3Yy+R3/IcOHWqee+45s2nTJrNz507z2GOPGX9/f/Pdd98VceTeI7/b4LvvvjPvvPOO+emnn8zu3bvNW2+9ZUJCQsxrr71WxJF7h/yOf5bjx4+bSpUqmXbt2pk6deoUTbBewIq8u379euPr62smTJhgduzYYSZMmGD8/PzMxo0bC7xeV/WlV69e5pVXXjFbt241O3bsMAMGDDARERHmwIED9jZjxowxNWrUMIcOHbL/HTlypMD9sKovc+fONeHh4Q5xHjp0qFDrdVVfjh496tCHn376yfj6+pq5c+fa27jDdsnLdyRXvV+s6o+nvGfy0hdPec/kpS+e8p7Jy/daV75ncuMRBXfDhg3N/fff7zAtKSnJPProozm2HzVqlElKSnKYNnDgQNO4cWPLYvR2+d0Gl2rZsiUFdyEVZvyzVK9e3YwbN87ZoRUbztgGt99+u+ndu7ezQysWCjr+3bt3N08++aQZM2YMBXc+WJF3u3XrZjp06ODQpn379qZHjx4FXm9eFMV3iPT0dBMWFmbmz59vn2bFa86KvsydO9dEREQ4db15URTb5YUXXjBhYWHm9OnT9mnusF0udaXvSK56vxR2uXn9zueu75lLXakvnvKeuVRet4snvGeyXP691pXvmdy4/SnlFy5c0Lfffqv/x959h0dV5X8c/0w6CQmGUJIQDCEiRToohGIIKEgRFV1EBAF11wIKIusGUYOogGBDpawsUiwUAVnsoBRRQEXCqsAiCghKExQJIIEk5/cHv8wypMwkmTszybxfz5PnyZx75p5yy7nf26Zr164O6V27dtX69esL/c6GDRsK5O/WrZs2bdqks2fPWlbXiqo0ywDu447+z8vLU1ZWlqpWrWpFFSs8dyyDzMxMrV+/XqmpqVZUsUIrbf/Pnj1bP/74ozIyMqyuYoVi1bhbVJ78eVox1njqGOLUqVM6e/ZsgX3szp07FR8fr6SkJPXr10+7du0qVTusbsuJEyeUmJiohIQE9erVS5mZmWUq15ttOd+sWbPUr18/RUREOKR7e7m4whvbi5XzvZCvbjOuKg/bTGmUl22msONab20zrvD5gPvIkSPKzc1VzZo1HdJr1qypgwcPFvqdgwcPFpo/JydHR44csayuFVVplgHcxx39/+yzz+rkyZPq27evFVWs8MqyDBISEhQaGqrWrVtr6NChuvPOO62saoVUmv7fuXOn0tPT9cYbbygoKMgT1awwrBp3i8qTP08rxhpPHUOkp6erVq1auuqqq+xpbdq00bx58/TRRx9p5syZOnjwoNq1a6ejR4/6VFsaNGigOXPmaPny5Zo/f77CwsLUvn177dy5s9Tleqst5/vyyy/13XffFdjn+sJycYU3thcr53shX91mXFFetpmSKk/bTGHHtd7aZlxRbo5CbDabw2djTIE0Z/kLS4frSroM4F6l7f/58+dr7Nix+ve//60aNWpYVT2/UJplsG7dOp04cUIbN25Uenq6LrnkEt1yyy1WVrPCcrX/c3Nz1b9/fz3++OO69NJLPVW9CseKcdeVeVox1lh5DDFp0iTNnz9fa9asUVhYmD29e/fu9v+bNGmilJQUJScna+7cuRo5cmSp2lFU3crSlrZt26pt27b26e3bt1fLli310ksv6cUXXyx1ua6wcrnMmjVLjRs31hVXXOGQ7ivLxV3ztOrYzMpjPl/fZpwpT9tMSZSXbaa441pvbjPF8fmAu1q1agoMDCxw5uHw4cMFzlDki42NLTR/UFCQYmJiLKtrRVWaZQD3KUv/L1y4UHfccYfeeusth7PIKJmyLIOkpCRJ5wapQ4cOaezYsQTcJVTS/s/KytKmTZuUmZmpYcOGSTp3+5kxRkFBQVqxYoU6d+7skbqXR1aNu0XlyZ+nFWON1ccQzzzzjMaPH6+PP/5YTZs2LbYuERERatKkif0qWEl56ngoICBAl19+ub2e5XG5nDp1SgsWLNC4ceOc1sUby8UV3therJxvPl/fZkrDV7eZkigv20xxx7Xe2mZc4fO3lIeEhKhVq1ZauXKlQ/rKlSvVrl27Qr+TkpJSIP+KFSvUunVrBQcHW1bXiqo0ywDuU9r+nz9/vgYPHqw333zT4ecdUHLu2gaMMcrOznZ39Sq8kvZ/VFSUvv32W23ZssX+d/fdd6t+/frasmWL2rRp46mql0tWjbtF5cmfpxVjjZXHEJMnT9YTTzyhDz/8UK1bt3Zal+zsbG3fvl1xcXGlaInnjoeMMdqyZYu9nuVtuUjSokWLlJ2drQEDBjitizeWiyu8sb1YOV+pfGwzpeGr20xJlIdtxtlxrbe2GZdY+ko2N8l/hfusWbPMtm3bzIgRI0xERITZs2ePMcaY9PR0M3DgQHv+/J+OeOCBB8y2bdvMrFmz+FmwMirpMjDGmMzMTJOZmWlatWpl+vfvbzIzM83WrVu9Uf1yr6T9/+abb5qgoCAzdepUh59xOHbsmLeaUO6VdBm8/PLLZvny5eb7778333//vXn11VdNVFSUGTNmjLeaUK6VZh90Pt5SXjJWjLuff/65CQwMNBMnTjTbt283EydOLPInW4oq11fa8vTTT5uQkBCzePFih31sVlaWPc+DDz5o1qxZY3
bt2mU2btxoevXqZSIjI32uLWPHjjUffvih+fHHH01mZqYZMmSICQoKMl988YXL5fpKW/J16NDB3HzzzYWW6wvLxRjnx0je2l6sak952WZcaUt52WZcaUs+X99mXDmu9eY240y5CLiNMWbq1KkmMTHRhISEmJYtW5q1a9fapw0aNMikpqY65F+zZo1p0aKFCQkJMXXq1DHTp0/3cI0rnpIuA0kF/hITEz1b6QqkJP2fmppaaP8PGjTI8xWvQEqyDF588UVz2WWXmfDwcBMVFWVatGhhpk2bZnJzc71Q84qhpPug8xFwl5wV4+5bb71l6tevb4KDg02DBg3MkiVLSlSur7QlMTGx0H1sRkaGPc/NN99s4uLiTHBwsImPjzd9+vRxy0lnd7dlxIgR5uKLLzYhISGmevXqpmvXrmb9+vUlKtdX2mKMMTt27DCSzIoVKwot01eWiyvHSN7aXqxoT3naZpy1pTxtM66sZ+Vhm3H1uNab20xxbMb8/xsnAAAAAACA2/j8M9wAAAAAAJRHBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADaBcev/99zV27FhvVwMA4MN+++039evXTzVq1JDNZtP111/v0fKnTZumOXPmlGkederU0eDBg0v9fZvN5jBezpkzRzabTXv27ClTvXxBWfsG8IQgb1cAAErj/fff19SpUwm6AQBFeuKJJ/T222/r1VdfVXJysqpWrerR8qdNm6Zq1ar5VFDYs2dPbdiwQXFxcd6uSpm9/fbbioqK8nY1gGIRcAOo8IwxOn36tCpVquTtqgAAPOi7775TcnKybr311mLz5ebmKicnR6GhoR6qmfdUr15d1atX93Y13KJFixbergLgFLeUw++MHTtWNptN33zzjf7yl7+oSpUqqlq1qkaOHKmcnBzt2LFD11xzjSIjI1WnTh1NmjSpwDyOHz+uUaNGKSkpSSEhIapVq5ZGjBihkydPOuSbOnWqrrzyStWoUUMRERFq0qSJJk2apLNnzzrk69Spkxo3bqyvvvpKHTt2VHh4uOrWrauJEycqLy/PaZveeusttWnTRlWqVLF/9/bbb5cknThxQhdddJHuuuuuAt/bs2ePAgMDNXnyZEn/u81s1apV+utf/6qYmBhFRUXptttu08mTJ3Xw4EH17dtXF110keLi4jRq1CiHtuzZs0c2m02TJ0/W008/rTp16qhSpUrq1KmTvv/+e509e1bp6emKj49XlSpVdMMNN+jw4cMF6rVw4UKlpKQoIiJClStXVrdu3ZSZmWmfPnjwYE2dOlXSuVvl8v/yb4+z2WwaNmyYZsyYoYYNGyo0NFRz5sxRvXr11K1btwLlnThxQlWqVNHQoUOd9jUAVFQVaXzMH48+/vhjbd++3T5OrFmzxj5t0qRJevLJJ5WUlKTQ0FCtXr1ap0+f1oMPPqjmzZvb25+SkqJ///vfBcrIy8vTSy+9pObNm6tSpUq66KKL1LZtWy1fvlzSududt27dqrVr19rLr1OnjiSVqBxXHT9+3D52V65cWddcc42+//77AvkKu6U8v583bNigdu3aqVKlSqpTp45mz54tSXrvvffUsmVLhYeHq0mTJvrwww8LzHfnzp3q37+/atSoodDQUDVs2NA+Vudbs2aNbDab5s+frzFjxig+Pl5RUVG66qqrtGPHDoe8mZmZ6tWrl31+8fHx6tmzp37++Wd7nsJuKd+7d68GDBjgUI9nn33WYX3JXweeeeYZPffcc0pKSlLlypWVkpKijRs3utzngEsM4GcyMjKMJFO/fn3zxBNPmJUrV5qHHnrISDLDhg0zDRo0MC+++KJZuXKlGTJkiJFklixZYv/+yZMnTfPmzU21atXMc889Zz7++GMzZcoUU6VKFdO5c2eTl5dnz/vAAw+Y6dOnmw8//NCsWrXKPP/886ZatWpmyJAhDnVKTU01MTExpl69embGjBlm5cqV5t577zWSzNy5c4ttz/r1643NZjP9+vUz77//vlm1apWZPXu2GThwoEM9IiIizLFjxxy++/e//92EhYWZI0eOGGOMmT17tpFkkpKSzIMPPmhWrFhhnn76aRMYGGhuueUW07JlS/Pkk0+alStXmn/84x9Gknn22Wft89u9e7eRZBITE821115r3n33XfP666+bmjVrmksvvdQMHDjQ3H777eaDDz4wM2bMMJUrVzbXXnutQ52eeuopY7PZzO23327effdds3TpUpOSkmIiIiLM1q1bjTHG/PDDD+amm24yksyGDRvsf6dPnzbGGCPJ1KpVyzRt2tS8+eabZtWqVea7774zU6ZMMTabzXz//fcOZU6dOtVIss8fAPxRRRofT58+bTZs2GBatGhh6tatax8n/vjjD/tYVatWLZOWlmYWL15sVqxYYXbv3m2OHTtmBg8ebF577TWzatUq8+GHH5pRo0aZgICAAuUNHDjQ2Gw2c+edd5p///vf5oMPPjBPPfWUmTJlijHGmM2bN5u6deuaFi1a2MvfvHmzMcaUqJzExEQzaNCgYpddXl6eSUtLM6Ghoeapp54yK1asMBkZGaZu3bpGksnIyLDnzR/rd+/eXaCf69evb2bNmmU++ugj06tXLyPJPP7446ZJkyZm/vz55v333zdt27Y1oaGh5pdffrF/f+vWraZKlSqmSZMmZt68eWbFihXmwQcfNAEBAWbs2LH2fKtXrzaSTJ06dcytt95q3nvvPTN//nxz8cUXm3r16pmcnBxjjDEnTpwwMTExpnXr1mbRokVm7dq1ZuHChebuu+8227ZtK7JvDh8+bGrVqmWqV69uZsyYYT788EMzbNgwI8ncc8899nz560CdOnXMNddcY5YtW2aWLVtmmjRpYqKjowscLwFlQcANv5N/QHF+oGiMMc2bNzeSzNKlS+1pZ8+eNdWrVzd9+vSxp02YMMEEBASYr776yuH7ixcvNpLM+++/X2i5ubm55uzZs2bevHkmMDDQ/Pbbb/ZpqampRpL54osvHL7TqFEj061bt2Lb88wzzxhJxQ4OP/74owkICDDPP/+8Pe3PP/80MTExDgc3+YPwfffd5/D966+/3kgyzz33nEN68+bNTcuWLe2f8wewZs2amdzcXHv6Cy+8YCSZ3r17O3x/xIgRRpL5448/jDHG7N271wQFBRUoPysry8TGxpq+ffva0
4YOHWqKOmcoyVSpUsWhj40x5vjx4yYyMtIMHz7cIb1Ro0YmLS2t0HkBgL+oaONj/vcvu+wyh7T8sSo5OdmcOXOm2O/n5OSYs2fPmjvuuMO0aNHCnv7pp58aSWbMmDHFfv+yyy4zqampTutZVDnGuBZwf/DBB0aSPdjP99RTT7kccEsymzZtsqcdPXrUBAYGmkqVKjkE11u2bDGSzIsvvmhP69atm0lISLCP5/mGDRtmwsLC7Ms0P+Du0aOHQ75FixbZT6IbY8ymTZuMJLNs2bJi231h36Snpxe6vtxzzz3GZrOZHTt2GGP+tw40adLEHuQbY8yXX35pJJn58+cXWy5QEtxSDr/Vq1cvh88NGzaUzWZT9+7d7WlBQUG65JJL9NNPP9nT3n33XTVu3FjNmzdXTk6O/a9bt27229XyZWZmqnfv3oqJiVFgYKCCg4N12223KTc3t8BtXrGxsbriiisc0po2bepQdmEuv/xySVLfvn21aNEi/fLLLwXy1K1bV7169dK0adNkjJEkvfnmmzp69KiGDRvmUt9I5160cmF6YfXr0aOHAgICHPIV9X3p3O1fkvTRRx8pJydHt912m0PfhoWFKTU11aFvnencubOio6Md0iIjIzVkyBDNmTPHfnvjqlWrtG3btkL7AQD8UUUZH53p3bu3goODC6S/9dZbat++vSpXrqygoCAFBwdr1qxZ2r59uz3PBx98IEllehTJlXJctXr1akkq8Kx6//79XZ5HXFycWrVqZf9ctWpV1ahRQ82bN1d8fLw9PX/szu//06dP65NPPtENN9yg8PBwh2Xfo0cPnT59usBt2r1793b43LRpU4d5XnLJJYqOjtY//vEPzZgxQ9u2bXOpDatWrVKjRo0KrC+DBw+WMUarVq1ySO/Zs6cCAwOLrAfgDgTc8FsXvqk0JCRE4eHhCgsLK5B++vRp++dDhw7pm2++UXBwsMNfZGSkjDE6cuSIpHNBZMeOHfXLL79oypQpWrdunb766iv780x//vmnQzkxMTEF6hgaGlog34WuvPJKLVu2zB6oJiQkqHHjxpo/f75DvuHDh2vnzp1auXKlpHPPz6WkpKhly5Yu9U1R6ef3TWm+L8k+j0OHDkk6dxLhwv5duHChvW9dUdTbV++77z5lZWXpjTfekCS9/PLLSkhI0HXXXefyvAGgIqso46MzhY0TS5cuVd++fVWrVi29/vrr2rBhg7766ivdfvvtDm399ddfFRgYqNjY2FKV7Wo5rjp69KiCgoIK9FVJ6lfYG9xDQkKcjt1Hjx5VTk6OXnrppQLLvkePHpJUYPy+sJ75L6vLX6ZVqlTR2rVr1bx5cz388MO67LLLFB8fr4yMjALP+Z/v6NGjhS7X/BMGR48eLVE9AHfgLeVACVWrVk2VKlXSq6++WuR0SVq2bJlOnjyppUuXKjEx0T59y5Ytbq/Tddddp+uuu07Z2dnauHGjJkyYoP79+6tOnTpKSUmRdO6Kb+PGjfXyyy+rcuXK2rx5s15//XW316Us8vtu8eLFDn1WGjabrdD0Sy65RN27d9fUqVPVvXt3LV++XI8//rjDGW4AQMn54vhYnMLGiddff11JSUlauHChw/Ts7GyHfNWrV1dubq4OHjxYqp/XcrUcV8XExCgnJ0dHjx51CCIPHjxYqvmVRHR0tAIDAzVw4MAir/gnJSWVeL5NmjTRggULZIzRN998ozlz5mjcuHGqVKmS0tPTC/1OTEyMDhw4UCB9//79kv63DgKeRMANlFCvXr00fvx4xcTEFDuA5A+g5//EiDFGM2fOtKxuoaGhSk1N1UUXXaSPPvpImZmZ9oBbku6//37dfffd+uOPP1SzZk395S9/sawupdGtWzcFBQXpxx9/1I033lhs3vPPQpf0576GDx+url27atCgQQoMDNRf//rXUtcZAHCOL4+PrrLZbAoJCXEIgg8ePFjg7eHdu3fXhAkTNH36dI0bN67I+RV1Jd7VclyVlpamSZMm6Y033tD9999vT3/zzTdLNb+SCA8PV1pamjIzM9W0aVP7FXB3sdlsatasmZ5//nnNmTNHmzdvLjJvly5dNGHCBG3evNnhDr558+bJZrMpLS3NrXUDXEHADZTQiBEjtGTJEl155ZV64IEH1LRpU+Xl5Wnv3r1asWKFHnzwQbVp00ZXX321QkJCdMstt+ihhx7S6dOnNX36dP3+++9urc9jjz2mn3/+WV26dFFCQoKOHTumKVOmKDg4WKmpqQ55BwwYoNGjR+vTTz/VI4884vZBsazq1KmjcePGacyYMdq1a5euueYaRUdH69ChQ/ryyy8VERGhxx9/XNK5M9+S9PTTT6t79+4KDAx0eaC/+uqr1ahRI61evdr+0yEAgLLxtfGxNHr16qWlS5fq3nvv1U033aR9+/bpiSeeUFxcnHbu3GnP17FjRw0cOFBPPvmkDh06pF69eik0NFSZmZkKDw/XfffdJ+l/V2kXLlyounXrKiwsTE2aNHG5HFd17dpVV155pR566CGdPHlSrVu31ueff67XXnvNbX1TnClTpqhDhw7q2LGj7rnnHtWpU0dZWVn64Ycf9M477xR4dtqZd999V9OmTdP111+vunXryhijpUuX6tixY7r66quL/N4DDzygefPmqWfPnho3bpwSExP13nvvadq0abrnnnt06aWXlrWpQIkRcAMlFBERoXXr1mnixIl65ZVXtHv3blWqVEkXX3yxrrrqKvtvbDZo0EBLlizRI488oj59+igmJkb9+/fXyJEjHV48U1Zt2rTRpk2b9I9//EO//vqrLrroIrVu3VqrVq3SZZdd5pC3UqVKuvbaa/X666/r7rvvdlsd3Gn06NFq1KiRpkyZovnz5ys7O1uxsbG6/PLLHercv39/ff7555o2bZrGjRsnY4x2795t739n+vbtq7Fjx/KyNABwE18bH0tjyJAhOnz4sGbMmKFXX31VdevWVXp6un7++Wf7Cd98c+bMUcuWLTVr1izNmTNHlSpVUqNGjfTwww/b8zz++OM6cOCA/vrXvyorK0uJiYnas2dPicpxRUBAgJYvX66RI0dq0qRJOnPmjNq3b6/3339fDRo0KHO/ONOoUSNt3rxZTzzxhB555BEdPnxYF110kerVq2d/jrsk6tWrp4suukiTJk3S/v37FRISovr162vOnDkaNGhQkd+rXr261q9fr9GjR2v06NE6fvy46tatq0mTJmnkyJFlaSJQajaT/8piABXemTNnVKdOHXXo0EGLFi3ydnW8qnXr1rLZbPrqq6+8XRUAAABUUFzhBvzAr7/+qh07dmj27Nk6dOhQkS8bqeiOHz+u7777Tu+++66+/vprvf32296uEgAAACowAm7AD7z33nsaMmSI4uLiNG3atEJ/CswfbN68WWlpaYqJiVFGRoauv/56b1cJAAAAFRi3lAMAAAAAYIEAb1cAAAAAAICKiIAbAAAAAAALeOUZ7ry8PO3fv1+RkZGy2WzeqAIAAD7BGKOsrCzFx8crIMA758EZlwEAOMfd47JXAu79+/erdu3a3igaAACftG/f
PiUkJHilbMZlAAAcuWtc9krAHRkZKelcI6KiorxRBQAAfMLx48dVu3Zt+9joDYzLAACc4+5x2SsBd/7talFRUQzsAABIXr2Vm3EZAABH7hqX+R1u+KbVE4qfnjbaM/UAAMBqjHkAUGHxlnIAAAAAACxAwA0AAAAAgAW4pRwVE7fnAQAAAPAyrnADAAAAAGABAm4AAAAAACxAwA0AAAAAgAV4hhsAAKA8c/beEol3lwCAl3CFGwAAAAAACxBwAwAAAABgAQJuAAAAAAAswDPcAAAApcXz0wCAYhBwo3xy5QAHAAAAALyIgBueV16CZWf15IoFAAAAgGLwDDcAAAAAABbgCjdgFa6QAwAAAH6NK9wAAAAAAFiAK9xAaZWXZ9EBAAAAeAVXuAEAAAAAsABXuAFfxnPgAABfwHgEAKVCwI2SY9B1i+dXfu80zwNsoQAAdzzCxGNQAOAV3FIOAAAAAIAFuH4G96sgZ9E37Dpa7PSUujFlmn/bva84z1TGMgAAAAB4DwE3AAAAysaVk+08cgbAD3FLOQAAAAAAFuAKNwAAQFEqyGNSAADvIOBGQf5wcOEPbXQTZ29Tf+DqSz1UEwBAYZy9c0Qq+3tH3IJfOQHghwi4AR/m9MVtaR6qCAAAAIAS4xluAAAAAAAswBVuVEhW/6QXAAAu4zEmAPBbXOEGAAAAAMACXOEGAAAoQkW5Y8qVF6sVp7y0EwB8DQE3AADwWxtmjfJ2FZCPt5gDqIAIuOGXynqm313zKLOyPhfIwQsAAABgGQLu8oYACz6G3+kGAAAACkfADQAAUEqu3O3k7PnnivKcuNe5clGCCw8APIyA29944KdJ3HHgwMGHa+gnAAAAwHcRcAMAgArL2WMvbT1UD/gIXswGwMMIuOFxPvGyMQAAygnGTQAovwK8XQEAAAAAACoirnADfsyl35+9+G9lKoO3mAMApLJfqU+R9e+hAQB3I+AGUKy2e18pdvrGMgbkvoCTAgDg+3hRKIDyiFvKAQAAAACwAFe4gQqsIrxox9nVZ4kr0IC/cmX/APfwl6vLZb7jyQ1vQeeuK6BiIeAGAACWsDx40Y1O6+DssRi4hy+c4HWlDs5ODDhdX1aX7cSCJ96d4hb8fBrgNgTcKJd8YWAHAAAAgOIQcAMAgHKJq9coKU7YA/A0Am6UGIMV3MoNt4yW9bZV5wftzzitQ5lx+x78kNNtt4IcpTBu+g9fWNY8Aw74lgoylHmIJw6InQYfZeMLAwEqljJfYfLEi3Ys3q5cUdbAwh0vj+MgDAAAwLMIuAEAgE/iJDF8jSfWSWcnsp9fWbaXqrn04jYnUtLKOANXToRzZxcqiIoTcLvj6nNZr4K54Sqav/zsBpDP6cHLxR4owwmXri4HLXGSo/hb453Vsa1cuZPA4lvffeAAyRM/E8edAJ7DM9iA+3liu3K2n3RHHTbmlK2MlDucjImeGNM8EZ944sREWevghhjp+Zzij6N8eWz2SsBtjJEkHT9+3H0zPXm6+OmulOVsHh5w8s/sYqcfL2Mdnc0f8DWnT54o8zzKut67UofjQcVvm6dzip+HO7ZNZ/tUZ+1wuk92Zf/jzv16IVxaFmWsQ5n7qYTy55c/NnqDJeOynPclYxJQcr6wXXmiDmUtwyfGNE/EJxaPu26pgxtiLGfHUe4cv9w9LtuMF0b4n3/+WbVr1/Z0sQAA+Kx9+/YpISHBK2UzLgMA4Mhd47JXAu68vDzt379fkZGRstlsBaYfP35ctWvX1r59+xQVFeXp6rkd7fFttMe30R7fRnvKzhijrKwsxcfHKyAgwCNlXsjZuFwaFW3d8CX0rXXoW2vQr9ahb93P3eOyV24pDwgIcOlsQVRUVIVacWiPb6M9vo32+DbaUzZVqlTxWFmFcXVcLo2Ktm74EvrWOvStNehX69C37uXOcdk7p9IBAAAAAKjgCLgBAAAAALCATwbcoaGhysjIUGhoqLer4ha0x7fRHt9Ge3wb7UFR6Evr0LfWoW+tQb9ah771fV55aRoAAAAAABWdT17hBgAAAACgvCPgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABSwLuadOmKSkpSWFhYWrVqpXWrVtXZN4DBw6of//+ql+/vgICAjRixIhC8y1ZskSNGjVSaGioGjVqpLfffrtM5XqzPTNnzlTHjh0VHR2t6OhoXXXVVfryyy8d8owdO1Y2m83hLzY21ifbM2fOnAJ1tdlsOn36dKnL9WZ7OnXqVGh7evbsac/jK8tn6dKluvrqq1W9enVFRUUpJSVFH330UYF85WX7caU95Wn7caU95Wn7caU95Wn7+eyzz9S+fXvFxMSoUqVKatCggZ5//vkC+by5/fiSkrZx7dq1atWqlcLCwlS3bl3NmDHDYbqr674/cHffStKxY8c0dOhQxcXFKSwsTA0bNtT7779vVRN8lrv71pV9nL+wYr194YUXVL9+fVWqVEm1a9fWAw88wD7BDX179uxZjRs3TsnJyQoLC1OzZs304YcfWtkEnM+42YIFC0xwcLCZOXOm2bZtmxk+fLiJiIgwP/30U6H5d+/ebe6//34zd+5c07x5czN8+PACedavX28CAwPN+PHjzfbt28348eNNUFCQ2bhxY6nL9WZ7+vfvb6ZOnWoyMzPN9u3bzZAhQ0yVKlXMzz//bM+TkZFhLrvsMnPgwAH73+HDh8vUFqvaM3v2bBMVFeVQ1wMHDpSpXG+25+jRow7t+O6770xgYKCZPXu2PY+vLJ/hw4ebp59+2nz55Zfm+++/N6NHjzbBwcFm8+bN9jzlaftxpT3laftxpT3laftxpT3lafvZvHmzefPNN813331ndu/ebV577TUTHh5u/vnPf9rzeHP78SUlbeOuXbtMeHi4GT58uNm2bZuZOXOmCQ4ONosXL7bncWXd9wdW9G12drZp3bq16dGjh/nss8/Mnj17zLp168yWLVs81SyfYEXfurKP8wdW9O3rr79uQkNDzRtvvGF2795tPvroIxMXF2dGjBjhqWb5BCv69qGHHjLx8fHmvffeMz/++KOZNm2aCQsLcxi/YR23B9xXXHGFufvuux3SGjRoYNLT051+NzU1tdAAqG/fvuaaa65xSOvWrZvp16+fW8otjhXtuVBOTo6JjIw0c+fOtadlZGSYZs2albS6TlnRntmzZ5sqVapYVq5V83V1+Tz//PMmMjLSnDhxwp7mi8snX6NGjczjjz9u/1xet598F7bnQuVl+8l3YXvK6/aTz9nyKW/bzw033GAGDBhg/+zN7ceXlLSNDz30kGnQoIFD2l133WXatm1r/+zKuu8PrOjb6dOnm7p165ozZ864v8LliBV9e6HC9nH+wIq+HTp0qOncubNDnpEjR5oOHTq4qdblgxV9GxcXZ15++WWHPNddd5259dZb3VRrFMett5SfOXNGX3/9tbp27eqQ3rVrV61fv77U892wYUOBeXbr1s0+T6vKtWq+Fzp16pTOnj2rqlWrOqTv3LlT8fHxSkpKUr9+/bRr164ylWNle06cOKHExEQlJCSoV69
eyszMtLxcTy2fWbNmqV+/foqIiHBI98Xlk5eXp6ysLId1qTxvP4W150Llafspqj3ldftxZfmUp+0nMzNT69evV2pqqj3NW9uPLylNG4vqt02bNuns2bP2tOLWfX9gVd8uX75cKSkpGjp0qGrWrKnGjRtr/Pjxys3NtaYhPsjK9fZ8Re3jKjKr+rZDhw76+uuv7Y+J7dq1S++//75f3a5vVd9mZ2crLCzMIU+lSpX02WefubH2KIpbA+4jR44oNzdXNWvWdEivWbOmDh48WOr5Hjx4sNh5WlWuVfO9UHp6umrVqqWrrrrKntamTRvNmzdPH330kWbOnKmDBw+qXbt2Onr0aKnLsao9DRo00Jw5c7R8+XLNnz9fYWFhat++vXbu3GlpuZ5YPl9++aW+++473XnnnQ7pvrp8nn32WZ08eVJ9+/a1p5Xn7aew9lyoPG0/hbWnPG8/zpZPedl+EhISFBoaqtatW2vo0KEO9fXW9uNLStPGovotJydHR44ckeR83fcHVvXtrl27tHjxYuXm5ur999/XI488omeffVZPPfWUNQ3xQVb17fmK2sdVdFb1bb9+/fTEE0+oQ4cOCg4OVnJystLS0pSenm5NQ3yQVX3brVs3Pffcc9q5c6fy8vK0cuVK/fvf/9aBAwesaQgcBFkxU5vN5vDZGFMgzYp5WlGulfOVpEmTJmn+/Plas2aNw5mn7t272/9v0qSJUlJSlJycrLlz52rkyJFlKtPd7Wnbtq3atm1r/9y+fXu1bNlSL730kl588UXLyrV6vtK5M9eNGzfWFVdc4ZDui8tn/vz5Gjt2rP7973+rRo0aJZ6nry2f4tqTrzxtP0W1p7xuP64sn/Ky/axbt04nTpzQxo0blZ6erksuuUS33HJLieZp5X7IV5S0jYXlPz/d1XXfH7i7b/Py8lSjRg298sorCgwMVKtWrbR//35NnjxZjz32mJtr79vc3bfnK2of5y/c3bdr1qzRU089pWnTpqlNmzb64YcfNHz4cMXFxenRRx91c+19m7v7dsqUKfrrX/+qBg0ayGazKTk5WUOGDNHs2bPdXHMUxq0Bd7Vq1RQYGFjgDMzhw4cLnHkpidjY2GLnaVW5Vs033zPPPKPx48fr448/VtOmTYvNGxERoSZNmpTpzL/V7ckXEBCgyy+/3F7X8rp8Tp06pQULFmjcuHFO83p7+SxcuFB33HGH3nrrLYcrvVL53H6Ka0++8rT9uNKefOVh+3GlPeVp+0lKSpJ0Lvg/dOiQxo4daw+4vbX9+JLStLGofgsKClJMTEyh37lw3fcHVvVtXFycgoODFRgYaM/TsGFDHTx4UGfOnFFISIibW+J7rF5vS7KPq2is6ttHH31UAwcOtN8x0KRJE508eVJ/+9vfNGbMGAUEVPxfM7aqb6tXr65ly5bp9OnTOnr0qOLj45Wenm4f/2Att665ISEhatWqlVauXOmQvnLlSrVr167U801JSSkwzxUrVtjnaVW5Vs1XkiZPnqwnnnhCH374oVq3bu00f3Z2trZv3664uLhSl2lle85njNGWLVvsdS2Py0eSFi1apOzsbA0YMMBpXm8un/nz52vw4MF68803C33OqbxtP87aI5Wv7ceV9pzP17cfV9tTXrafCxljlJ2dbf/sre3Hl5SmjUX1W+vWrRUcHFzody5c9/2BVX3bvn17/fDDD8rLy7Pn+f777xUXF+cXwbZk/Xpbkn1cRWNV3546dapAUB0YGChz7iXPbmyB77J6vQ0LC1OtWrWUk5OjJUuW6LrrrnNvA1A4d7+FLf9V9rNmzTLbtm0zI0aMMBEREWbPnj3GGGPS09PNwIEDHb6TmZlpMjMzTatWrUz//v1NZmam2bp1q336559/bgIDA83EiRPN9u3bzcSJE4v8WZaiyvWl9jz99NMmJCTELF682OGnJbKysux5HnzwQbNmzRqza9cus3HjRtOrVy8TGRnpk+0ZO3as+fDDD82PP/5oMjMzzZAhQ0xQUJD54osvXC7Xl9qTr0OHDubmm28utFxfWT5vvvmmCQoKMlOnTnVYl44dO2bPU562H1faU562H1faU562H1fak688bD8vv/yyWb58ufn+++/N999/b1599VUTFRVlxowZY8/jze3Hl5S0b/N/puaBBx4w27ZtM7NmzSrwMzWurPv+wIq+3bt3r6lcubIZNmyY2bFjh3n33XdNjRo1zJNPPunx9nmTFX2br7h9nD+wom8zMjJMZGSkmT9/vtm1a5dZsWKFSU5ONn379vV4+7zJir7duHGjWbJkifnxxx/Np59+ajp37mySkpLM77//7unm+SW3B9zGGDN16lSTmJhoQkJCTMuWLc3atWvt0wYNGmRSU1MdKyEV+EtMTHTI89Zbb5n69eub4OBg06BBA7NkyZISletL7UlMTCw0T0ZGhj3PzTffbOLi4kxwcLCJj483ffr0KTQo9IX2jBgxwlx88cUmJCTEVK9e3XTt2tWsX7++ROX6UnuMMWbHjh1GklmxYkWhZfrK8klNTS20PYMGDXKYZ3nZflxpT3naflxpT3naflxd38rL9vPiiy+ayy67zISHh5uoqCjTokULM23aNJObm+swT29uP76kpPvaNWvWmBYtWpiQkBBTp04dM336dIfprq77/sDdfWvMud+Qb9OmjQkNDTV169Y1Tz31lMnJybG6KT7Hir51to/zF+7u27Nnz5qxY8ea5ORkExYWZmrXrm3uvfdevwwK3d23a9asMQ0bNjShoaEmJibGDBw40Pzyyy+eaAqMMTZj/OQeDQAAAAAAPKjiv30AAAAAAAAvIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAM+bM6cObLZbNqzZ489rVOnTurUqZPX6gQAgC9Zv369xo4dq2PHjlle1vjx47Vs2TLLywFQcRBwA+XMtGnTNG3aNG9XAwAAn7B+/Xo9/vjjBNwAfBIBN2CBU6dOWTbvRo0aqVGjRpbN393+/PNPGWMKnVbWfsrNzVV2dnaZ5gEAgLf9+eefXi2/qPHYGFPmuhV3HAD4AwJuoIzGjh0rm82mzZs366abblJ0dLSSk5MlSZs2bVK/fv1Up04dVapUSXXq1NEtt9yin376qcB8Nm7cqPbt2yssLEzx8fEaPXq0zp49WyDfhbeUr1mzRjabTWvWrHHIt2fPHtlsNs
2ZM8eetmvXLvXr10/x8fEKDQ1VzZo11aVLF23ZssVpOzdt2qTevXuratWqCgsLU4sWLbRo0SKHPPm3wK9YsUK33367qlevrvDwcGVnZ6tTp05q3LixPv30U7Vr107h4eG6/fbbJUl79+7VgAEDVKNGDYWGhqphw4Z69tlnlZeXV6A9kyZN0pNPPqmkpCSFhoZq9erVTusOAKiYxo4dq7///e+SpKSkJNlstgJj4sKFC5WSkqKIiAhVrlxZ3bp1U2Zmpn36Z599puDgYI0aNcph3vlj2qxZsyRJNptNJ0+e1Ny5c+3l5I/H+ccCFyrs0bA6deqoV69eWrp0qVq0aKGwsDA9/vjjkqSDBw/qrrvuUkJCgkJCQpSUlKTHH39cOTk5LvWHs7ZK0uDBg1W5cmV9++236tq1qyIjI9WlSxd7G4cNG6YZM2aoYcOGCg0N1dy5c+391KVLF0VGRio8PFzt2rXTe++9V2h7CzsOAPxVkLcrAFQUffr0Ub9+/XT33Xfr5MmTks4FifXr11e/fv1UtWpVHThwQNOnT9fll1+ubdu2qVq1apKkbdu2qUuXLqpTp47mzJmj8PBwTZs2TW+++aZb69ijRw/l5uZq0qRJuvjii3XkyBGtX7/e6W14q1ev1jXXXKM2bdpoxowZqlKlihYsWKCbb75Zp06d0uDBgx3y33777erZs6dee+01nTx5UsHBwZKkAwcOaMCAAXrooYc0fvx4BQQE6Ndff1W7du105swZPfHEE6pTp47effddjRo1Sj/++GOB2+dffPFFXXrppXrmmWcUFRWlevXqubOLAADlyJ133qnffvtNL730kpYuXaq4uDhJst8JNn78eD3yyCMaMmSIHnnkEZ05c0aTJ09Wx44d9eWXX6pRo0bq0KGDnnzySaWnp+vKK69U7969tXXrVg0dOlQDBgzQHXfcIUnasGGDOnfurLS0ND366KOSpKioqFLVe/Pmzdq+fbseeeQRJSUlKSIiQgcPHtQVV1yhgIAAPfbYY0pOTtaGDRv05JNPas+ePZo9e3ax83SlrfnOnDmj3r1766677lJ6erpDQL9s2TKtW7dOjz32mGJjY1WjRg2tXbtWV199tZo2bapZs2YpNDRU06ZN07XXXqv58+fr5ptvdqhLUccBgF8yAMokIyPDSDKPPfaY07w5OTnmxIkTJiIiwkyZMsWefvPNN5tKlSqZgwcPOuRt0KCBkWR2795tT09NTTWpqan2z6tXrzaSzOrVqx3K2r17t5FkZs+ebYwx5siRI0aSeeGFF0rcxgYNGpgWLVqYs2fPOqT36tXLxMXFmdzcXGOMMbNnzzaSzG233VZgHqmpqUaS+eSTTxzS09PTjSTzxRdfOKTfc889xmazmR07dji0Jzk52Zw5c6bEbQAAVEyTJ08uMFYaY8zevXtNUFCQue+++xzSs7KyTGxsrOnbt689LS8vz/To0cNcdNFF5rvvvjONGjUyDRo0MCdOnHD4bkREhBk0aFCBOuQfC1wof1w8v26JiYkmMDDQPr7lu+uuu0zlypXNTz/95JD+zDPPGElm69atRfZBSdo6aNAgI8m8+uqrBeYjyVSpUsX89ttvDult27Y1NWrUMFlZWfa0nJwc07hxY5OQkGDy8vIc2lvYcQDgr7ilHHCTG2+8sUDaiRMn9I9//EOXXHKJgoKCFBQUpMqVK+vkyZPavn27Pd/q1avVpUsX1axZ054WGBhY4IxxWVStWlXJycmaPHmynnvuOWVmZjrcsl2UH374Qf/973916623SpJycnLsfz169NCBAwe0Y8cOh+8U1heSFB0drc6dOzukrVq1So0aNdIVV1zhkD548GAZY7Rq1SqH9N69e3OmHADg1EcffaScnBzddtttDmNXWFiYUlNTHW47t9lsmjdvniIjI9W6dWvt3r1bixYtUkREhCV1a9q0qS699FKHtHfffVdpaWmKj493qG/37t0lSWvXrnVLW/MVNVZ37txZ0dHR9s8nT57UF198oZtuukmVK1e2pwcGBmrgwIH6+eefXT4OAPwRt5QDbpJ/G9v5+vfvr08++USPPvqoLr/8ckVFRclms6lHjx4OLyE5evSoYmNjC3y/sLTSstls+uSTTzRu3DhNmjRJDz74oKpWrapbb71VTz31lCIjIwv93qFDhyRJo0aNKvB8W74jR444fC6sL4pKP3r0qOrUqVMgPT4+3j7dlXkDAHC+/PHr8ssvL3R6QIDjdaeYmBj17t1bU6dO1Q033KAmTZpYVrfCxrJDhw7pnXfeKfKk8oVj7YXflVxva3h4eJG3w19Yt99//13GmELrzFgNOEfADbjJhS9L+eOPP/Tuu+8qIyND6enp9vTs7Gz99ttvDnljYmJ08ODBAvMsLO1CYWFh9vmer7CBOTEx0f7yl++//16LFi3S2LFjdebMGc2YMaPQ+ec/Zz569Gj16dOn0Dz169d3+FzYi2OKSo+JidGBAwcKpO/fv9+hfGfzBgDgfPnjx+LFi5WYmOg0/8qVKzV9+nRdccUVevvtt7VkyRKXr9SePxaHhoba04sKkgsby6pVq6amTZvqqaeeKvQ7+cFtYUra1uLG0gunRUdHKyAggLEaKCUCbsAiNptNxhiHgVeS/vWvfyk3N9chLS0tTcuXL9ehQ4fst5Xn5uZq4cKFTsvJvzr8zTffqFu3bvb05cuXF/u9Sy+9VI888oiWLFmizZs3F5mvfv36qlevnv7zn/9o/PjxTutTUl26dNGECRO0efNmtWzZ0p4+b9482Ww2paWlub1MAEDFkT/OXvjzVd26dVNQUJB+/PFHp4Fz/ks9U1NTtXLlSvXp00d33HGHWrZsqaSkJIeyCvuZrPPH4vOvMr/zzjsut6NXr156//33lZyc7HBLtytK0taSioiIUJs2bbR06VI988wzqlSpkiQpLy9Pr7/+uhISEgrcHg/gfwi4AYtERUXpyiuv1OTJk1WtWjXVqVNHa9eu1axZs3TRRRc55H3kkUe0fPlyde7cWY899pjCw8M1depU+9vOixMbG6urrrpKEyZMUHR0tBITE/XJJ59o6dKlDvm++eYbDRs2TH/5y19Ur149hYSEaNWqVfrmm28crsAX5p///Ke6d++ubt26afDgwapVq5Z+++03bd++XZs3b9Zbb71V4v7J98ADD2jevHnq2bOnxo0bp8TERL333nuaNm2a7rnnHgZxAECx8m/9njJligYNGqTg4GDVr19fderU0bhx4zRmzBjt2rVL11xzjaKjo3Xo0CF9+eWXioiI0OOPP67c3FzdcsststlsevPNNxUYGKg5c+aoefPmuvnmm/XZZ58pJCTEXtaaNWv0zjvvKC4uTpGRkapfv7569OihqlWr6o477tC4ceMUFBSkOXPmaN++fS63Y9y4cVq5cqXatWun+++/X/Xr19fp06e1Z88evf/++5oxY4YSEhIK/a6rbS2tCRMm6Oqrr1ZaWppGjRqlkJAQTZs2Td99953mz5/PFW2gOF5+aRtQ7uW/mfTXX38tMO3nn382N954o4mOjjaRkZHmmmuuM
d99951JTEws8JbTzz//3LRt29aEhoaa2NhY8/e//9288sorTt9SbowxBw4cMDfddJOpWrWqqVKlihkwYIDZtGmTw1vKDx06ZAYPHmwaNGhgIiIiTOXKlU3Tpk3N888/b3Jycpy28z//+Y/p27evqVGjhgkODjaxsbGmc+fOZsaMGfY8+W8n/eqrrwp8PzU11Vx22WWFzvunn34y/fv3NzExMSY4ONjUr1/fTJ482f72c2P+95byyZMnO60rAMC/jB492sTHx5uAgIACv9yxbNkyk5aWZqKiokxoaKhJTEw0N910k/n444+NMcaMGTPGBAQEFPgVjfXr15ugoCAzfPhwe9qWLVtM+/btTXh4uJHkMB5/+eWXpl27diYiIsLUqlXLZGRkmH/961+FvqW8Z8+ehbbj119/Nffff79JSkoywcHBpmrVqqZVq1ZmzJgxBd6YXhhnbTXm3FvKIyIiCv2+JDN06NBCp61bt8507tzZREREmEqVKpm2bduad955xyFPcccBgL+yGWOMt4J9AAAAAAAqKn4WDAAAAAAACxBwAwAAAABgAQJuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYI8kaheXl52r9/vyIjI2Wz2bxRBQAAfIIxRllZWYqPj1dAgHfOgzMuAwBwjrvHZa8E3Pv371ft2rW9UTQAAD5p3759SkhI8ErZjMsAADhy17jslYA7MjJS0rlGREVFeaMKAAD4hOPHj6t27dr2sdEbGJcBADjH3eOyVwLu/NvVoqKifGtgXz2h+Olpoz1TDwCA3/Hmrdw+Oy6XB86OHSSOHwCgHHLXuMxL0wAAAAAAsIBXrnADAABUBBt2HXWaJyXNAxUBAPgk/wm4XbnlCwAAAAAAN+GWcgAAAAAALEDADQAAAACABQi4AQAAAACwgP88ww0AAPxPWX/yk3fAAADKgCvcAAAAAABYgIAbAAAAAAALcEs5AAAon7jdGwDg47jCDQAAAACABQi4AQAAAACwAAE3AAAAAAAW4BluAADgv3zhOfCy/nQZAMBncYUbAAAAAAALEHADAAAAAGABbikHAADwZdxyDgDlFle4AQAAAACwAFe4AQCA39qw66j1hfBiNgDwWwTcJcFgBQAAfI0vBPQAgEJxSzkAAAAAABbgCjcAAIAXObutPaVujIdqAgBwNwJud+KWcwAAAADA/yPgBgAA3sGJagBABUfADQAA4MO45RwAyi9emgYAAAAAgAW4wg0AAHwTP3flX3jEAEAFxBVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALBAxXmGm+e8AADABZy94dtfOH3TeZqHKgIAfqbiBNwAAMCvEEyf45F+4IVmAFAq3FIOAAAAAIAFuMINAACAsnHl0T6uggPwQ1zhBgAAAADAAlzh9qDnV37vNM8DV1/qgZoAAACch5fPAoAlCLg9qO3eV1zI9Yzl9QAAwCMI4soNp28xrxvjoZoAQMXCLeUAAAAAAFiAK9wAAMAnVZSf/aoo7QAAlBxXuAEAAAAAsABXuEuA55sAAABKx9nLYx/gqBRABcSurYJxOpjxFnQAAFAeOXsJH7/zDcAHEXD7GAJmAAAAAKgYCLjLGVd+y7us3yeoBwAAfomr6ADcjID7PGV9i6gn3kLq7Le8N178N8vrwFV4AAAAAHCOgBsFlPUqOgAAqFjKw4tjuYsPgC8i4AYAAECZuHSX38Vlm4dbgnpnt4z7Am5rByoUvwm4PXG7tzs4u2W8InDHGWhuawcAAADg6/wm4Ibn+MIt6Z4IyAn6AaBsysvJcLiHL1xUsPoq+oZZoyydv1u4cpWfq+iA2xBwwydZHbTznBcAAAAAqxFwVzC+8BZzuIagH0BFxxVseJI7rqA7vQIuH3gG3MkVandsdylpxU8vD3f5Oa1j0BLnMynrlX6ex3efctyXXgm4jTGSpOPHj7tvpidPFz/5z2z3lVWONdnxkrer4JKvEoZ4uwqasGxzmb7vbP0+ffJEmeswtPMlJarThaau+sFpnrKWAZzP2TrnC+ubp+uYv6/IHxu9wZJxWYy9cK/jPnCsV9Y6OPu+O7ijH8p6DOPufUlpOK1jkAvLoqztcLa8faCfyg0P9qW7x2Wb8cII//PPP6t27dqeLhYAAJ+1b98+JSQkeKVsxmUAABy5a1z2SsCdl5en/fv3KzIyUjabTcePH1ft2rW1b98+RUVFebo65Qb95Dr6yjX0k+voK9fRV67J76e9e/fKZrMpPj5eAQEBXqnLheMyXMf67jn0tefQ155DX3uOq31tjFFWVpbbxmWv3FIeEBBQ6NmCqKgoVjQX0E+uo69cQz+5jr5yHX3lmipVqni9n4oal+E61nfPoa89h772HPrac1zp6ypVqritPO+cSgcAAAAAoIIj4AYAAAAAwAI+EXCHhoYqIyNDoaGh3q6KT6OfXEdfuYZ+ch195Tr6yjX0U8XAcvQc+tpz6GvPoa89x1t97ZWXpgEAAAAAUNH5xBVuAAAAAAAqGgJuAAAAAAAsQMANAAAAAIAFCLgBAAAAALCARwLuadOmKSkpSWFhYWrVqpXWrVtXbP61a9eqVatWCgsLU926dTVjxgxPVNMnlKSv1qxZI5vNVuDvv//9rwdr7Hmffvqprr32WsXHx8tms2nZsmVOv+Ov61RJ+8pf16kJEybo8ssvV2RkpGrUqKHrr79eO3bscPo9f1yvStNX/rheTZ8+XU2bNlVUVJSioqKUkpKiDz74oNjv+OP6VJ6VZixC6ZR2H42SK82+C2U3YcIE2Ww2jRgxwttVqZDGjh1b4BgkNjbWY+VbHnAvXLhQI0aM0JgxY5SZmamOHTuqe/fu2rt3b6H5d+/erR49eqhjx47KzMzUww8/rPvvv19LliyxuqpeV9K+yrdjxw4dOHDA/levXj0P1dg7Tp48qWbNmunll192Kb8/r1Ml7at8/rZOrV27VkOHDtXGjRu1cuVK5eTkqGvXrjp58mSR3/HX9ao0fZXPn9arhIQETZw4UZs2bdKmTZvUuXNnXXfdddq6dWuh+f11fSrPSrt/RcmVZb+Dkinpvgtl99VXX+mVV15R06ZNvV2VCu2yyy5zOAb59ttvPVe4sdgVV1xh7r77boe0Bg0amPT09ELzP/TQQ6ZBgwYOaXfddZdp27atZXX0FSXtq9WrVxtJ5vfff/dA7XyTJPP2228Xm8ef16nzudJXrFPnHD582Egya9euLTIP69U5rvQV69U50dHR5l//+leh01ifyjdX9q9wH1f2O3Cf4vZdKJusrCxTr149s3LlSpOammqGDx/u7SpVSBkZGaZZs2ZeK9/SK9xnzpzR119/ra5duzqkd+3aVevXry/0Oxs2bCiQv1u3btq0aZPOnj1rWV29rTR9la9FixaKi4tTly5dtHr1aiurWS756zpVFv6+Tv3xxx+SpKpVqxaZh/XqHFf6Kp+/rle5ublasGCBTp48qZSUlELzsD4BrivJfgel58q+C2UzdOhQ9ezZU1dddZW3q1Lh7dy5U/Hx8UpKSlK/fv20a9cuj5VtacB9
5MgR5ebmqmbNmg7pNWvW1MGDBwv9zsGDBwvNn5OToyNHjlhWV28rTV/FxcXplVde0ZIlS7R06VLVr19fXbp00aeffuqJKpcb/rpOlQbrlGSM0ciRI9WhQwc1bty4yHysV673lb+uV99++60qV66s0NBQ3X333Xr77bfVqFGjQvOyPgGucXW/g9Iryb4LpbdgwQJt3rxZEyZM8HZVKrw2bdpo3rx5+uijjzRz5kwdPHhQ7dq109GjRz1SfpAnCrHZbA6fjTEF0pzlLyy9IipJX9WvX1/169e3f05JSdG+ffv0zDPP6Morr7S0nuWNP69TJcE6JQ0bNkzffPONPvvsM6d5/X29crWv/HW9ql+/vrZs2aJjx45pyZIlGjRokNauXVvkgau/r0+AK0qyj0bplHTfhZLbt2+fhg8frhUrVigsLMzb1anwunfvbv+/SZMmSklJUXJysubOnauRI0daXr6lV7irVaumwMDAAldoDx8+XOBMfr7Y2NhC8wcFBSkmJsayunpbafqqMG3bttXOnTvdXb1yzV/XKXfxp3Xqvvvu0/Lly7V69WolJCQUm9ff16uS9FVh/GG9CgkJ0SWXXKLWrVtrwoQJatasmaZMmVJoXn9fnwBXlHW/A9eUZN+F0vn66691+PBhtWrVSkFBQQoKCtLatWv14osvKigoSLm5ud6uYoUWERGhJk2aeOw4xNKAOyQkRK1atdLKlSsd0leuXKl27doV+p2UlJQC+VesWKHWrVsrODjYsrp6W2n6qjCZmZmKi4tzd/XKNX9dp9zFH9YpY4yGDRumpUuXatWqVUpKSnL6HX9dr0rTV4Xxh/XqQsYYZWdnFzrNX9cnwBXu2u+gdIrbd6F0unTpom+//VZbtmyx/7Vu3Vq33nqrtmzZosDAQG9XsULLzs7W9u3bPXccYvVb2RYsWGCCg4PNrFmzzLZt28yIESNMRESE2bNnjzHGmPT0dDNw4EB7/l27dpnw8HDzwAMPmG3btplZs2aZ4OBgs3jxYqur6nUl7avnn3/evP322+b777833333nUlPTzeSzJIlS7zVBI/IysoymZmZJjMz00gyzz33nMnMzDQ//fSTMYZ16nwl7St/XafuueceU6VKFbNmzRpz4MAB+9+pU6fseVivzilNX/njejV69Gjz6aefmt27d5tvvvnGPPzwwyYgIMCsWLHCGMP6VBE427/CfVzZ78A9nO27YB3eUm6dBx980KxZs8bs2rXLbNy40fTq1ctERkbaYyyrWR5wG2PM1KlTTWJiogkJCTEtW7Z0+BmHQYMGmdTUVIf8a9asMS1atDAhISGmTp06Zvr06Z6opk8oSV89/fTTJjk52YSFhZno6GjToUMH895773mh1p6V/xNDF/4NGjTIGMM6db6S9pW/rlOF9ZEkM3v2bHse1qtzStNX/rhe3X777fZ9efXq1U2XLl0cDlhZn8o/Z/tXuI8r+x24h7N9F6xDwG2dm2++2cTFxZng4GATHx9v+vTpY7Zu3eqx8m3G/P9bWQAAAAAAgNtY+gw3AAAAAAD+ioAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEPGzx4sOrUqeOVstevX6+xY8fq2LFjXikfAAAA8Cc2Y4zxdiUAf/Ljjz/q+PHjatGihcfLfuaZZ/T3v/9du3fv9lrQDwAAAPgLrnADHnLq1ClJUnJysleCbSvlt83dzp49q5ycHEvKNMbozz//LNM8AAD+o7gxydcUNUbm5uYqOzvbknkDKBwBN/ze2LFjZbPZlJmZqT59+igqKkpVqlTRgAED9OuvvxbIv3DhQqWkpCgiIkKVK1dWt27dlJmZ6ZBn8ODBqly5sr799lt17dpVkZGR6tKli33ahVeXbTabhg0bptmzZ6t+/fqqVKmSWrdurY0bN8oYo8mTJyspKUmVK1dW586d9cMPPxSo18cff6wuXbooKipK4eHhat++vT755BOHdv7973+XJCUlJclms8lms2nNmjVua1tRdu7cqf79+6tGjRoKDQ1Vw4YNNXXqVIc8a9askc1m02uvvaYHH3xQtWrVUmhoqH744Ydiy/ztt9907733qlatWgoJCVHdunU1ZsyYAgcU+X08Y8YMNWzYUKGhoZo7d26x9QYAVCw//PCDhgwZonr16ik8PFy1atXStddeq2+//dYhX3FjkuR8zC1JWUUxxmjatGlq3ry5KlWqpOjoaN10003atWuXQ75OnTqpcePG+vTTT9WuXTuFh4fr9ttv1549e2Sz2TRp0iQ9+eSTSkpKUmhoqFavXi1JWr58uVJSUhQeHq7IyEhdffXV2rBhg8O884+RNm/erJtuuknR0dFKTk4uUZ8D/o6AG/h/N9xwgy655BItXrxYY8eO1bJly9StWzedPXvWnmf8+PG65ZZb1KhRIy1atEivvfaasrKy1LFjR23bts1hfmfOnFHv3r3VuXNn/fvf/9bjjz9ebPnvvvuu/vWvf2nixImaP3++srKy1LNnTz344IP6/PPP9fLLL+uVV17Rtm3bdOONN+r8p0Fef/11de3aVVFRUZo7d64WLVqkqlWrqlu3bvYDgDvvvFP33XefJGnp0qXasGGDNmzYoJYtW1ratm3btunyyy/Xd999p2effVbvvvuuevbsqfvvv7/Q740ePVp79+7VjBkz9M4776hGjRpFlnn69GmlpaVp3rx5GjlypN577z0NGDBAkyZNUp8+fQrMe9myZZo+fboee+wxffTRR+rYsWOxywQAULHs379fMTExmjhxoj788ENNnTpVQUFBatOmjXbs2FEgf2FjkitjbmnKutBdd92lESNG6KqrrtKyZcs0bdo0bd26Ve3atdOhQ4cc8h44cEADBgxQ//799f777+vee++1T3vxxRe1atUqPfPMM/rggw/UoEEDvfnmm7ruuusUFRWl+fPna9asWfr999/VqVMnffbZZwXq0qdPH11yySV66623NGPGjJJ0OQAD+LmMjAwjyTzwwAMO6W+88YaRZF5//XVjjDF79+41QUFB5r777nPIl5WVZWJjY03fvn3taYMGDTKSzKuvvlqgvEGDBpnExESHNEkmNjbWnDhxwp62bNkyI8k0b97c5OXl2dNfeOEFI8l88803xhhjTp48aapWrWquvfZah3nm5uaaZs2amSuuuMKeNnn
yZCPJ7N692yGvu9pWmG7dupmEhATzxx9/OKQPGzbMhIWFmd9++80YY8zq1auNJHPllVcWmEdRZc6YMcNIMosWLXJIf/rpp40ks2LFCnuaJFOlShV7eQAA5OTkmDNnzph69eo5HAcUNSaVZMx1tazCbNiwwUgyzz77rEP6vn37TKVKlcxDDz1kT0tNTTWSzCeffOKQd/fu3UaSSU5ONmfOnHGoa3x8vGnSpInJzc21p2dlZZkaNWqYdu3a2dPyj5Eee+yxYusLoGhc4Qb+36233urwuW/fvgoKCrLfevXRRx8pJydHt912m3Jycux/YWFhSk1Ndbg1O9+NN97ocvlpaWmKiIiwf27YsKEkqXv37rLZbAXSf/rpJ0nn3jz+22+/adCgQQ71ysvL0zXXXKOvvvpKJ0+eLLZsq9p2+vRpffLJJ7rhhhsUHh7uMO8ePXro9OnT2rhxo8vzvXDaqlWrFBERoZtuuskhffDgwZJU4Pa+zp07Kzo62mm9AQAVU05OjsaPH69GjRopJCREQUFBCgkJ0c6dO7V9+/YC+S8cd0oy5pa0rPO9++67stlsGjBggEM5sbGxatasWYFxOTo6Wp07dy50Xr1791ZwcLD9844dO7R//34NHDhQAQH/CwUqV66sG2+8URs3bizwnHZJjmcAOArydgUAXxEbG+vwOSgoSDExMTp69Kgk2W/fuvzyywv9/vmDliSFh4crKirK5fKrVq3q8DkkJKTY9NOnTzvU68Kg83y//fabQzB/IavadvToUeXk5Oill17SSy+9VGieI0eOOHyOi4srNF9hZR49elSxsbEOJyQkqUaNGgoKCrIvO2fzBgD4h5EjR2rq1Kn6xz/+odTUVEVHRysgIEB33nlnoS/SvHDcKMmYW9KyLizHGKOaNWsWOr1u3brF1rO4afljY2HfiY+PV15enn7//XeFh4e7NH8AxSPgBv7fwYMHVatWLfvnnJwcHT16VDExMZKkatWqSZIWL16sxMREp/O7MAi0Sn69XnrpJbVt27bQPEUN2BfOw91ti46OVmBgoAYOHKihQ4cWmicpKcmleReWHhMToy+++ELGGIfphw8fVk5Ojr1dJa03AKBiev3113Xbbbdp/PjxDulHjhzRRRddVCD/heNGScbckpZ1YTk2m03r1q1TaGhogekXphU3vl04Lf+45sCBAwXy7t+/XwEBAQXuBmP8BEqPgBv4f2+88YZatWpl/7xo0SLl5OSoU6dOkqRu3bopKChIP/74o0/dWtW+fXtddNFF2rZtm4YNG1Zs3vwB+sIz61a1LTw8XGlpacrMzFTTpk3tV+fdpUuXLlq0aJGWLVumG264wZ4+b948+3QAAPLZbLYCwep7772nX375RZdcconT75dkzC1LWb169dLEiRP1yy+/qG/fvk7rVRL169dXrVq19Oabb2rUqFH2YPrkyZNasmSJ/c3lANyDgBv4f0uXLlVQUJCuvvpqbd26VY8++qiaNWtmH+jq1KmjcePGacyYMdq1a5euueYaRUdH69ChQ/ryyy8VERHh9E3kVqhcubJeeuklDRo0SL/99ptuuukm1ahRQ7/++qv+85//6Ndff9X06dMlSU2aNJEkTZkyRYMGDVJwcLDq169vadumTJmiDh06qGPHjrrnnntUp04dZWVl6YcfftA777yjVatWlbrtt912m6ZOnapBgwZpz549atKkiT777DONHz9ePXr00FVXXVXqeQMAKp5evXppzpw5atCggZo2baqvv/5akydPVkJCgkvfL8mYW5ay2rdvr7/97W8aMmSINm3apCuvvFIRERE6cOCAPvvsMzVp0kT33HNPqfogICBAkyZN0q233qpevXrprrvuUnZ2tiZPnqxjx45p4sSJpZovgMIRcAP/b+nSpRo7dqymT58um82ma6+9Vi+88ILDVdnRo0erUaNGmjJliubPn6/s7GzFxsbq8ssv19133+21ug8YMEAXX3yxJk2apLvuuktZWVmqUaOGmjdvbn+BmHTutzpHjx6tuXPnaubMmcrLy9Pq1avt6Va0rVGjRtq8ebOeeOIJPfLIIzp8+LAuuugi1atXTz169ChTu8PCwrR69WqNGTNGkydP1q+//qpatWpp1KhRysjIKNO8AQAVz5QpUxQcHKwJEyboxIkTatmypZYuXapHHnnE5Xm4OuaWtax//vOfatu2rf75z39q2rRpysvLU3x8vNq3b68rrriipE130L9/f0VERGjChAm6+eabFRgYqLZt22r16tVq165dmeYNwJHNmPN+zBfwQ2PHjtXjjz+uX3/9tcAzvwAAAABQWvwsGAAAAAAAFiDgBgAAAADAAtxSDgAAAACABbjCDQAAAACABQi4AQAAAACwgFd+FiwvL0/79+9XZGSkbDabN6oAAIBPMMYoKytL8fHxCgjwznlwxmUAAM5x97jslYB7//79ql27tjeKBgDAJ+3bt08JCQleKZtxGQAAR+4al70ScEdGRko614ioqChvVAEAAJ9w/Phx1a5d2z42egPjMgAA57h7XPZKwJ1/u1pUVBQDOwAAkldv5WZcBgDAkbvGZa8E3F6xeoLzPGmjra8HAAA4x9nYzLgMACjneEs5AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACQd6uAAAAqKBWT/B2DQAA8CoC7vM5OzBIG+2ZegAAAAAAyj1uKQcAAAAAwAIE3AAAAAAAWICAGwAAAAAACxBwAwAAAABgAV6aVhK8VA0AAAAA4CKucAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALBDk7QpUKKsnFD89bbRn6gEAAAAA8DqucAMAAAAAYAECbgAAAAAALFBxbil3djs3AAAAAAAexBVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYg4AYAAAAAwAIV5y3l5cDzK793mueBqy/1QE0AAAAAAFYj4AYAAL7J2U9+po32TD0AACglbikHAAAAAMACXOEugQ27jhY7PaVujIdqAgAAAADwdQTcbuQsINfFnqkHAAAAAMD7CLg9qO3eV5zmeX7l38pUBi9dAwAAAADfQMANAAC8oqyPavHrHwAAX0fADQAAfBKPagEAyjsC7grGlbP9znA1AAAAAADKjoD7PE7PpAMAgHKlrCeiOQkNACgLAm4f4+zFahsvLttL1VzBwQkAAO7jbFxl3ASAistvAu6KcvXaFwJyX8DBCwDAlV//KOu4yHgDACgLvwm44V84QAIAeII73p3CmAUAFRcBN9yuory4jQMgAIAvYDwCgPLLKwG3MUaSdPz4cffN9OTp4if/me2+snxYkx0vWV7GVwlDLC9jwrLNls7flXXv9MkTxU63uo6SNLTzJZaXAcC78vdH+WOjN1gyLkv6cut+t8
6vNJyNi54Y06zmifHIGcYrABWFu8dlm/HCCP/zzz+rdu3ani4WAACftW/fPiUkJHilbMZlAAAcuWtc9krAnZeXp/379ysyMlI2m03SuTMJtWvX1r59+xQVFeXpKuH/sRy8j2XgfSwD7/OnZWCMUVZWluLj4xUQEOCVOhQ2LrvKn5aVu9BnJUeflRx9VnL0WclVxD5z97jslVvKAwICijxbEBUVVWEWVnnGcvA+loH3sQy8z1+WQZUqVbxafnHjsqv8ZVm5E31WcvRZydFnJUeflVxF6zN3jsveOZUOAAAAAEAFR8ANAAAAAIAFfCbgDg0NVUZGhkJDQ71dFb/GcvA+loH3sQy8j2VQfrCsSo4+Kzn6rOTos5Kjz0qOPnPOKy9NAwAAAACgovOZK9wAAAAAAFQkBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALOAzAfe0adOUlJSksLAwtWrVSuvWrfN2lfzG2LFjZbPZHP5iY2O9Xa0K7dNPP9W1116r+Ph42Ww2LVu2zGG6MUZjx45VfHy8KlWqpE6dOmnr1q3eqWwF5WwZDB48uMB20bZtW+9UtoKaMGGCLr/8ckVGRqpGjRq6/vrrtWPHDoc8bAu+j/H7f9yxb8/OztZ9992natWqKSIiQr1799bPP//swVZ4jrv2Af7UZ9OnT1fTpk0VFRWlqKgopaSk6IMPPrBPp7+cmzBhgmw2m0aMGGFPo98cOYsN6K+S8YmAe+HChRoxYoTGjBmjzMxMdezYUd27d9fevXu9XTW/cdlll+nAgQP2v2+//dbbVarQTp48qWbNmunll18udPqkSZP03HPP6eWXX9ZXX32l2NhYXX311crKyvJwTSsuZ8tAkq655hqH7eL999/3YA0rvrVr12ro0KHauHGjVq5cqZycHHXt2lUnT56052Fb8G2M347csW8fMWKE3n77bS1YsECfffaZTpw4oV69eik3N9dTzfAYd+0D/KnPEhISNHHiRG3atEmbNm1S586ddd1119mDHfqreF999ZVeeeUVNW3a1CGdfiuouNiA/ioh4wOuuOIKc/fddzukNWjQwKSnp3upRv4lIyPDNGvWzNvV8FuSzNtvv23/nJeXZ2JjY83EiRPtaadPnzZVqlQxM2bM8EINK74Ll4ExxgwaNMhcd911XqmPvzp8+LCRZNauXWuMYVsoDxi/i1aaffuxY8dMcHCwWbBggT3PL7/8YgICAsyHH37osbp7S2n2Af7eZ8YYEx0dbf71r3/RX05kZWWZevXqmZUrV5rU1FQzfPhwYwzrWWGKiw3or5Lz+hXuM2fO6Ouvv1bXrl0d0rt27ar169d7qVb+Z+fOnYqPj1dSUpL69eunXbt2ebtKfmv37t06ePCgwzYRGhqq1NRUtgkPW7NmjWrUqKFLL71Uf/3rX3X48GFvV6lC++OPPyRJVatWlcS24OsYv0vGlfX566+/1tmzZx3yxMfHq3Hjxn7Rp6XZB/hzn+Xm5mrBggU6efKkUlJS6C8nhg4dqp49e+qqq65ySKffCldUbEB/lZzXA+4jR44oNzdXNWvWdEivWbOmDh486KVa+Zc2bdpo3rx5+uijjzRz5kwdPHhQ7dq109GjR71dNb+Uv96zTXhX9+7d9cYbb2jVqlV69tln9dVXX6lz587Kzs72dtUqJGOMRo4cqQ4dOqhx48aS2BZ8HeN3ybiyPh88eFAhISGKjo4uMk9FVdp9gD/22bfffqvKlSsrNDRUd999t95++201atSI/irGggULtHnzZk2YMKHANPqtoOJiA/qr5IK8XYF8NpvN4bMxpkAarNG9e3f7/02aNFFKSoqSk5M1d+5cjRw50os1829sE95188032/9v3LixWrdurcTERL333nvq06ePF2tWMQ0bNkzffPONPvvsswLT2BZ8G8unZErTX/7Qp+7eB1TkPqtfv762bNmiY8eOacmSJRo0aJDWrl1rn05/Odq3b5+GDx+uFStWKCwsrMh89Nv/FBcb5L9Alv5yndevcFerVk2BgYEFznYcPny4wJkTeEZERISaNGminTt3ersqfin/LZBsE74lLi5OiYmJbBcWuO+++7R8+XKtXr1aCQkJ9nS2Bd/G+F0yrqzPsbGxOnPmjH7//fci81REZdkH+GOfhYSE6JJLLlHr1q01YcIENWvWTFOmTKG/ivD111/r8OHDatWqlYKCghQUFKS1a9fqxRdfVFBQkL3d9FvRzo8NWM9KzusBd0hIiFq1aqWVK1c6pK9cuVLt2rXzUq38W3Z2trZv3664uDhvV8UvJSUlKTY21mGbOHPmjNauXcs24UVHjx7Vvn372C7cyBijYcOGaenSpVq1apWSkpIcprMt+DbG75JxZX1u1aqVgoODHfIcOHBA3333XYXsU3fsA/ytzwpjjFF2djb9VYQuXbro22+/1ZYtW+x/rVu31q233qotW7aobt269JsT58cGrGel4OGXtBVqwYIFJjg42MyaNcts27bNjBgxwkRERJg9e/Z4u2p+4cEHHzRr1qwxu3btMhs3bjS9evUykZGR9L+FsrKyTGZmpsnMzDSSzHPPPWcyMzPNTz/9ZIwxZuLEiaZKlSpm6dKl5ttvvzW33HKLiYuLM8ePH/dyzSuO4pZBVlaWefDBB8369evN7t27zerVq01KSoqpVasWy8CN7rnnHlOlShWzZs0ac+DAAfvfqVOn7HnYFnwb47cjd+zb7777bpOQkGA+/vhjs3nzZtO5c2fTrFkzk5OT461mWcZd+wB/6rPRo0ebTz/91Ozevdt888035uGHHzYBAQFmxYoVxhj6y1Xnv6XcGPrtQs5iA/qrZHwi4DbGmKlTp5rExEQTEhJiWrZsaf9JCFjv5ptvNnFxcSY4ONjEx8ebPn36mK1bt3q7WhXa6tWrjaQCf4MGDTLGnPvJhYyMDBMbG2tCQ0PNlVdeab799lvvVrqCKW4ZnDp1ynTt2tVUr17dBAcHm4svvtgMGjTI7N2719vVrlAK639JZvbs2fY8bAu+j/H7f9yxb//zzz/NsGHDTNWqVU2lSpVMr169Kuy+x137AH/qs9tvv92+vVWvXt106dLFHmwbQ3+56sKAm35z5Cw2oL9KxmaMMZ64kg4AAAAAgD/x+jPcAAAAAABURATcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADA
AAAAGABAm7Ax+zfv19jx47Vli1bCkwbPHiwKleu7PlKAQCAUhs/fryWLVvm7WoA8AICbsDH7N+/X48//nihATcAACh/CLgB/0XADaBc+/PPP2WMKXTaqVOnyjTv3NxcZWdnl2keAACg6DHZGKM///yzTPMu7lgA8DYCblRov/76q/72t7+pdu3aCg0NVfXq1dW+fXt9/PHH9jydOnVS48aNtWHDBrVr106VKlVSnTp1NHv2bEnSe++9p5YtWyo8PFxNmjTRhx9+WKCczz77TF26dFFkZKTCw8PVrl07vffeewXyfffdd7ruuusUHR2tsLAwNW/eXHPnzrVPX7NmjS6//HJJ0pAhQ2Sz2WSz2TR27FiH+fzwww/q0aOHKleurNq1a+vBBx90CAz37Nkjm82mZ555Rs8995ySkpJUuXJlpaSkaOPGjQXqtWnTJvXu3VtVq1ZVWFiYWrRooUWLFjnkOXXqlEaNGqWkpCSFhYWpatWqat26tebPn2/Ps2vXLvXr10/x8fEKDQ1VzZo11aVLF5eu1rtShzlz5shms2nFihW6/fbbVb16dYWHhys7O9u+HD/99FO1a9dO4eHhuv322yVJe/fu1YABA1SjRg2FhoaqYcOGevbZZ5WXl1egzyZNmqQnn3xSSUlJCg0N1erVq53WHQBQtP/+97+65ZZbVLNmTYWGhuriiy/Wbbfd5jBuORsfpXNjpM1m05tvvql//OMfiouLU+XKlXXttdfq0KFDysrK0t/+9jdVq1ZN1apV05AhQ3TixAmHedhsNg0bNkz//Oc/demllyo0NFSNGjXSggULHPL9+uuvuvfee9WoUSNVrlxZNWrUUOfOnbVu3boC7cvOzta4cePUsGFDhYWFKSYmRmlpaVq/fr29zJMnT2ru3Ln2cb1Tp06S/jeurV69Wvfcc4+qVaummJgY9enTR/v37y9Q1sKFC5WSkqKIiAhVrlxZ3bp1U2ZmpkMeV8biVatWqVOnToqJiVGlSpV08cUX68Ybb3TpRLUrdch/BO7bb79V165dFRkZqS5dujgsgxkzZqhhw4YKDQ21L2tXjqeKOxYAfFGQtysAWGngwIHavHmznnrqKV166aU6duyYNm/erKNHjzrkO3jwoIYMGaKHHnpICQkJeumll3T77bdr3759Wrx4sR5++GFVqVJF48aN0/XXX69du3YpPj5ekrR27VpdffXVatq0qWbNmqXQ0FBNmzZN1157rebPn6+bb75ZkrRjxw61a9dONWrU0IsvvqiYmBi9/vrrGjx4sA4dOqSHHnpILVu21OzZszVkyBA98sgj6tmzpyQpISHBXtezZ8+qd+/euuOOO/Tggw/q008/1RNPPKEqVarosccec2jX1KlT1aBBA73wwguSpEcffVQ9evTQ7t27VaVKFUnS6tWrdc0116hNmzaaMWOGqlSpogULFujmm2/WqVOnNHjwYEnSyJEj9dprr+nJJ59UixYtdPLkSX333XcOfdmjRw/l5uZq0qRJuvjii3XkyBGtX79ex44dK3Y5uVqHfLfffrt69uyp1157TSdPnlRwcLAk6cCBAxowYIAeeughjR8/XgEBAfr111/Vrl07nTlzRk888YTq1Kmjd999V6NGjdKPP/6oadOmOcz7xRdf1KWXXqpnnnlGUVFRqlevXrF1BwAU7T//+Y86dOigatWqady4capXr54OHDig5cuX68yZMwoNDXVpfDzfww8/rLS0NM2ZM0d79uzRqFGjdMsttygoKEjNmjXT/PnzlZmZqYcffliRkZF68cUXHb6/fPlyrV69WuPGjVNERISmTZtm//5NN90kSfrtt98kSRkZGYqNjdWJEyf09ttvq1OnTvrkk0/sAXNOTo66d++udevWacSIEercubNycnK0ceNG7d27V+3atdOGDRvUuXNnpaWl6dFHH5UkRUVFOdTpzjvvVM+ePfXmm29q3759+vvf/64BAwZo1apV9jzjx4/XI488Yj9GOHPmjCZPnqyOHTvqyy+/VKNGjSQ5H4v37Nmjnj17qmPHjnr11Vd10UUX6ZdfftGHH36oM2fOKDw8vMjl6WodJOnMmTPq3bu37rrrLqWnpysnJ8c+bdmyZVq3bp0ee+wxxcbGqkaNGi4fT+Ur6lgA8DkGqMAqV65sRowYUWye1NRUI8ls2rTJnnb06FETGBhoKlWqZH755Rd7+pYtW4wk8+KLL9rT2rZta2rUqGGysrLsaTk5OaZx48YmISHB5OXlGWOM6devnwkNDTV79+51KL979+4mPDzcHDt2zBhjzFdffWUkmdmzZxeo66BBg4wks2jRIof0Hj16mPr169s/796920gyTZo0MTk5Ofb0L7/80kgy8+fPt6c1aNDAtGjRwpw9e9Zhnr169TJxcXEmNzfXGGNM48aNzfXXX19ELxpz5MgRI8m88MILReYpiqt1mD17tpFkbrvttgLzyF+On3zyiUN6enq6kWS++OILh/R77rnH2Gw2s2PHDmPM//osOTnZnDlzpsRtAAAU1LlzZ3PRRReZw4cPF5nH1fFx9erVRpK59tprHfKNGDHCSDL333+/Q/r1119vqlat6pAmyVSqVMkcPHjQnpaTk2MaNGhgLrnkkiLrmJOTY86ePWu6dOlibrjhBnv6vHnzjCQzc+bMIr9rjDERERFm0KBBBdLzx7V7773XIX3SpElGkjlw4IAxxpi9e/eaoKAgc9999znky8rKMrGxsaZv377GGNfG4sWLFxtJZsuWLcXW+UKu1sGY/x2vvPrqqwXmI8lUqVLF/Pbbbw7prh5PFXcsAPgibilHhXbFFVdozpw5evLJJ7Vx40adPXu20HxxcXFq1aqV/XPVqlVVo0YNNW/e3H4lW5IaNmwoSfrpp58kSSdPntQXX3yhm266yeHt4YGBgRo4cKB+/vln7dixQ9K527e6dOmi2rVrO5Q9ePBgnTp1Shs2bHCpTTabTddee61DWtOmTe11Ol/Pnj0VGBjokO/8+v/www/673//q1tvvVXSuTP1+X89evTQgQMH7PW/4oor9MEHHyg9PV1r1qwp8LxV1apVlZycrMmTJ+u5555TZmamwy3bRSlJHfLdeOONhc4rOjpanTt3dkhbtWqVGjVqpCuuuMIhffDgwTLGOFw9kKTevXtzlhwA3ODUqVNau3at+vbtq+rVqxeZr6TjY69evRw+54/N+XeFnZ/+22+/FbitvEuXLqpZs6b9c2BgoG6++Wb98MMP+vnnn+3pM2bMUMuWLRUWFqagoCAFBwfrk08+0fbt2+15PvjgA4WFhdkfYSqt3r17O3y+cLz+6KOPlJOTo9tuu81hnAwLC1NqaqrWrFkjybWxuHnz5goJCdHf/vY3zZ07V7t27XKpjq7W4XxFjdedO3dWdHS0/XNJjqeczRvwNQTcqNAWLlyoQYMG6V//+pdSUlJUtWpV3XbbbTp48KBDvqpVqxb4bkhISIH0kJAQSdLp06clSb///ruMMYqLiyvw/fxAPf+W66NHj7qUz5nw8HCFhYU5pIWGhtrrdL6YmJgC+STZg+VDhw5J
kkaNGqXg4GCHv3vvvVeSdOTIEUnnbrX+xz/+oWXLliktLU1Vq1bV9ddfr507d0o6dyLgk08+Ubdu3TRp0iS1bNlS1atX1/3336+srKwi21OSOuQrrB+LSi9pvxc1bwBAyfz+++/Kzc11eCyqMCXdTxc1Njsbs/PFxsYWKCs/Lb+s5557Tvfcc4/atGmjJUuWaOPGjfrqq690zTXXOJxw/vXXXxUfH6+AgLIdUrs6Xl9++eUFxsqFCxfax0lXxuLk5GR9/PHHqlGjhoYOHark5GQlJydrypQpxdbR1TrkCw8PL3DrfL4Ll3dJjqeKmgfgq3iGGxVatWrV9MILL+iFF17Q3r17tXz5cqWnp+vw4cOFvvyspKKjoxUQEKADBw4UmJb/spNq1apJOjeYupLPk/LLHD16tPr06VNonvr160uSIiIi9Pjjj+vxxx/XoUOH7Fe7r732Wv33v/+VJCUmJmrWrFmSpO+//16LFi3S2LFjdebMGc2YMaPMdchns9kKzVdYekn7vah5AwBKpmrVqgoMDHS4alwYT4+PF550Pz8tP/B9/fXX1alTJ02fPt0h34UnkKtXr67PPvtMeXl5ZQ66i5PfB4sXL1ZiYmKxeV0Zizt27KiOHTsqNzdXmzZt0ksvvaQRI0aoZs2a6tevX5nrIBU/nl44rSTHU67MH/AlXOGG37j44os1bNgwXX311dq8ebNb5hkREaE2bdpo6dKlDme88/Ly9PrrryshIUGXXnqppHO3sK1atarAW0fnzZun8PBwtW3bVlLBs9pWql+/vurVq6f//Oc/at26daF/kZGRBb5Xs2ZNDR48WLfccot27NhR6FtNL730Uj3yyCNq0qRJsf1d2jq4qkuXLtq2bVuBOsybN082m01paWmlnjcAoGiVKlVSamqq3nrrrQJXP8/n6vjoLp988on9aq107icgFy5cqOTkZPvVeJvNZh+P833zzTcFbm/v3r27Tp8+rTlz5hRbZmhoaJnG9W7duikoKEg//vhjkWNlYZyNxYGBgWrTpo2mTp0qScWO16WtgytKcjwFlDdc4UaF9ccffygtLU39+/dXgwYNFBkZqa+++koffvhhkVdSS2PChAm6+uqrlZaWplGjRikkJETTpk3Td999p/nz59vPwGZkZOjdd99VWlqaHnvsMVWtWlVvvPGG3nvvPU2aNMn+1vDk5GRVqlRJb7zxhho2bKjKlSsrPj7e4Vlyd/rnP/+p7t27q1u3bho8eLBq1aql3377Tdu3b9fmzZv11ltvSZLatGmjXr16qWnTpoqOjtb27dv12muvKSUlReHh4frmm280bNgw/eUvf1G9evUUEhKiVatW6ZtvvlF6erpb6lAaDzzwgObNm6eePXtq3LhxSkxM1Hvvvadp06bpnnvuYQAHAAs999xz6tChg9q0aaP09HRdcsklOnTokJYvX65//vOfioyMdHl8dJdq1aqpc+fOevTRR+1vKf/vf//r8NNgvXr10hNPPKGMjAylpqZqx44dGjdunJKSkhzetn3LLbdo9uzZuvvuu7Vjxw6lpaUpLy9PX3zxhRo2bGi/WtykSROtWbNG77zzjuLi4hQZGVng7q3i1KlTR+PGjdOYMWO0a9cuXXPNNYqOjtahQ4f05Zdf2u9Cc2UsnjFjhlatWqWePXvq4osv1unTp/Xqq69Kkq666qoy16G0XD2eAsodL7+0DbDM6dOnzd13322aNm1qoqKiTKVKlUz9+vVNRkaGOXnypD1famqqueyyywp8PzEx0fTs2bNAuiQzdOhQh7R169aZzp07m4iICFOpUiXTtm1b88477xT47rfffmuuvfZaU6VKFRMSEmKaNWtW6NvI58+fbxo0aGCCg4ONJJORkWGMOffWz4iIiAL5MzIyzPmbc/4btydPnlxo/fPnl+8///mP6du3r6lRo4YJDg42sbGxpnPnzmbGjBn2POnp6aZ169YmOjrahIaGmrp165oHHnjAHDlyxBhjzKFDh8zgwYNNgwYNTEREhKlcubJp2rSpef755x3elF4UV+qQ/2bSr776qsD3i1qOxhjz008/mf79+5uYmBgTHBxs6tevbyZPnmx/+7mzPgMAlN62bdvMX/7yFxMTE2NCQkLMxRdfbAYPHmxOnz5tz+PK+Jj/lvK33nrLIb2osSF/bPz111/taflj+LRp00xycrIJDg42DRo0MG+88YbDd7Ozs82oUaNMrVq1TFhYmGnZsqVZtmyZGTRokElMTHTI++eff5rHHnvM1KtXz4SEhJiYmBjTuXNns379enueLVu2mPbt25vw8HAjyaSmphZb9/y2rl692iF92bJlJi0tzURFRZnQ0FCTmJhobrrpJvPxxx8bY1wbizds2GBuuOEGk5iYaEJDQ01MTIxJTU01y5cvL2TpFeSsDsYUfbxy/jIojCvHU8UdCwC+yGaMMZ4P8wEAAADPstlsGjp0qF5++WVvVwWAn+AZbgAAAAAALEDADQAAAACABXhpGgAAAPwCT1IC8DSucAMAAAAAYAECbgAAAAAALEDADQAAAACABbzyDHdeXp7279+vyMhIfsQeAODXjDHKyspSfHy8AgK8cx6ccRkAgHPcPS57JeDev3+/ateu7Y2iAQDwSfv27VNCQoJXymZcBgDAkbvGZa8E3JGRkZLONSIqKsobVQAAwCccP35ctWvXto+N3sC4DADAOe4el70ScOffrhYVFcXAfqHVE4qfnjbaM/UAAHiUN2/lLrfjsrMxU2LcBACUirvGZV6aBgAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFjAK28pBwAA8Ah+/QMA4EUE3OUNBw4AAAAAUC5wSzkAAAAAABbgCre/cXaFXOIqOQAAAAC4AVe4AQAAAACwAFe4AQCA/+LdKAAACxFwAwAA3+TKY1AAAPgwAm538oWz5BycAAAAAIBPIOD2MRt2HS12ekrdGA/VBAAAAABQFrw0DQAAAAAACxBwAwAAAABgAW4p9ySerwYAAAAAv8EVbgAAAAAALEDADQAAAACABbilvJzhLeYAAAAAUD5whRsAAAAAAAsQcAMAAAAAYAECbgAAAAAALMAz3B7k7PlrT5TBM94AAAAA4Blc4QYAAAAAwAIE3AAAAAAAWICAGwAAAAAAC/AMNwAAQGmtnuA8T9po6+sBAPBJXOEGAAAAAMACBNwAAAAAAFiAW8rdyBM/++UTnN0+x61zAICKwpVbxgEAKAIBd0n4yaD7/Mrvi53+AGsNAAAAADhF6IQC2u59pfgMdWM8UxEAAAAAKMcIuAEAgHf4yZ1jAAD/xUvTAAAAAACwAFe44Xn8ZikAoJxw9kLUFBces3L2bpSyeuDqSy2dPwCg9Ai4/YxH3qTujlsEeRM6AAAAgHKOgBsAAMDPOf2FEq6iA0CpEHADAACUkkt3jl1sbR1cuWWdgBkAvIOAGwAAlEvueEzKlWewfZ3Tn/OUJD1jeT0AAAURcJ+PnydxC3e8YAYAAE/wyLtNysi1gBoA4IsIuAEAALyIgBoAKi4C7hIoD2f
BAQCAbyGgBgD/RcCNEivriQdXvs9t5wBQAfCoFgDAzxFwAwAAVHDO3mTu/Co8L10DgNIg4AaK4uzKTNpoz9QDAAAvc8vvdDsZV5/PubHsZQCAjyHghk9yetv5rlHFTnZ6SzrBMgDAj/AcOQB4BwH3eXgpmh9xx3OFvnAF3BfqAAAWYVz2Hf5yy7lbruQDwHn8JuB2tgOVpLYeqAdg50rQT8AMACgHNswq/s4zl1xctq+7cqxHEr2MVAAA/RZJREFUwAzA0/wm4OZWKv/i7KqIR96C7omr6J7AVXQAgAc4O1Z7fuXfPFQTC3niZDvjNuBT/CbgBgAAQPnlLCDfeLHzgLysb2t3FvQ/ELTEaR2cqQi3tVeENgDu4pWA2xgjSTp+/Lj7Zvrps8VOPvlntvvKQrn38db93q6CrqhT1fpC3s0ofvqVDzqfx8nTxU93th072TZdqkNZlbUOzr7vwjymrvqh2OlDO1/ivAx4hKeXVf5YmD82eoMl47LkfP/h7OuM3SiBJjteKvM8TjqZfvrkiWKnf/xz2Y8vTicUX8aEZZuLnX65kzpc0ars2/mX88YUO72Jk+8fz3bhGKiMY/OXe34rdvpXCUOcVqHM+3tfOAaqIDw5Nrt7XLYZL4zwP//8s2rXru3pYgEA8Fn79u1TQkKCV8pmXAYAwJG7xmWvBNx5eXnav3+/IiMjZbPZPF28zzh+/Lhq166tffv2KSoqytvV8Rn0S+Hol8LRL4WjX4rma31jjFFWVpbi4+MVEBDglTqUdlz2tb70FH9stz+2WaLdtNs/0G7Hdrt7XPbKLeUBAQFeO4vvi6Kiovxq5XYV/VI4+qVw9Evh6Jei+VLfVKlSxavll3Vc9qW+9CR/bLc/tlmi3f6GdvuXwtrtznHZO6fSAQAAAACo4Ai4AQAAAACwAAG3F4WGhiojI0OhoaHeropPoV8KR78Ujn4pHP1SNPrGffy1L/2x3f7YZol2027/QLutbbdXXpoGAAAAAEBFxxVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYg4AYAAAAAwAIE3GUwbdo0JSUlKSwsTK1atdK6deuKzb927Vq1atVKYWFhqlu3rmbMmFEgz5IlS9SoUSOFhoaqUaNGevvttx2mjx07VjabzeEvNjbWre0qK3f3y9atW3XjjTeqTp06stlseuGFF9xSrqd5o1/8cX2ZOXOmOnbsqOjoaEVHR+uqq67Sl19+WeZyPc0b/eKP68vSpUvVunVrXXTRRYqIiFDz5s312muvlbnc8sob45ov8NZ+yNusWN75FixYIJvNpuuvv97NtS47K9p97NgxDR06VHFxcQoLC1PDhg31/vvvW9WEUrGi3S+88ILq16+vSpUqqXbt2nrggQd0+vRpq5pQKiVp94EDB9S/f3/Vr19fAQEBGjFiRKH5Ktp+zZV2l4f9mhXLOl+Z9mkGpbJgwQITHBxsZs6cabZt22aGDx9uIiIizE8//VRo/l27dpnw8HAzfPhws23bNjNz5kwTHBxsFi9ebM+zfv16ExgYaMaPH2+2b99uxo8fb4KCgszGjRvteTIyMsxll11mDhw4YP87fPiw5e11lRX98uWXX5pRo0aZ+fPnm9jYWPP888+XuVxP81a/+OP60r9/fzN16lSTmZlptm/fboYMGWKqVKlifv7551KX62ne6hd/XF9Wr15tli5darZt22Z++OEH88ILL5jAwEDz4Ycflrrc8spb45q3eWt78zYr2p1vz549platWqZjx47muuuus7glJWNFu7Ozs03r1q1Njx49zGeffWb27Nlj1q1bZ7Zs2eKpZjllRbtff/11Exoaat544w2ze/du89FHH5m4uDgzYsQITzXLqZK2e/fu3eb+++83c+fONc2bNzfDhw8vkKci7tdcabev79esaHO+su7TCLhL6YorrjB33323Q1qDBg1Menp6ofkfeugh06BBA4e0u+66y7Rt29b+uW/fvuaaa65xyNOtWzfTr18/++eMjAzTrFmzMtbeOlb0y/kSExMLDSxLWq6neatf/H19McaYnJwcExkZaebOnVvqcj3NW/3C+nJOixYtzCOPPFLqcssrb41r3uat7c3brGp3Tk6Oad++vfnXv/5lBg0a5HMBtxXtnj59uqlbt645c+aM+yvsJla0e+jQoaZz584OeUaOHGk6dOjgplqXXVn236mpqYUGYRVxv3a+otp9IV/br1nVZnfs07ilvBTOnDmjr7/+Wl27dnVI79q1q9avX1/odzZs2FAgf7du3bRp0yadPXu22DwXznPnzp2Kj49XUlKS+vXrp127dpW1SW5hVb9YUa4neatf8vn7+nLq1CmdPXtWVatWLXW5nuStfsnnz+uLMUaffPKJduzYoSuvvLLU5ZZH3h7XvMXb25u3WNnucePGqXr16rrjjjvcX/Eysqrdy5cvV0pKioYOHaqaNWuqcePGGj9+vHJzc61pSAlZ1e4OHTro66+/tt9WvGvXLr3//vvq2bOnBa0oOav23xVxv1YavrRfs7LN7tinEXCXwpEjR5Sbm6uaNWs6pNesWVMHDx4s9DsHDx4sNH9OTo6OHDlSbJ7z59mmTRvNmzdPH330kWbOnKmDBw+qXbt2Onr0qDuaViZW9YsV5XqSt/pFYn2RpPT0dNWqVUtXXXVVqcv1JG/1i+S/68sff/yhypUrKyQkRD179tRLL72kq6++utTllkfeHNe8yZvbmzdZ1e7PP/9cs2bN0syZM62peBlZ1e5du3Zp8eLFys3N1fvvv69HHnlEzz77rJ566ilrGlJCVrW7X79+euKJJ9ShQwcFBwcrOTlZaWlpSk9Pt6YhJWTV/rsi7tdKw5f2a1a12V37tKAyfdvP2Ww2h8/GmAJpzvJfmO5snt27d7f/36RJE6WkpCg5OVlz587VyJEjS94IC1jRL1aU62ne6Bd/X18mTZqk+fPna82aNQoLCytTuZ7mjX7x1/UlMjJSW7Zs0YkTJ/TJJ59o5MiRqlu3rjp16lTqcssrb4xrvsBb+yFvc2e7s7KyNGDAAM2cOVPVqlVzf2XdyN3LOy8vTzVq1NArr7yiwMBAtWrVSvv379fkyZP12GOPubn2pefudq9Zs0ZPPfWUpk2bpjZt2uiHH37Q8OHDFRcXp0cffdTNtS89K/ZBFXG/VhK+ul9zZ5vduU8j4C6FatWqKTAwsMAZk8OHDxc4s5IvNja20PxBQUGKiYkpNk9R85SkiIgINWnSRDt37ixNU9zKqn6xolxP8la/FMaf1pdnnnlG48eP18cff6ymTZuWqVxP8la/FMZf1peAgABdcsklkqTmzZtr+/btmjBhgjp16uTz64u7+NK45km+tL15khXt3rp1q/bs2aNrr73WPj0vL0+SFBQUpB07dig5OdnNLSkZq5Z3XFycgoODFRgYaM/TsGFDHTx4UGfOnFFISIibW1IyVrX70Ucf1cCBA3XnnXdKOnei9uTJk/rb3/6mMWPGKCDAuzfSWrX/roj7tZLwxf2aFW3+8ccf3bZP45byUggJCVGrVq20cu
VKh/SVK1eqXbt2hX4nJSWlQP4VK1aodevWCg4OLjZPUfOUpOzsbG3fvl1xcXGlaYpbWdUvVpTrSd7ql8L4y/oyefJkPfHEE/rwww/VunXrMpfrSd7ql8L4y/pyIWOMsrOzS11ueeRL45on+dL25klWtLtBgwb69ttvtWXLFvtf7969lZaWpi1btqh27dqWtcdVVi3v9u3b64cffrAfjEvS999/r7i4OK8H25J17T516lSBoDowMFDm3EuZ3diC0rFq/10R92uu8tX9mhVtdus+rcSvWYMx5n+vnp81a5bZtm2bGTFihImIiDB79uwxxhiTnp5uBg4caM+f//MKDzzwgNm2bZuZNWtWgZ9X+Pzzz01gYKCZOHGi2b59u5k4cWKBnxl48MEHzZo1a8yuXbvMxo0bTa9evUxkZKS9XG+zol+ys7NNZmamyczMNHFxcWbUqFEmMzPT7Ny50+Vyvc1b/eKP68vTTz9tQkJCzOLFix1+3iorK8vlcr3NW/3ij+vL+PHjzYoVK8yPP/5otm/fbp599lkTFBRkZs6c6XK5FYW3xjVv89b25m1WtPtCvviWcivavXfvXlO5cmUzbNgws2PHDvPuu++aGjVqmCeffNLj7SuKFe3OyMgwkZGRZv78+WbXrl1mxYoVJjk52fTt29fj7StKSdttjLEfW7Vq1cr079/fZGZmmq1bt9qnV8T9mjHO2+3r+zUr2nyh0u7TCLjLYOrUqSYxMdGEhISYli1bmrVr19qnDRo0yKSmpjrkX7NmjWnRooUJCQkxderUMdOnTy8wz7feesvUr1/fBAcHmwYNGpglS5Y4TL/55ptNXFycCQ4ONvHx8aZPnz7Frhje4O5+2b17t5FU4O/C+RRXri/wRr/44/qSmJhYaL9kZGS4XK4v8Ea/+OP6MmbMGHPJJZeYsLAwEx0dbVJSUsyCBQtKVG5F4o1xzRd4az/kbVYs7/P5YsBtjDXtXr9+vWnTpo0JDQ01devWNU899ZTJycmxuikl4u52nz171owdO9YkJyebsLAwU7t2bXPvvfea33//3QOtcV1J213YtpuYmOiQpyLu15y1uzzs16xY1ucr7T7N9v+FAQAAAAAAN+IZbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAgP9j777jq6jy/4+/b0glJCGhJSEhFKVJFZRu6CAg9q4UXVGBFRW+CutqwALYC4q4iogNXBQRxUU6FkBBQKWIIEVYmqJIBEFCPr8/+OUul1Tgzm15PR8PHg8yc+7M+Zwzd858pl0AAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBkqZ1157TS6XS1u3bnVPe/vtt/XMM8/4rU4AAODUbN26VS6XS6+99tppfX706NGaMWOGV+sEID+XmZm/KwHAd37++Wf9+OOPatq0qaKioiRJvXr10po1azyScAAAELiOHDmiVatWqVatWqpUqdIpf75cuXK64oorTjthB1Ay4f6uAADfqlSp0mkNzL526NAhlS1bNt/0Y8eOKScnx32ywJvLBgAgWERFRally5b+roZXFDW2e2PM/vPPPxUTE3NGywBOF7eUA//f999/r2uvvVZVqlRRVFSUqlWrpj59+ujIkSPuMmvWrNHFF1+sxMRERUdHq0mTJpo8ebLHchYtWiSXy6UpU6bovvvuU2pqquLj49W5c2dt2LAh33pnz56tTp06KSEhQWXLllW9evU0ZswY9/wVK1bommuuUfXq1RUTE6Pq1avr2muv1bZt29xlvvnmG7lcLk2cODHf8v/zn//I5XJp5syZkvLfUt6+fXvNmjVL27Ztk8vlcv8zM5199tnq1q1bvmX+8ccfSkhI0KBBg4psUzPT+PHj1aRJE8XExCgxMVFXXHGFNm/e7FGuffv2atCggT799FO1bt1aZcuW1U033eS+Xe6xxx7Tww8/rBo1aigqKkoLFy6UJM2cOVOtWrVS2bJlFRcXpy5dumjp0qUeyx45cqRcLpdWrlypK664QomJiapVq1aR9QYAlFxpGT8laePGjbruuutUuXJlRUVFqV69enrhhRdK1E4ul0uDBw/WSy+9pNq1aysqKkr169fX1KlT85UtSXsVdEt53pi3du1aXXvttUpISFCVKlV000036ffff/eoy8GDBzV58mT3uN++fXtJxxPcYcOGqUaNGoqOjlZSUpKaN2+uKVOmFBvj7t27deuttyotLU2RkZGqUaOGRo0apZycnHz1LmhsL2rMPnz4sEaMGKEaNWooMjJSVatW1aBBg7R//36POlSvXl29evXS9OnT1bRpU0VHR2vUqFHF1h1wjAGw1atXW7ly5ax69eo2YcIEmz9/vr355pt21VVX2YEDB8zM7Pvvv7e4uDirVauWvf766zZr1iy79tprTZI9+uij7mUtXLjQJFn16tXt+uuvt1mzZtmUKVOsWrVqdvbZZ1tOTo677CuvvGIul8vat29vb7/9ts2bN8/Gjx9vAwcOdJeZNm2aPfDAA/b+++/b4sWLberUqZaZmWmVKlWyn3/+2V2uadOm1qZNm3yxXXXVVVa5cmU7evSomZlNmjTJJNmWLVvMzGzt2rXWpk0bS05OtqVLl7r/mZk9++yz5nK57IcffvBY5gsvvGCSbO3atUW26y233GIRERE2dOhQmz17tr399ttWt25dq1Kliu3evdtdLjMz05KSkiw9Pd3GjRtnCxcutMWLF9uWLVtMklWtWtU6dOhg7777rs2ZM8e2bNlib731lkmyrl272owZM+ydd96xZs2aWWRkpH322WfuZWdlZZkky8jIsHvvvdfmzp1rM2bMKLLeAICSKU3j59q1ay0hIcEaNmxor7/+us2ZM8eGDh1qYWFhNnLkyGLbSpKlp6db/fr1bcqUKTZz5kzr3r27SbJp06a5y5W0vfLGyEmTJrmn5Y15derUsQceeMDmzp1rTz31lEVFRVn//v3d5ZYuXWoxMTHWo0cP97ifN6bfeuutVrZsWXvqqads4cKF9tFHH9nYsWNt3LhxRca3a9cuS09Pt4yMDHvppZds3rx59tBDD1lUVJT169cvX70LGtsLG7Nzc3OtW7duFh4ebvfff7/NmTPHnnjiCYuNjbWmTZva4cOH3cvPyMiwlJQUq1mzpr366qu2cOFC++qrr4rtH8ApJNyAmXXs2NHKly9ve/fuLbTMNddcY1FRUfbTTz95TL/wwgutb
Nmytn//fjP73wFDjx49PMr9+9//NknuZDY7O9vi4+Otbdu2lpubW+K65uTk2B9//GGxsbH27LPPuqc/99xzJsk2bNjgnvbrr79aVFSUDR061D3t5ITbzKxnz56WkZGRb10HDhywuLg4GzJkiMf0+vXrW4cOHYqs59KlS02SPfnkkx7Tt2/fbjExMXbPPfe4p2VmZpokmz9/vkfZvEG5Vq1a9tdff7mnHzt2zFJTU61hw4Z27Ngx9/Ts7GyrXLmytW7d2j0tb/B+4IEHiqwvAODUlabxs1u3bpaWlma///67x3IHDx5s0dHR9uuvvxa5fkkWExPjccI5JyfH6tata2eddZZ7Wknbq6iE+7HHHvP47MCBAy06OtqjvWJjY61v37756tmgQQO75JJLioylILfeequVK1fOtm3b5jH9iSee8DhJX9jYfmL9Tx6zZ8+eXWBc77zzjkmyf/3rX+5pGRkZVqZMGY/+BPyJW8pR6h06dEiLFy/WVVddVeSzzQsWLFCnTp2Unp7uMb1fv346dOhQvluZe/fu7fF3o0aNJMl9K9uSJUt04MABDRw4UC6Xq9D1/vHHH7r33nt11llnKTw8XOHh4SpXrpwOHjyo9evXu8tdf/31ioqK8ri1bMqUKTpy5Ij69+9fdCMUIi4uTv3799drr72mgwcPSjreDuvWrdPgwYOL/OxHH30kl8ulG264QTk5Oe5/ycnJaty4sRYtWuRRPjExUR07dixwWb1791ZERIT77w0bNmjnzp268cYbFRb2v91YuXLldPnll2vZsmU6dOiQxzIuv/zyUwkdAFCM0jR+Hj58WPPnz9ell16qsmXLeoxrPXr00OHDh7Vs2bIiWuu4Tp06qUqVKu6/y5Qpo6uvvlqbNm3Sjh07Tqu9ClJQGx4+fFh79+4t9rPnn3++/vOf/2j48OFatGiR/vzzz2I/Ix0f9zt06KDU1FSP9rnwwgslSYsXL85XxxPH9hOdPGYvWLBA0vE2ONGVV16p2NhYzZ8/32N6o0aNVLt27RLVG3AaCTdKvd9++03Hjh1TWlpakeX27dunlJSUfNNTU1Pd809UoUIFj7/zXgSSN3D9/PPPklTseq+77jo9//zz+tvf/qZPPvlEX331lZYvX65KlSp5DIJJSUnq3bu3Xn/9dR07dkzS8ee1zz//fJ1zzjlFrqMof//735Wdna233npLkvT8888rLS1NF198cZGf27Nnj8xMVapUUUREhMe/ZcuW6ZdffvEoX1DbFjYvr60L64/c3Fz99ttvJV4+AODUlabxc9++fcrJydG4cePyjWk9evSQpHzjWkGSk5MLnZbXDqfaXgUprg2L8txzz+nee+/VjBkz1KFDByUlJemSSy7Rxo0bi/zcnj179OGHH+Zrn7w2PNNxPzw8PN+JHZfLpeTk5HxtwpiPQMJbylHqJSUlqUyZMu4zy4WpUKGCdu3alW/6zp07JUkVK1Y8pfXmDRpFrff333/XRx99pKysLA0fPtw9/ciRI/r111/zle/fv7+mTZumuXPnqlq1alq+fLlefPHFU6rXyc466yxdeOGFeuGFF3ThhRdq5syZGjVqlMqUKVPk5ypWrCiXy6XPPvuswLeOnjytqKsUJ8/LO5AorD/CwsKUmJhY4uUDAE5daRo/ExMTVaZMGd14442FvjC0Ro0axdZ99+7dhU7LG9u83V6nKjY2VqNGjdKoUaO0Z88e99Xuiy66SN9//32hn6tYsaIaNWqkRx55pMD5eScM8pzquJ+Tk6Off/7ZI+k2M+3evVvnnXdeiZcN+BpXuFHqxcTEKDMzU9OmTSvy7HSnTp20YMEC94CX5/XXX1fZsmVP+ac5WrdurYSEBE2YMEFmVmCZvLeFn5ycvvLKK+6z8Cfq2rWrqlatqkmTJmnSpEmKjo7WtddeW2xdoqKiijzrPWTIEH377bfq27evypQpo1tuuaXYZfbq1Utmpv/+979q3rx5vn8NGzYsdhmFqVOnjqpWraq3337bo+0OHjyo9957z/3mcgCAc0rT+Fm2bFl16NBBq1atUqNGjQoc106+qlyQ+fPna8+ePe6/jx07pnfeeUe1atVyX7H3dnsVprixX5KqVKmifv366dprr9WGDRvyPa51ol69emnNmjWqVatWge1zcsJ9Kjp16iRJevPNNz2mv/feezp48KB7PhCIuMINSHrqqafUtm1btWjRQsOHD9dZZ52lPXv2aObMmXrppZcUFxenrKws9/NJDzzwgJKSkvTWW29p1qxZeuyxx5SQkHBK6yxXrpyefPJJ/e1vf1Pnzp11yy23qEqVKtq0aZO++eYbPf/884qPj9cFF1ygxx9/XBUrVlT16tW1ePFiTZw4UeXLl8+3zDJlyqhPnz566qmnFB8fr8suu6xE9WrYsKGmT5+uF198Uc2aNVNYWJiaN2/unt+lSxfVr19fCxcu1A033KDKlSsXu8w2bdpowIAB6t+/v1asWKELLrhAsbGx2rVrlz7//HM1bNhQt99++ym1WZ6wsDA99thjuv7669WrVy/deuutOnLkiB5//HHt379fY8eOPa3lAgBOTWkaP5999lm1bdtW7dq10+23367q1asrOztbmzZt0ocffuh+zrgoFStWVMeOHXX//fcrNjZW48eP1/fff+/x02Debq/CNGzYUIsWLdKHH36olJQUxcXFqU6dOmrRooV69eqlRo0aKTExUevXr9cbb7xR7MnsBx98UHPnzlXr1q11xx13qE6dOjp8+LC2bt2qjz/+WBMmTCj2MYDCdOnSRd26ddO9996rAwcOqE2bNvr222+VlZWlpk2b6sYbbzzdZgCc57/3tQGBZd26dXbllVdahQoVLDIy0qpVq2b9+vXz+KmJ7777zi666CJLSEiwyMhIa9y4scfbQc3+95bVE3/iw6zgt4mamX388ceWmZlpsbGxVrZsWatfv77Hz37s2LHDLr/8cktMTLS4uDjr3r27rVmzxjIyMgp8u+gPP/xgkkySzZ07N9/8gt5S/uuvv9oVV1xh5cuXN5fLZQXtGkaOHGmSbNmyZUW0Yn6vvvqqtWjRwmJjYy0mJsZq1aplffr0sRUrVrjLZGZm2jnnnJPvs3lt9vjjjxe47BkzZliLFi0sOjraYmNjrVOnTvbFF194lMl74+mJPwEDAPCe0jJ+5tXlpptusqpVq1pERIRVqlTJWrdubQ8//HCx7STJBg0aZOPHj7datWpZRESE1a1b19566618ZUvSXkW9pfzkMa+gsX/16tXWpk0bK1u2rEmyzMxMMzMbPny4NW/e3BITEy0qKspq1qxpd911l/3yyy/Fxvjzzz/bHXfcYTVq1LCIiAhLSkqyZs2a2X333Wd//PGHR70LGtuLGrP//PNPu/feey0jI8MiIiIsJSXFbr/9dvvtt988ymVkZFjPnj2LrSvgKy6zQu7FAYATNG/eXC6XS8uXL/d3VQAACDoul0uDBg3S888/7++qAPAhbikHUKgDBw5ozZo1+uijj/T111/r/fff93eVAAAAgKBBwg2gUCtXrlSHDh1UoUIFZWVl
6ZJLLvF3lQAAAICgwS3lAAAAAAA4gJ8FAwAAAADAASTcAAAAAAA4wC/PcOfm5mrnzp2Ki4uTy+XyRxUAAAgIZqbs7GylpqYqLMw/58EZlwEAOM7b47JfEu6dO3cqPT3dH6sGACAgbd++XWlpaX5ZN+MyAACevDUu+yXhjouLk3Q8iPj4eH9UAQCAgHDgwAGlp6e7x0Z/YFwGAOA4b4/Lfkm4825Xi4+PZ2AHAEDy663cjMsAAHjy1rjM73CfaOGYoud3GOGbegAAgNDB8QUAlFqlJ+EubrADAAAAAMCL+FkwAAAAAAAcUHqucAMAAJwqbgcHAJwBrnADAAAAAOAArnADAACcLt4RAwAoAle4AQAAAABwAAk3AAAAAAAO4JZyAABQenFLOADAQVzhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAH8Aw3AACAPxX3HHmHEb6pBwDA67jCDQAAAACAA7jCDQAAEMi4Ag4AQYsr3AAAAAAAOICEGwAAAAAAB4TOLeXF3W4FAAAAAIAPhU7CDQAAgAI9PfeHIuff1aW2j2oCAKULt5QDAAAAAOAAEm4AAAAAABxAwg0AAAAAgAN4hhsAACDEtfzpX8WUeMIn9QCA0oYr3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgAJ7hBgAAKOX4nW4AcAYJ9ylgMAIAAAAAlBQJ9yngDZ8AACDgLBzj7xoAAArBM9wAAAAAADiAK9xexC3nAADA15Zu3ufvKgAACkHCDQAAQlcQ3G5dXMLcqmYFH9WkcMVdVCgJLjwAKI24pRwAAAAAAAdwhduHSnJ2mLO/AACUUBBcvfaG0nLLOI/mAQhFXOEGAAAAAMABXOE+QWk5gwwAgE8UdwW6wwjf1MPPQuH4orifRl1WbYCPagIAwYUr3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgAJ7hBgAAgOO88VveABBsSLh9qLgXjkjS03OLfukIP4kBAAAAAMGBhNuLSpJQAwAAlEa86RxAaUTCDQAAUMpx0QAAnMFL0wAAAAAAcABXuANMcWeYecYbAAAAAIIDCTcAAADOiC9uSS/uLedcdAAQiEi4AQBAYFo4xt810NLN+4qc36pmBR/VBMXxxs+OOZ20l6SOnDgAQgsJNwAAwGkqLiFHcFk6cViR81vd/ISPagIgVJSahDtUBkSe8QYAAACA4FBqEm4cx61MAAAgEJ3pc+CB8Dve3rit/UzXwXEcEFhIuJFPsTvy8PeKXkCHEV6sDQAAzgmVO+AQGIo7aeCTkwLFvfuA4zTAp/yScJuZJOnAgQPeW+jBw0XP/vOI99YVwBpuGOf4OuYVV2Dt34ucvTytf7HrGNTxrJJXCACCWN5YmDc2+oMj47JU7Nisj7K8u74CfLX1V8fXgcDgjWOgg8XMHzNjZZHzzyvmePPwwT+KrcOBYr4XDYvZpg9UTypy/gvFxCB54Tjs0yeLrkPOxcXXIfyDogtcMPRUanRaXliw6Yw+z/GsFxWzTXlze/D2uOwyP4zwO3bsUHp6uq9XCwBAwNq+fbvS0tL8sm7GZQAAPHlrXPZLwp2bm6udO3cqLi5OLpfrjJd34MABpaena/v27YqPj/dCDf2LeAIb8QQ24glsoRaPdOYxmZmys7OVmpqqsLAwB2pYPG+Py1Jo9rUv0G6nh3Y7PbTb6aHdTk+wtJu3x2W/3FIeFhbmyFn8+Pj4gO68U0U8gY14AhvxBLZQi0c6s5gSEhK8XJtT49S4LIVmX/sC7XZ6aLfTQ7udHtrt9ARDu3lzXPbPqXQAAAAAAEIcCTcAAAAAAA4IiYQ7KipKWVlZioqK8ndVvIJ4AhvxBDbiCWyhFo8UmjF5A+1yemi300O7nR7a7fTQbqentLabX16aBgAAAABAqAuJK9wAAAAAAAQaEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADgjYhHv8+PGqUaOGoqOj1axZM3322WdFll+8eLGaNWum6Oho1axZUxMmTMhX5r333lP9+vUVFRWl+vXr6/3333eq+vl4O57XXntNLpcr37/Dhw87GYbbqcSza9cuXXfddapTp47CwsJ05513FlguWPqnJPEEU/9Mnz5dXbp0UaVKlRQfH69WrVrpk08+yVcuWPqnJPEEU/98/vnnatOmjSpUqKCYmBjVrVtXTz/9dL5ywdI/JYknmPrnRF988YXCw8PVpEmTfPP82T/eEmrjsq94u93Wrl2ryy+/XNWrV5fL5dIzzzzjYO39x9vt9vLLL6tdu3ZKTExUYmKiOnfurK+++srJEPzC2+02ffp0NW/eXOXLl1dsbKyaNGmiN954w8kQ/MKJ/VueqVOnyuVy6ZJLLvFyrf0v1PIbx1gAmjp1qkVERNjLL79s69atsyFDhlhsbKxt27atwPKbN2+2smXL2pAhQ2zdunX28ssvW0REhL377rvuMkuWLLEyZcrY6NGjbf369TZ69GgLDw+3ZcuWBWU8kyZNsvj4eNu1a5fHP1841Xi2bNlid9xxh02ePNmaNGliQ4YMyVcmmPqnJPEEU/8MGTLEHn30Ufvqq6/shx9+sBEjRlhERIStXLnSXSaY+qck8QRT/6xcudLefvttW7NmjW3ZssXeeOMNK1u2rL300kvuMsHUPyWJJ5j6J8/+/futZs2a1rVrV2vcuLHHPH/2j7eE2rjsK06021dffWXDhg2zKVOmWHJysj399NM+isZ3nGi36667zl544QVbtWqVrV+/3vr3728JCQm2Y8cOX4XlOCfabeHChTZ9+nRbt26dbdq0yZ555hkrU6aMzZ4921dhOc6JdsuzdetWq1q1qrVr184uvvhihyPxrVDLb5wUkAn3+eefb7fddpvHtLp169rw4cMLLH/PPfdY3bp1Pabdeuut1rJlS/ffV111lXXv3t2jTLdu3eyaa67xUq0L50Q8kyZNsoSEBK/XtSRONZ4TZWZmFpigBlP/nKiweIK1f/LUr1/fRo0a5f47WPsnz8nxBHv/XHrppXbDDTe4/w72/jk5nmDsn6uvvtr++c9/WlZWVr6E25/94y2hNi77ihPtdqKMjIyQTLidbjczs5ycHIuLi7PJkyefeYUDhC/azcysadOm9s9//vPMKhtAnGq3nJwca9Omjb3yyivWt2/fkEu4Qy2/cVLA3VL+119/6euvv1bXrl09pnft2lVLliwp8DNLly7NV75bt25asWKFjh49WmSZwpbpLU7FI0l//PGHMjIylJaWpl69emnVqlXeD+AkpxNPSQRT/5RUsPZPbm6usrOzlZSU5J4WzP1TUDxS8PbPqlWrtGTJEmVmZrqnBXP/FBSPFFz9M2nSJP3444/KysoqcL6/+sdbQm1c9hUnx/9Q5qt2O3TokI4ePZpvbAhWvmg3M9P8+fO1YcMGXXDBBd6rvB852W4PPvigKlWqpJtvvtn7FfezUMtvnBZwCfcvv/yiY8eOqUqVKh7Tq1Spot27dxf4md27dxdYPicnR7/88kuRZQpbprc4FU/dunX12muvaebMmZoyZYqio6PVpk0bbdy
40ZlA/r/Tiackgql/SiKY++fJJ5/UwYMHddVVV7mnBXP/FBRPMPZPWlqaoqKi1Lx5cw0aNEh/+9vf3POCsX+KiieY+mfjxo0aPny43nrrLYWHhxdYxl/94y2hNi77ilPtFup81W7Dhw9X1apV1blzZ+9U3M+cbLfff/9d5cqVU2RkpHr27Klx48apS5cu3g/CD5xqty+++EITJ07Uyy+/7EzF/SzU8hunFXx0EABcLpfH32aWb1px5U+efqrL9CZvx9OyZUu1bNnSPb9NmzY699xzNW7cOD333HPeqvYp1e9M2zKY+qc4wdo/U6ZM0ciRI/XBBx+ocuXKXlmmN3g7nmDsn88++0x//PGHli1bpuHDh+uss87Stddee0bL9BZvxxMs/XPs2DFdd911GjVqlGrXru2VZQayUBuXfcWJdisNnGy3xx57TFOmTNGiRYsUHR3thdoGDifaLS4uTqtXr9Yff/yh+fPn6+6771bNmjXVvn1771Xcz7zZbtnZ2brhhhv08ssvq2LFit6vbAAJtfzGKQGXcFesWFFlypTJd3Zk7969+c6K5ElOTi6wfHh4uCpUqFBkmcKW6S1OxXOysLAwnXfeeY6fATqdeEoimPrndARD/7zzzju6+eabNW3atHxn/IOxf4qK52TB0D81atSQJDVs2FB79uzRyJEj3QlqMPZPUfGcLFD7Jzs7WytWrNCqVas0ePBgSccfYTAzhYeHa86cOerYsaPf+sdbQm1c9hVfjf+hxul2e+KJJzR69GjNmzdPjRo18m7l/cjJdgsLC9NZZ50lSWrSpInWr1+vMWPGhETC7US7rV27Vlu3btVFF13knp+bmytJCg8P14YNG1SrVi0vR+JboZbfOC3gbimPjIxUs2bNNHfuXI/pc+fOVevWrQv8TKtWrfKVnzNnjpo3b66IiIgiyxS2TG9xKp6TmZlWr16tlJQU71S8EKcTT0kEU/+cjkDvnylTpqhfv356++231bNnz3zzg61/iovnZIHePyczMx05csT9d7D1z8lOjqeg+YHYP/Hx8fruu++0evVq97/bbrtNderU0erVq9WiRQtJ/usfbwm1cdlXfDX+hxon2+3xxx/XQw89pNmzZ6t58+ber7wf+XJ7K26fHUycaLe6devmGxt69+6tDh06aPXq1UpPT3csHl8JtfzGcQ6+kO205b1mfuLEibZu3Tq78847LTY21rZu3WpmZsOHD7cbb7zRXT7vNfN33XWXrVu3ziZOnJjvNfNffPGFlSlTxsaOHWvr16+3sWPH+vxnc7wZz8iRI2327Nn2448/2qpVq6x///4WHh5uX375ZcDFY2a2atUqW7VqlTVr1syuu+46W7Vqla1du9Y9P5j6pyTxBFP/vP322xYeHm4vvPCCx08w7N+/310mmPqnJPEEU/88//zzNnPmTPvhhx/shx9+sFdffdXi4+Ptvvvuc5cJpv4pSTzB1D8nK+gt5f7sH28JtXHZV5xotyNHjrjHoJSUFBs2bJitWrXKNm7c6PP4nOJEuz366KMWGRlp7777rsfYkJ2d7fP4nOJEu40ePdrmzJljP/74o61fv96efPJJCw8Pt5dfftnn8TnFiXY7WSi+pTzU8hsnBWTCbWb2wgsvWEZGhkVGRtq5555rixcvds/r27evZWZmepRftGiRNW3a1CIjI6169er24osv5lvmtGnTrE6dOhYREWF169a19957z+kw3Lwdz5133mnVqlWzyMhIq1SpknXt2tWWLFnii1DM7NTjkZTvX0ZGhkeZYOqf4uIJpv7JzMwsMJ6+fft6LDNY+qck8QRT/zz33HN2zjnnWNmyZS0+Pt6aNm1q48ePt2PHjnksM1j6pyTxBFP/nKyghNvMv/3jLaE2LvuKt9tty5YtBe7jitoug5G32y0jI6PAdsvKyvJBNL7j7Xa777777KyzzrLo6GhLTEy0Vq1a2dSpU30Rik85sX87USgm3Gahl984xWX2/59WBwAAAAAAXhNwz3ADAAAAABAKSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGyhlDh06pJEjR2rRokX+rgoAIIC98847OueccxQTEyOXy6XVq1f7bN3r1q3TyJEjtXXr1tNexmuvvSaXy3Xayxg5cqRcLpfHtOrVq6tfv36nXadAcaZtA6Dkwv1dAQC+dejQIY0aNUqS1L59e/9WBgAQkH7++WfdeOON6t69u8aPH6+oqCjVrl3bZ+tft26dRo0apfbt26t69eo+W29x3n//fcXHx/u7GmesZ8+eWrp0qVJSUvxdFSDkkXADKNKhQ4dUtmxZv63/6NGjcrlcCg/Pv7s607qZmQ4fPqyYmJgzqSIAhJwffvhBR48e1Q033KDMzMwiy/p7nPClpk2b+rsKXlGpUiVVqlTJ39U4ZYVta94Yz//8809FR0fnu6sBOFPcUo5S7eeff9aAAQOUnp6uqKgoVapUSW3atNG8efMkSQ899JDCw8O1ffv2fJ+96aabVKFCBR0+fFjS8dvMevXqpY8++khNmzZVTEyM6tWrp48++kjS8du36tWrp9jYWJ1//vlasWKFx/L69euncuXK6fvvv1e3bt0UGxurlJQUjR07VpK0bNkytW3bVrGxsapdu7YmT56cr067d+/WrbfeqrS0NEVGRqpGjRoaNWqUcnJyJElbt251D7CjRo2Sy+WSy+Vy3x6Xd/vcypUrdcUVVygxMVG1atXSG2+8IZfLpaVLl+Zb54MPPqiIiAjt3LmzyLbeuHGjrrvuOlWuXFlRUVGqV6+eXnjhBY8yixYtksvl0htvvKGhQ4eqatWqioqK0qZNm9zt891336lr166Ki4tTp06dJEm//vqrBg4cqKpVqyoyMlI1a9bUfffdpyNHjngs3+VyafDgwZowYYLq1aunqKioAtsRAEqzfv36qW3btpKkq6++Wi6Xy31HVFH74rlz5+riiy9WWlqaoqOjddZZZ+nWW2/VL7/8km8d33//va699lpVqVJFUVFRqlatmvr06aMjR47otdde05VXXilJ6tChg3useu211055PSU1a9YsNWnSRFFRUapRo4aeeOKJAsudfEt53rj19ttv695771VKSorKlSuniy66SHv27FF2drYGDBigihUrqmLFiurfv7/++OMPj2WamcaPH68mTZooJi
ZGiYmJuuKKK7R582aPcu3bt1eDBg20fPlytWvXTmXLllXNmjU1duxY5ebmusvl5ubq4YcfVp06dRQTE6Py5curUaNGevbZZ91lCrul/NVXX1Xjxo0VHR2tpKQkXXrppVq/fr1HmbxtYNOmTerRo4fKlSun9PR0DR06NN+4W5h33nlHrVq1UmxsrMqVK6du3bpp1apVBa6noG2tqPH8888/V6dOnRQXF6eyZcuqdevWmjVrlsey8+KfM2eObrrpJlWqVElly5Ytcf2BU2JAKdatWzerVKmS/etf/7JFixbZjBkz7IEHHrCpU6eamdmePXssKirK7rvvPo/P7du3z2JiYuz//u//3NMyMjIsLS3NGjRoYFOmTLGPP/7YWrRoYREREfbAAw9YmzZtbPr06fb+++9b7dq1rUqVKnbo0CH35/v27WuRkZFWr149e/bZZ23u3LnWv39/k2QjRoyw2rVr28SJE+2TTz6xXr16mSRbsWKF+/O7du2y9PR0y8jIsJdeesnmzZtnDz30kEVFRVm/fv3MzOzw4cM2e/Zsk2Q333yzLV261JYuXWqbNm0yM7OsrCyTZBkZGXbvvffa3LlzbcaMGXbkyBFLTk6266+/3qMdjh49aqmpqXbllVcW2c5r1661hIQEa9iwob3++us2Z84cGzp0qIWFhdnIkSPd5RYuXGiSrGrVqnbFFVfYzJkz7aOPPrJ9+/ZZ3759LSIiwqpXr25jxoyx+fPn2yeffGJ//vmnNWrUyGJjY+2JJ56wOXPm2P3332/h4eHWo0cPj3rkLbtRo0b29ttv24IFC2zNmjXFbicAUJps2rTJXnjhBZNko0ePtqVLl9ratWvNzArdF5uZvfjiizZmzBibOXOmLV682CZPnmyNGze2OnXq2F9//eVe/urVq61cuXJWvXp1mzBhgs2fP9/efPNNu+qqq+zAgQO2d+9eGz16tEmyF154wT1W7d2795TWM2nSJJNkW7ZsKTLeefPmWZkyZaxt27Y2ffp0mzZtmp133nlWrVo1O/lQOSMjw/r27ev+O2/cysjIsH79+tns2bNtwoQJVq5cOevQoYN16dLFhg0bZnPmzLFHH33UypQpY3//+989lnnLLbdYRESEDR061GbPnm1vv/221a1b16pUqWK7d+92l8vMzLQKFSrY2WefbRMmTLC5c+fawIEDTZJNnjzZXW7MmDFWpkwZy8rKsvnz59vs2bPtmWee8RhvC2qbvDa/9tprbdasWfb6669bzZo1LSEhwX744Qd3uROPV5544gmbN2+ePfDAA+ZyuWzUqFFFtrWZ2SOPPGIul8tuuukm++ijj2z69OnWqlUri42NdW9neespbFsrbDxftGiRRUREWLNmzeydd96xGTNmWNeuXc3lcrmP7U6Mv2rVqjZgwAD7z3/+Y++++67l5OQUW3/gVJFwo1QrV66c3XnnnUWW6du3r1WuXNmOHDninvboo49aWFiYx0CVkZFhMTExtmPHDve01atXmyRLSUmxgwcPuqfPmDHDJNnMmTM91iPJ3nvvPfe0o0ePWqVKlUySrVy50j193759VqZMGbv77rvd02699VYrV66cbdu2zaP+TzzxhElyD2I///yzSbKsrKx8seYl3A888ECB8yIjI23Pnj3uae+8845JssWLFxfYdnm6detmaWlp9vvvv3tMHzx4sEVHR9uvv/5qZv87cLngggvyLSOvfV599VWP6RMmTDBJ9u9//9tj+qOPPmqSbM6cOe5pkiwhIcG9PgBAwfL2x9OmTfOYXti++GS5ubl29OhR27Ztm0myDz74wD2vY8eOVr58eXcCXZBp06aZJFu4cOFpr6ekCXeLFi0sNTXV/vzzT/e0AwcOWFJSUokT7osuusij3J133mmS7I477vCYfskll1hSUpL776VLl5oke/LJJz3Kbd++3WJiYuyee+5xT8vMzDRJ9uWXX3qUrV+/vnXr1s39d69evaxJkyZFxnxy2/z2228WExOT70T1Tz/9ZFFRUXbddde5p+VtAyePuz169LA6deoUud6ffvrJwsPD8510yM7OtuTkZLvqqqvyraegba2w8bxly5ZWuXJly87Odk/LycmxBg0aWFpamuXm5nrE36dPnyLrC3gDt5SjVDv//PP12muv6eGHH9ayZct09OjRfGWGDBmivXv3atq0aZKO36r14osvqmfPnvle5NKkSRNVrVrV/Xe9evUkHb8N7MRnjvKmb9u2zePzLpdLPXr0cP8dHh6us846SykpKR7PjSUlJaly5coen//oo4/UoUMHpaamKicnx/3vwgsvlCQtXry4xO1y+eWX55t2++23S5Jefvll97Tnn39eDRs21AUXXFDosg4fPqz58+fr0ksvVdmyZT3q1qNHDx0+fFjLli0rdv2FzVuwYIFiY2N1xRVXeEzPu+Vv/vz5HtM7duyoxMTEQpcPACheQfvpvXv36rbbblN6errCw8MVERGhjIwMSXLflnzo0CEtXrxYV1111Wk/Q1yS9ZTUwYMHtXz5cl122WWKjo52T4+Li9NFF11U4uX06tXL4++8cb5nz575pv/666/u28o/+ugjuVwu3XDDDR7jY3Jysho3bpzvF0WSk5N1/vnne0xr1KiRx/HA+eefr2+++UYDBw7UJ598ogMHDhRb/6VLl+rPP//M9wb29PR0dezYMd9Y6nK58rXPyfUoyCeffKKcnBz16dPHI97o6GhlZmYW+AsqhR0TnDyeHzx4UF9++aWuuOIKlStXzj29TJkyuvHGG7Vjxw5t2LChRMsGvImEG6XaO++8o759++qVV15Rq1atlJSUpD59+mj37t3uMk2bNlW7du3czxt/9NFH2rp1qwYPHpxveUlJSR5/R0ZGFjk97/nvPGXLlvUY8PPKnvz5vOknfn7Pnj368MMPFRER4fHvnHPOkaRTeratoLeWVqlSRVdffbVeeuklHTt2TN9++60+++yzAtvhRPv27VNOTo7GjRuXr255JxdOrlthb00tW7ZsvrfD7tu3T8nJyfleclK5cmWFh4dr3759JVo2AKBkCtoX5+bmqmvXrpo+fbruuecezZ8/X1999ZX7hOqff/4pSfrtt9907NgxpaWlnda6S7qekvrtt9+Um5ur5OTkfPMKmlaY0x3/9+zZIzNTlSpV8o2Ry5Ytyzc+VqhQId+6o6KiPOIeMWKEnnjiCS1btkwXXnihKlSooE6dOuV7d8yJ8sbKgsbI1NTUfGNpQccrUVFR+Y5rTrZnzx5J0nnnnZcv3nfeeSdfvAVta3lOrutvv/0mMys0BkkcE8AveEs5SrWKFSvqmWee0TPPPKOffvpJM2fO1PDhw7V3717Nnj3bXe6OO+7QlVdeqZUrV+r5559X7dq11aVLFz/WPL+KFSuqUaNGeuSRRwqcnzfYlERhb+gcMmSI3njjDX3wwQeaPXu2ypcvr+uvv77IZSUmJrrPLg8aNKjAMjVq1CjR+guaXqFCBX355ZcyM4/5e/fuVU5OjipWrFiiZQMASqag/eiaNWv0zTff6LXXXlPfvn3d0zdt2uRRL
ikpSWXKlNGOHTtOa90lXU9JJSYmyuVyeZxoz1PQNG+rWLGiXC6XPvvsM0VFReWbX9C04oSHh+vuu+/W3Xffrf3792vevHn6xz/+oW7dumn79u0FvuU7L5HftWtXvnk7d+7MN5aerrzlvPvuu+67EopS1Jh98rzExESFhYUVGsOJ6y/J8gFvIeEG/r9q1app8ODBmj9/vr744guPeZdeeqmqVaumoUOHavHixXr66acDbifdq1cvffzxx6pVq1aRt0znDd6nehVAkpo1a6bWrVvr0Ucf1Zo1azRgwADFxsYW+ZmyZcuqQ4cOWrVqlRo1auQ+u+8tnTp10r///W/NmDFDl156qXv666+/7p4PAHBW3ph4coL40ksvefwdExOjzMxMTZs2TY888kihiVxhY1VJ11NSeb8cMn36dD3++OPuq7bZ2dn68MMPT2uZp6JXr14aO3as/vvf/+qqq67y+vLLly+vK664Qv/973915513auvWrapfv36+cq1atVJMTIzefPNN9xviJWnHjh1asGBBvse2Tle3bt0UHh6uH3/80eu3c8fGxqpFixaaPn26nnjiCfdPhOXm5urNN99UWlqaT39LHshDwo1S6/fff1eHDh103XXXqW7duoqLi9Py5cs1e/ZsXXbZZR5ly5Qpo0GDBunee+9VbGxsvmecAsGDDz6ouXPnqnXr1rrjjjtUp04dHT58WFu3btXHH3+sCRMmKC0tTXFxccrIyNAHH3ygTp06KSkpSRUrVsz3PHphhgwZ4v6ZmIEDB5boM88++6zatm2rdu3a6fbbb1f16tWVnZ2tTZs26cMPP9SCBQtOO+4+ffrohRdeUN++fbV161Y1bNhQn3/+uUaPHq0ePXqoc+fOp71sAEDJ1K1bV7Vq1dLw4cNlZkpKStKHH36ouXPn5iv71FNPqW3btmrRooWGDx+us846S3v27NHMmTP10ksvKS4uTg0aNJAk/etf/1JcXJyio6NVo0aNU1pPST300EPq3r27unTpoqFDh+rYsWN69NFHFRsbq19//fW0l1sSbdq00YABA9S/f3+tWLFCF1xwgWJjY7Vr1y59/vnnatiwofsdKiV10UUXqUGDBmrevLkqVaqkbdu26ZlnnlFGRobOPvvsAj9Tvnx53X///frHP/6hPn366Nprr9W+ffs0atQoRUdHKysryxvhqnr16nrwwQd13333afPmzerevbsSExO1Z88effXVV4qNjdWoUaNOe/ljxoxRly5d1KFDBw0bNkyRkZEaP3681qxZoylTpgTcxRKUDiTcKLWio6PVokULvfHGG9q6dauOHj2qatWq6d5779U999yTr/zVV1+te++9VzfeeKMSEhL8UOOipaSkaMWKFXrooYf0+OOPa8eOHYqLi1ONGjXcA1qeiRMn6v/+7//Uu3dvHTlyRH379nX/vmlxLrnkEkVFRalDhw6FDtwnq1+/vlauXKmHHnpI//znP7V3716VL19eZ599tsdL4k5HdHS0Fi5cqPvuu0+PP/64fv75Z1WtWlXDhg3z2gECAKBoERER+vDDDzVkyBDdeuutCg8PV+fOnTVv3jxVq1bNo2zjxo311VdfKSsrSyNGjFB2draSk5PVsWNH911QNWrU0DPPPKNnn31W7du317FjxzRp0iT169evxOspqS5dumjGjBn65z//qauvvlrJyckaOHCg/vzzzzNK/krqpZdeUsuWLfXSSy9p/Pjxys3NVWpqqtq0aZPvBWkl0aFDB7333nt65ZVXdODAASUnJ6tLly66//77FRERUejnRowYocqVK+u5557TO++8o5iYGLVv316jR48u8XhfEiNGjFD9+vX17LPPasqUKTpy5IiSk5N13nnn6bbbbjujZWdmZmrBggXKyspSv379lJubq8aNG2vmzJn5XmwH+IrLzMzflQCCwbhx43THHXdozZo17heRlUYffvihevfurVmzZp1xsgwAAACEMhJuoBirVq3Sli1bdOutt6pNmzaaMWOGv6vkF+vWrdO2bds0ZMgQxcbGauXKldyaBQAAABSBhBsoRvXq1bV79261a9dOb7zxxin9TEgoad++vb744gude+65mjx5surWrevvKgEAAAABjYQbAAAAAAAHhPm7AgAAAAAAhCISbgAAAAAAHOCXnwXLzc3Vzp07FRcXx0uXAAClmpkpOztbqampCgvzz3lwxmUAAI7z9rjsl4R7586dSk9P98eqAQAISNu3b1daWppf1s24DACAJ2+Ny35JuOPi4iQdDyI+Pt4fVQAAICAcOHBA6enp7rHRHxiXAQA4ztvjsl8S7rzb1eLj4xnYAQCQ/HorN+MyAACevDUu+yXhxhlYOKbo+R1G+KYeAAA4jTEPABDkeEs5AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAfw0jQAAOAfxb0UDQCAIEfC7UslObDgjasAAAAAEBK4pRwAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMAB/A53qCnut775nW8AAAAA8AkSbi96eu4PRc6/i9YGAAAAgFKDW8oBAAAAAHAA11wBAEBwKu4xKolHqQAAfsUVbgAAAAAAHMAV7lNRzJn0lj/tK/rzNSt4sTIAAAAAgEBGwh1oSnJ7HAAAAAAg4JFwlzY87wYAKE34uUwAgB+RcPvQ0s3F3HIuqRW3nQMAAABASOClaQAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADeEt5gCnuTeYh8RZzfpoMAAAAQClAwh1ivJKw85ulAIAQUey42MFHFQEAlErcUg4AAAAAgAO4wh1kijtTDwBAacGYCAAIdCTcAACg1Hp67g9Fzr8r/L2iF8BjVgCAIpBww/tK8lI0AAAAAAhxJNyngFvXAAAAAAAlRcJdypTkpEFI/PQYACDgBcKJ7JY//avoAoyJAIAzQMINAACcwSNGAIBSjoQbAACgEPyONwDgTJBwn4gz8SVDOwEAAABAscL8XQEAAAAAAEIRV7hPEAgvbwEAAAAAhAYSbqAwxd0632GEb+oBAAhYT8/9odgyd3Wp7YOaAAACEbeUAwAAAADgAK5wI58zvbW+uN/xLtFvgfPWVwBAiFg6cViR81vd/ISPagIA8DUSbgQmp9+Ezu3gAAAAABxWahLu4s4uAwAAAADgTaUm4UbpUtxt69yyDgDwhpY//euMl1Hci9d46RoABC8SbpROTt+y7iu8SR0AQh/7egAIWiTcAAAAwawkJ5FJygHAL0i4EZDO9E3pPlHMAc7TOZcXOT9YbhHkVkcApyso9uVBoNjHpIr5dRBJXCUHAD8h4YbXcYB1XLHP9S0swQESB0DFJvwSST+A4Hamz4GX6Oc2S5KUF4WEHQBOi18SbjOTJB04cMBry3xhwaYi55/35xGvrQvB78DBw46v42Ax21xJ6vDV838vcv751ZOKXsBHWUXPv2BosXU4fPCPIucX9z0u7rvpDWNmrCxy/qCOZxU53xt1
PNN1DAr/oOgVlKCv4B3F9lUxfX2q8r5DeWOjPzgxLkvF7wfhO/PW7iy6wNqix5tinennJS1P61/k/PN2TCpyfrFjoqSvtv5a9DL6PFLsMkqFT58sen4ojEnFxVgSodAOwcKH26S3x2WX+WGE37Fjh9LT0329WgAAAtb27duVlpbml3UzLgMA4Mlb47JfEu7c3Fzt3LlTcXFxcrlcvl69Dhw4oPT0dG3fvl3x8fE+X7+vEGdoIc7QQpyh5UziNDNlZ2crNTVVYWFhDtWwaMWNy6WlH08HbVM42qZotE/haJvC0TaF81bbeHtc9sst5WFhYX47i3+i+Pj4UrGhEmdoIc7QQpyh5XTjTEhIcKA2JVfScbm09OPpoG0KR9sUjfYpHG1TONqmcN5oG2+Oy/45lQ4AAAAAQIgj4QYAAAAAwAGlMuGOiopSVlaWoqKi/F0VRxFnaCHO0EKcoSXU4wz1+M4EbVM42qZotE/haJvC0TaFC9S28ctL0wAAAAAACHWl8go3AAAAAABOI+EGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4ICQSbjHjx+vGjVqKDo6Ws2aNdNnn31WZPnFixerWbNmio6OVs2aNTVhwgSP+WvXrtXll1+u6tWry+Vy6ZlnnnGw9iXn7ThffvlltWvXTomJiUpMTFTnzp311VdfORlCiXg7zunTp6t58+YqX768YmNj1aRJE73xxhtOhlAi3o7zRFOnTpXL5dIll1zi5VqfOm/H+dprr8nlcuX7d/jwYSfDKJYT/bl//34NGjRIKSkpio6OVr169fTxxx87FUKJeDvO9u3bF9ifPXv2dDKMYjnRn88884zq1KmjmJgYpaen66677vLJdutELO+9957q16+vqKgo1a9fX++///4Zr9cf/NE2I0eOzLe9JycnezUub/DXsVVp3G5K0jbBst1I/jteLY3bTknaJli2HX8d/zu+3VgImDp1qkVERNjLL79s69atsyFDhlhsbKxt27atwPKbN2+2smXL2pAhQ2zdunX28ssvW0REhL377rvuMl999ZUNGzbMpkyZYsnJyfb000/7KJrCORHnddddZy+88IKtWrXK1q9fb/3797eEhATbsWOHr8LKx4k4Fy5caNOnT7d169bZpk2b7JlnnrEyZcrY7NmzfRVWPk7EmWfr1q1WtWpVa9eunV188cUOR1I0J+KcNGmSxcfH265duzz++ZMTcR45csSaN29uPXr0sM8//9y2bt1qn332ma1evdpXYeXjRJz79u3z6Mc1a9ZYmTJlbNKkST6KKj8n4nzzzTctKirK3nrrLduyZYt98sknlpKSYnfeeWfQxbJkyRIrU6aMjR492tavX2+jR4+28PBwW7Zs2Wmv1x/81TZZWVl2zjnneGz3e/fudTzeU+GvY6vSut2UpG2CYbsx89/xamnddkrSNsGw7fjr+N8X201IJNznn3++3XbbbR7T6tata8OHDy+w/D333GN169b1mHbrrbday5YtCyyfkZEREAm303GameXk5FhcXJxNnjz5zCt8mnwRp5lZ06ZN7Z///OeZVfYMOBVnTk6OtWnTxl555RXr27ev3xNuJ+KcNGmSJSQkeL2uZ8KJOF988UWrWbOm/fXXX96v8Gnyxffz6aeftri4OPvjjz/OvMKnyYk4Bw0aZB07dvQoc/fdd1vbtm29VOuCORHLVVddZd27d/co061bN7vmmmtOe73+4K+2ycrKssaNG59h7Z3lr2Or0rrdnKiwtgmG7cbMf8erbDvHFdQ2wbDt+Ov43xfbTdDfUv7XX3/p66+/VteuXT2md+3aVUuWLCnwM0uXLs1Xvlu3blqxYoWOHj3qWF3PhK/iPHTokI4ePaqkpCTvVPwU+SJOM9P8+fO1YcMGXXDBBd6r/ClwMs4HH3xQlSpV0s033+z9ip8iJ+P8448/lJGRobS0NPXq1UurVq3yfgAl5FScM2fOVKtWrTRo0CBVqVJFDRo00OjRo3Xs2DFnAimGr/ZDEydO1DXXXKPY2FjvVPwUORVn27Zt9fXXX7tv9du8ebM+/vhjR2+ddyqWwsrkLfN01utr/mqbPBs3blRqaqpq1Kiha665Rps3bz7TkLzGX8dWpXm7KalA3m4k/x2vsu38T2HH8oG87fjr+N9X203QJ9y//PKLjh07pipVqnhMr1Klinbv3l3gZ3bv3l1g+ZycHP3yyy+O1fVM+CrO4cOHq2rVqurcubN3Kn6KnIzz999/V7ly5RQZGamePXtq3Lhx6tKli/eDKAGn4vziiy80ceJEvfzyy85U/BQ5FWfdunX12muvaebMmZoyZYqio6PVpk0bbdy40ZlAiuFUnJs3b9a7776rY8eO6eOPP9Y///lPPfnkk3rkkUecCaQYvtgPffXVV1qzZo3+9re/ea/ip8ipOK+55ho99NBDatu2rSIiIlSrVi116NBBw4cPdyYQORdLYWXylnk66/U1f7WNJLVo0UKvv/66PvnkE7388svavXu3WrdurX379nkjtDPmr2Or0rzdlESgbzeS/45X2Xb+p6Bj+UDfdvx1/O+r7Sbca0vyM5fL5fG3meWbVlz5gqYHGifjfOyxxzRlyhQtWrRI0dHRXqjt6XMizri4OK1evVp//PGH5s+fr7vvvls1a9ZU+/btvVfxU+TNOLOzs3XDDTfo5ZdfVsWKFb1f2TPg7f5s2bKlWrZs6Z7fpk0bnXvuuRo3bpyee+45b1X7lHk7ztzcXFWuXFn/+te/VKZMGTVr1kw7d+7U448/rgceeMDLtS85J/dDEydOVIMGDXT++ed7oaZnxttxLlq0SI888ojGjx+vFi1aaNOmTRoyZIhSUlJ0//33e7n2xdftTPusJMs81fX6gz/a5sILL3T/v2HDhmrVqpVq1aqlyZMn6+677z71IBzir2Or0rrdFCdYthvJf8erpX3bKaxtgmXb8dfxv9PbTdAn3BUrVlSZMmXynYXYu3dvvrMVeZKTkwssHx4ergoVKjhW1zPhdJxPPPGERo8erXnz5qlRo0berfwpcDLOsLAwnXXWWZKkJk2aaP369RozZoxfEm4n4ly7dq22bt2qiy66yD0/NzdXkhQeHq4NGzaoVq1aXo6kaL76foaFhem8887z2xVup+JMSUlRRESEypQp4y5Tr1497d69W3/99ZciIyO9HEnRnO7PQ4cOaerUqXrwwQe9W/FT5FSc999/v2688Ub31fuGDRvq4MGDGjBggO677z6FhXn/pjOnYimsTN4yT2e9vuavtilIbGysGjZs6Ld92Mn8dWxVmreb0xFo243kv+NVtp1TO5YPtG3HX8f/vtpugv6W8sjISDVr1kxz5871mD537ly1bt26wM+0atUqX/k5c+aoefPmioiIcKyuZ8LJOB9//HE99NBDmj17tpo3b+79yp8CX/anmenIkSNnXunT4EScdevW1XfffafVq1e7//Xu3VsdOnTQ6tWrlZ6e7lg8hfFVf5qZVq9erZSUFO9U/BQ5FWebNm20adMm94kTSfrhhx+UkpLi82Rbcr4///3
vf+vIkSO64YYbvFvxU+RUnIcOHcqXVJcpU0Z2/AWmXozgf5yKpbAyecs8nfX6mr/apiBHjhzR+vXr/bYPO5m/jq1K83ZzOgJtu5H8d7xa2redUz2WD7Rtx1/H/z7bbrz2+jU/ynud+8SJE23dunV25513WmxsrG3dutXMzIYPH2433niju3zea+TvuusuW7dunU2cOLHAn+NZtWqVrVq1ylJSUmzYsGG2atUq27hxo8/jy+NEnI8++qhFRkbau+++6/FTAdnZ2T6PL48TcY4ePdrmzJljP/74o61fv96efPJJCw8Pt5dfftnn8eVxIs6TBcJbyp2Ic+TIkTZ79mz78ccfbdWqVda/f38LDw+3L7/80ufx5XEizp9++snKlStngwcPtg0bNthHH31klStXtocfftjn8eVxcrtt27atXX311T6LpShOxJmVlWVxcXE2ZcoU27x5s82ZM8dq1aplV111VdDF8sUXX1iZMmVs7Nixtn79ehs7dmyhPwtW2HoDgb/aZujQobZo0SLbvHmzLVu2zHr16mVxcXEh3zYlObYqrdtNSdomGLYbM/8dr5bWbackbRMM246/jv99sd2ERMJtZvbCCy9YRkaGRUZG2rnnnmuLFy92z+vbt69lZmZ6lF+0aJE1bdrUIiMjrXr16vbiiy96zN+yZYtJyvfv5OX4mrfjzMjIKDDOrKwsH0RTOG/Hed9999lZZ51l0dHRlpiYaK1atbKpU6f6IpQieTvOkwVCwm3m/TjvvPNOq1atmkVGRlqlSpWsa9eutmTJEl+EUiQn+nPJkiXWokULi4qKspo1a9ojjzxiOTk5TodSJCfi3LBhg0myOXPmOF39EvN2nEePHrWRI0darVq1LDo62tLT023gwIH222+/BV0sZmbTpk2zOnXqWEREhNWtW9fee++9U1pvoPBH21x99dWWkpJiERERlpqaapdddpmtXbvWkfjOhL+OrUrjdlOStgmW7cbMf8erpXHbKUnbBMu246/jf6e3G5eZQ/exAQAAAABQigX9M9wAAAAAAAQiEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNxBiDh06pJEjR2rRokX+roq2bt0ql8ul1157zT1t5MiRcrlc/qsUAAAOOtVxeOfOnRo5cqRWr17taL0k6e2339Yzzzzj+HoA/A8JNxBiDh06pFGjRgVEwl2Qv/3tb1q6dKm/qwEAgCNOdRzeuXOnRo0aRcINhCgSbiBIHDp0yGfrOnr0qHJychxZdlpamlq2bOnIsgEAcIovx+FgQ9sAhSPhBk7R2rVr5XK5NG3aNPe0r7/+Wi6XS+ecc45H2d69e6tZs2buv3Nzc/XYY4+pbt26ioqKUuXKldWnTx/t2LHD43Pt27dXgwYN9Omnn6p169YqW7asbrrpJknSggUL1L59e1WoUEExMTGqVq2aLr/8ch06dEhbt25VpUqVJEmjRo2Sy+WSy+VSv379Co1n0aJFcrlceuONNzR06FBVrVpVUVFR2rRpk37++WcNHDhQ9evXV7ly5VS5cmV17NhRn332Wb7l7Ny5U1dddZXi4uKUkJCgq6++Wrt3785XrqBbyl0ul0aOHJmvbPXq1T3qfujQIQ0bNkw1atRQdHS0kpKS1Lx5c02ZMqXQ+AAAoSWUxuFFixbpvPPOkyT179/fXf7EMXHFihXq3bu3kpKSFB0draZNm+rf//63e/4vv/yi9PR0tW7dWkePHnVPX7dunWJjY3XjjTe6Y5o1a5a2bdvmXk/eeJx3LHDyVfmCHg3r16+fypUrp++++05du3ZVXFycOnXqJEn666+/9PDDD7vbt1KlSurfv79+/vnnAuMHSoNwf1cACDbnnHOOUlJSNG/ePF155ZWSpHnz5ikmJkbr1q3Tzp07lZqaqpycHC1evFi33Xab+7O33367/vWvf2nw4MHq1auXtm7dqvvvv1+LFi3SypUrVbFiRXfZXbt26YYbbtA999yj0aNHKywsTFu3blXPnj3Vrl07vfrqqypfvrz++9//avbs2frrr7+UkpKi2bNnq3v37rr55pv1t7/9TZLcg39RRowYoVatWmnChAkKCwtT5cqV3QNkVlaWkpOT9ccff+j9999X+/btNX/+fLVv316S9Oeff6pz587auXOnxowZo9q1a2vWrFm6+uqrvdXskqS7775bb7zxhh5++GE1bdpUBw8e1Jo1a7Rv3z6vrgcAELhCaRw+99xzNWnSJPXv31///Oc/1bNnT0nH7waTpIULF6p79+5q0aKFJkyYoISEBE2dOlVXX321Dh06pH79+qlixYqaOnWq2rdvr3vvvVdPPfWUDh06pCuvvFLVqlXThAkTJEnjx4/XgAED9OOPP+r9998/oz7466+/1Lt3b916660aPny4cnJylJubq4svvlifffaZ7rnnHrVu3Vrbtm1TVlaW2rdvrxUrVigmJuaM1gsEJQNwym644QarWbOm++/OnTvbLbfcYomJiTZ58mQzM/viiy9Mks2ZM8fMzNavX2+SbODAgR7L+vLLL02S/eMf/3BPy8zMNEk2f/58j7LvvvuuSbLVq1cXWreff/7ZJFlWVlaJYlm4cKFJsgsuuKDYsjk5OXb06FHr1KmTXXrppe7pL774okmyDz74wKP8LbfcYpJs0qRJ7mlZWVl28q6nsPpmZGRY37593X83aNDALrnkkhLFBQAIXaE0Di9fvjzfWJmnbt261rRpUzt69KjH9F69ellKSoodO3bMPe3RRx81Sfb+++9b3759LSYmxr799luPz/Xs2dMyMjLyrSfvWGDhwoUe07ds2ZKvbn379jVJ9uqrr3qUnTJlikmy9957r8D4xo8fX0QrAKGLW8qB09CpUydt3rxZW7Zs0eHDh/X555+re/fu6tChg+bOnSvp+Nn2qKgotW3bVtLxs9SS8t1Wdv7556tevXqaP3++x/TExER17NjRY1qTJk0UGRmpAQMGaPLkydq8ebPXYrr88ssLnD5hwgSde+65io6OVnh4uCIiIjR//nytX7/eXWbhwoWKi4tT7969PT573XXXea1+0vG2+s9//qPhw4dr0aJF+vPPP726fABAcAjFcfhkmzZt0vfff6/rr79ekpSTk+P+16NHD+3atUsbNmxwl/+///s/9ezZU9dee60mT56scePGqWHDho7V7+Tjho8++k
jly5fXRRdd5FHXJk2aKDk5OWBf5go4jYQbOA2dO3eWdHww//zzz3X06FF17NhRnTt3dg/Y8+bNU5s2bdy3T+Xd9pySkpJveampqfluiy6oXK1atTRv3jxVrlxZgwYNUq1atVSrVi09++yzZxxTQet76qmndPvtt6tFixZ67733tGzZMi1fvlzdu3f3SHb37dunKlWq5Pt8cnLyGdfrRM8995zuvfdezZgxQx06dFBSUpIuueQSbdy40avrAQAEtlAch0+2Z88eSdKwYcMUERHh8W/gwIGSjj+/nSfvWfHDhw8rOTnZ/ey2E8qWLav4+Ph89d2/f78iIyPz1Xf37t0edQVKE57hBk5DWlqaateurXnz5ql69epq3ry5ypcvr06dOmngwIH68ssvtWzZMo0aNcr9mQoVKkg6/kxY3rNZeXbu3Onx3JikQn+rul27dmrXrp2OHTumFStWaNy4cbrzzjtVpUoVXXPNNacdU0Hre/PNN9W+fXu9+OKLHtOzs7M9/q5QoYK++uqrfJ8v6KVpBYmKitKRI0fyTT/54Cc2NlajRo3SqFGjtGfPHvfV7osuukjff/99idYFAAh+oTgOnyyvPiNGjNBll11WYJk6deq4/79r1y4NGjRITZo00dq1azVs2DA999xzJVpXdHS0JOUbiwtLkgtqm4oVK6pChQqaPXt2gZ+Ji4srUV2AUMMVbuA0de7cWQsWLNDcuXPVpUsXSVLt2rVVrVo1PfDAAzp69Kj7DLwk921pb775psdyli9frvXr17vf8FlSZcqUUYsWLfTCCy9IklauXCnpePIqySu3W7tcLvfy8nz77bf5fke7Q4cOys7O1syZMz2mv/322yVaT/Xq1fXtt996TFuwYIH++OOPQj9TpUoV9evXT9dee602bNjAT5IAQCkTKuNwYeXr1Kmjs88+W998842aN29e4L+8JPbYsWO69tpr5XK59J///EdjxozRuHHjNH369HzrKqhe1atXl6R8Y/HJ43pRevXqpX379unYsWMF1vXEkwNAacIVbuA0derUSePHj9cvv/yiZ555xmP6pEmTlJiY6PFTJHXq1NGAAQM0btw4hYWF6cILL3S/HTU9PV133XVXseucMGGCFixYoJ49e6patWo6fPiwXn31VUn/u70uLi5OGRkZ+uCDD9SpUyclJSWpYsWK7sH0VPTq1UsPPfSQsrKylJmZqQ0bNujBBx9UjRo1PH6nu0+fPnr66afVp08fPfLIIzr77LP18ccf65NPPinRem688Ubdf//9euCBB5SZmal169bp+eefV0JCgke5Fi1aqFevXmrUqJESExO1fv16vfHGG2rVqpXKli17yvEBAIJXqIzDtWrVUkxMjN566y3Vq1dP5cqVU2pqqlJTU/XSSy/pwgsvVLdu3dSvXz9VrVpVv/76q9avX6+VK1e6fxotKytLn332mebMmaPk5GQNHTpUixcv1s0336ymTZuqRo0akqSGDRtq+vTpevHFF9WsWTOFhYWpefPmSk5OVufOnTVmzBglJiYqIyND8+fPz5ewF+Waa67RW2+9pR49emjIkCE6//zzFRERoR07dmjhwoW6+OKLdemll5Z4eUDI8Pdb24Bg9dtvv1lYWJjFxsbaX3/95Z7+1ltvmSS77LLL8n3m2LFj9uijj1rt2rUtIiLCKlasaDfccINt377do1xmZqadc845+T6/dOlSu/TSSy0jI8OioqKsQoUKlpmZaTNnzvQoN2/ePGvatKlFRUWZJI83fZ8s782k06ZNyzfvyJEjNmzYMKtatapFR0fbueeeazNmzLC+ffvme8vpjh077PLLL7dy5cpZXFycXX755bZkyZISvaX8yJEjds8991h6errFxMRYZmamrV69Ot9byocPH27Nmze3xMREi4qKspo1a9pdd91lv/zyS6HxAQBCU6iMw2bH3/Bdt25di4iIyPeG82+++cauuuoqq1y5skVERFhycrJ17NjRJkyYYGZmc+bMsbCwsHxvRd+3b59Vq1bNzjvvPDty5IiZmf366692xRVXWPny5c3lcnmMx7t27bIrrrjCkpKSLCEhwW644QZbsWJFgW8pj42NLTCOo0eP2hNPPGGNGze26OhoK1eunNWtW9duvfVW27hxY5FtAIQql5mZv5J9AAAAAABCFc9wAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAHh/lhpbm6udu7cqbi4OLlcLn9UAQCAgGBmys7OVmpqqsLC/HMenHEZAIDjvD0u+yXh3rlzp9LT0/2xagAAAtL27duVlpbml3UzLgMA4Mlb47JfEu64uDhJx4OIj4/3RxUAAAgIBw4cUHp6unts9AfGZQAAjvP2uOyXhDvvdrX4+HgGdl9bOKb4Mh1GOF8PAIAHf97KzbiMYhV3/MCxA4AQ461xmZemAQAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAF++VkwAAAAlFBJftITABCQuMINAAAAAIADuMINAAAA/yvuSn6HEb6pBwB4EVe4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABPMON/HiGCgAA3+Et5AAQsrjCDQAAAACAA0i4AQAAAABwALeUAwAAIPDxyBuAIMQVbgAAAAAAHMAV7mDD2V0AAAJHSV54xth8XBC8HO7puT8UOf+uLrV9VBMAoYKEG6eOpB8AAAAAikXCDQAAAPhAcVfQJa6iA6GGhBvexxVwAAAAACDhDjlB8HwUAADAyZZu3lfk/FY1KxS9AC88T9/yp38Vs4Anil8HAJyAt5QDAAAAAOAArnADAIDSizvDQkZxV8glqVWHM1xJADw2x5vUgeBCwu1L/HQIAAAAAJQa3FIOAAAAAIADuMINAADgpNJw27oPYizJLeOhoLhbxnmxGxBcuMINAAAAAIADuMIN3+NZdgAA4AfFXj32UT0AlB5c4QYAAAAAwAFc4QYAAKGrNDw/DZyC4q7ylwQ/PQaUHFe4AQAAAABwAFe4AQAAUCoU/4bvM+ONq8cAQgtXuAEAAAAAcAAJNwAAAAAADuCWcgSm4l5yw8+GAQAAAAhwJNyBhreplholec6Lt4ACAELF0s37/F2FUoHnyIHAwi3lAAAAAAA4gCvcCE7ccg4AAAAgwJFwexO3gwcNbuf2neLamnYGgOBXWm4XLzbOar6pB4DgQcKNkBQqzy+RrAIAAADBi4T7VHAFG6UQST8AhL7ScoXa31r+9K9iyyyrNsAHNXEWxw7A/5Bww+d8Mqh74ZauMx0sfHGVPRDqGCp3EwAAAADeRsINAAAABIiSXAX3t2C4qAAEChLuE3HLOEJMabn6zKAMAACAQETCDcCvSJYBlHbFPWrVqmYFx9eB0sMXz5EXt46n5xa//OLr+cQp1KigOvCLNfANEm6EpOJ20qHwQpLSorRcpQcAAEDoIeGG15WWs+gkgsGDq+iAnxT3qFaHEc5+3gu8MaZ54wo1gkMgPH/tjToEQhzFOuNHQS/3S
jWKFAD7MPgfCTcAeAFJPQAAAE5WehJuXogGoJTjpAAAAIBvlZ6EG5JKdmtcabj1zRu3SvEceOjwRSIaCOsIBCT98HCmJ8OD5GT6md6WXloe1ULp4Y3jsDP+XlQ74ypo6cRhZ/T5Vh2KL3Om46Y3jg3OdB3BchzlFL8k3GYmSTpw4ID3Fvrpk95bVhD7auuvZ7yMeWt3eqEmoe/wwT/O6PPn7ZhUbJnlaf3PaB3wjjEzVvp9HYM6nlXsMs50m/RGHV5YsOmM6uDVceE0FRdDSdrhVOTFnDc2+oMj47IkHTzs3eX5wcE/j/i7CgAcUJIxs7h94pnuH0qyzy2unsUt40yPDbyxDm+MLb5Yx8nL8ta47DI/jPA7duxQenq6r1cLAEDA2r59u9LS0vyybsZlAAA8eWtc9kvCnZubq507dyouLk4ul8vXq/eqAwcOKD09Xdu3b1d8fLy/qxOQaKPi0UbFo41KhnYqXqC1kZkpOztbqampCgsL80sdQmlczhNo/exNxBaciC04EVtwOpPYvD0u++WW8rCwML+dxXdKfHx8yG2o3kYbFY82Kh5tVDK0U/ECqY0SEhL8uv5QHJfzBFI/exuxBSdiC07EFpxONzZvjsv+OZUOAAAAAECII+EGAAAAAMABJNxnKCoqSllZWYqKivJ3VQIWbVQ82qh4tFHJ0E7Fo41Kh1DuZ2ILTsQWnIgtOAVSbH55aRoAAAAAAKGOK9wAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCXQKffvqpLrroIqWmpsrlcmnGjBke881MI0eOVGpqqmJiYtS+fXutXbvWP5X1o+LaqV+/fnK5XB7/WrZs6Z/K+sGYMWN03nnnKS4uTpUrV9Yll1yiDRs2eJRhWypZO5X2benFF19Uo0aNFB8fr/j4eLVq1Ur/+c9/3PPZjopvo9K+DYWSUB17QnnMCOX9fCjvn0vTfnXMmDFyuVy688473dOCue9OVFBswdp3I0eOzFfv5ORk9/xA6TMS7hI4ePCgGjdurOeff77A+Y899pieeuopPf/881q+fLmSk5PVpUsXZWdn+7im/lVcO0lS9+7dtWvXLve/jz/+2Ic19K/Fixdr0KBBWrZsmebOnaucnBx17dpVBw8edJdhWypZO0mle1tKS0vT2LFjtWLFCq1YsUIdO3bUxRdf7B5E2I6KbyOpdG9DoSRUx55QHjNCeT8fyvvn0rJfXb58uf71r3+pUaNGHtODue/yFBabFLx9d84553jU+7vvvnPPC5g+M5wSSfb++++7/87NzbXk5GQbO3ase9rhw4ctISHBJkyY4IcaBoaT28nMrG/fvnbxxRf7pT6BaO/evSbJFi9ebGZsS4U5uZ3M2JYKkpiYaK+88grbURHy2siMbShUhfLYE8pjRqjv50N5/xxq+9Xs7Gw7++yzbe7cuZaZmWlDhgwxs9D4vhUWm1nw9l1WVpY1bty4wHmB1Gdc4T5DW7Zs0e7du9W1a1f3tKioKGVmZmrJkiV+rFlgWrRokSpXrqzatWvrlltu0d69e/1dJb/5/fffJUlJSUmS2JYKc3I75WFbOu7YsWOaOnWqDh48qFatWrEdFeDkNsrDNlR6hEJfh/KYEar7+VDeP4fqfnXQoEHq2bOnOnfu7DE9FPqusNjyBGvfbdy4UampqapRo4auueYabd68WVJg9Vm4T9cWgnbv3i1JqlKlisf0KlWqaNu2bf6oUsC68MILdeWVVyojI0NbtmzR/fffr44dO+rrr79WVFSUv6vnU2amu+++W23btlWDBg0ksS0VpKB2ktiWJOm7775Tq1atdPjwYZUrV07vv/++6tev7x5E2I4KbyOJbag0CYW+DuUxIxT386G8fw7l/erUqVO1cuVKLV++PN+8YP++FRWbFLx916JFC73++uuqXbu29uzZo4cfflitW7fW2rVrA6rPSLi9xOVyefxtZvmmlXZXX321+/8NGjRQ8+bNlZGRoVmzZumyyy7zY818b/Dgwfr222/1+eef55vHtvQ/hbUT25JUp04drV69Wvv379d7772nvn37avHixe75bEeFt1H9+vXZhkqRUOjrUB4zQnE/H8r751Ddr27fvl1DhgzRnDlzFB0dXWi5YOy7ksQWrH134YUXuv/fsGFDtWrVSrVq1dLkyZPdL30LhD7jlvIzlPcmvLyzKHn27t2b74wKPKWkpCgjI0MbN270d1V86u9//7tmzpyphQsXKi0tzT2dbclTYe1UkNK4LUVGRuqss85S8+bNNWbMGDVu3FjPPvss29EJCmujgpTGbai0Cra+DuUxI1T386G8fw7V/erXX3+tvXv3qlmzZgoPD1d4eLgWL16s5557TuHh4e7+Cca+Ky62Y8eO5ftMMPXdiWJjY9WwYUNt3LgxoL5vJNxnqEaNGkpOTtbcuXPd0/766y8tXrxYrVu39mPNAt++ffu0fft2paSk+LsqPmFmGjx4sKZPn64FCxaoRo0aHvPZlo4rrp0KUtq2pYKYmY4cOcJ2VIS8NioI21DpESx9HcpjRmnbz4fy/jlU9qudOnXSd999p9WrV7v/NW/eXNdff71Wr16tmjVrBm3fFRdbmTJl8n0mmPruREeOHNH69euVkpISWN83n76iLUhlZ2fbqlWrbNWqVSbJnnrqKVu1apVt27bNzMzGjh1rCQkJNn36dPvuu+/s2muvtZSUFDtw4ICfa+5bRbVTdna2DR061JYsWWJbtmyxhQsXWqtWraxq1aqlpp1uv/12S0hIsEWLFtmuXbvc/w4dOuQuw7ZUfDuxLZmNGDHCPv30U9uyZYt9++239o9//MPCwsJszpw5ZsZ2ZFZ0G7ENhZZQHXtCecwI5f18KO+fS9t+9eQ3eQdz353sxNiCue+GDh1qixYtss2bN9uyZcusV69eFhcXZ1u3bjWzwOkzEu4SWLhwoUnK969v375mdvy181lZWZacnGxRUVF2wQUX2HfffeffSvtBUe106NAh69q1q1WqVMkiIiKsWrVq1rdvX/vpp5/8XW2fKahtJNmkSZPcZdiWim8ntiWzm266yTIyMiwyMtIqVapknTp1ch/MmbEdmRXdRmxDoSVUx55QHjNCeT8fyvvn0rZfPTnhDua+O9mJsQVz31199dWWkpJiERERlpqaapdddpmtXbvWPT9Q+sxlZubU1XMAAAAAAEornuEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABw
AAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQSYQ4cOaeTIkVq0aJG/q1Kg1157TS6XS1u3bvX5unfu3KmRI0dq9erVPl83AADeFuhjPoAzR8INBJhDhw5p1KhRATv49uzZU0uXLlVKSorP171z506NGjWKhBsAEBICfcwHcOZIuAEfOXTokL+rcEb+/PNPmZkqVaqkli1bKioqyt9V8pq82AAA8IZAHPPNTH/++ae/qwGUOiTcKHXWrl0rl8uladOmuad9/fXXcrlcOuecczzK9u7dW82aNXP/nZubq8cee0x169ZVVFSUKleurD59+mjHjh0en2vfvr0aNGigTz/9VK1bt1bZsmV10003SZIWLFig9u3bq0KFCoqJiVG1atV0+eWX69ChQ9q6dasqVaokSRo1apRcLpdcLpf69etXaDyLFi2Sy+XSm2++qbvvvlvJycmKiYlRZmamVq1ala/8ihUr1Lt3byUlJSk6OlpNmzbVv//9b48yebeNz5kzRzfddJMqVaqksmXL6siRIwXeUp4X79KlS9W6dWvFxMSoevXqmjRpkiRp1qxZOvfcc1W2bFk1bNhQs2fPzlevjRs36rrrrlPlypUVFRWlevXq6YUXXvCI87zzzpMk9e/f3902I0eO9FpsAIDQEmpj/uHDhzV06FA1adJECQkJSkpKUqtWrfTBBx/kK+tyuTR48GBNmDBB9erVU1RUlCZPniyp+DH3VNcFoAgGlEIpKSk2YMAA999jx461mJgYk2T//e9/zczs6NGjFh8fb/fcc4+73IABA0ySDR482GbPnm0TJkywSpUqWXp6uv3888/ucpmZmZaUlGTp6ek2btw4W7hwoS1evNi2bNli0dHR1qVLF5sxY4YtWrTI3nrrLbvxxhvtt99+s8OHD9vs2bNNkt188822dOlSW7p0qW3atKnQWBYuXGiSLD093S6++GL78MMP7c0337SzzjrL4uPj7ccff3SXXbBggUVGRlq7du3snXfesdmzZ1u/fv1Mkk2aNMldbtKkSSbJqlatagMGDLD//Oc/9u6771pOTo573pYtWzzirVChgtWpU8cmTpxon3zyifXq1csk2ahRo6xhw4Y2ZcoU+/jjj61ly5YWFRXlbmczs7Vr11pCQoI1bNjQXn/9dZszZ44NHTrUwsLCbOTIkWZm9vvvv7vX/c9//tPdNtu3b/dabACA0BNKY/7+/futX79+9sYbb9iCBQts9uzZNmzYMAsLC7PJkyd7lM0b6xo1amRvv/22LViwwNasWVOiMfdU1wWgcCTcKJVuuOEGq1mzpvvvzp072y233GKJiYnuQeSLL74wSTZnzhwzM1u/fr1JsoEDB3os68svvzRJ9o9//MM9LTMz0yTZ/PnzPcq+++67JslWr15daN1+/vlnk2RZWVkliiUv4T733HMtNzfXPX3r1q0WERFhf/vb39zT6tata02bNrWjR496LKNXr16WkpJix44dM7P/JaV9+vTJt77CEm5JtmLFCve0ffv2WZkyZSwmJsYjuV69erVJsueee849rVu3bpaWlma///67x7oGDx5s0dHR9uuvv5qZ2fLly/Ml0N6MDQAQekJpzD9ZTk6OHT161G6++WZr2rSpxzxJlpCQ4B5D85R0zD2VdQEoHLeUo1Tq1KmTNm/erC1btujw4cP6/PPP1b17d3Xo0EFz586VJM2bN09RUVFq27atJGnhwoWSlO9Wr/PPP1/16tXT/PnzPaYnJiaqY8eOHtOaNGmiyMhIDRgwQJMnT9bmzZu9FtN1110nl8vl/jsjI0OtW7d213vTpk36/vvvdf3110uScnJy3P969OihXbt2acOGDR7LvPzyy0u8/pSUFI9b8ZKSklS5cmU1adJEqamp7un16tWTJG3btk3S8VvW5s+fr0svvVRly5bNV6/Dhw9r2bJlRa7b6dgAAMEr1Mb8adOmqU2bNipXrpzCw8MVERGhiRMnav369fnKduzYUYmJie6/T3XMPZV1ASgYCTdKpc6dO0s6PsB+/vnnOnr0qDp27KjOnTu7B9F58+apTZs2iomJkSTt27dPkgp8O3dqaqp7fp6CytWqVUvz5s1T5cqVNWjQINWqVUu1atXSs88+e8YxJScnFzgtr1579uyRJA0bNkwREREe/wYOHChJ+uWXX4qNoTBJSUn5pkVGRuabHhkZKen4oC8db9ecnByNGzcuX7169OhRYL1O5nRsAIDgFUpj/vTp03XVVVepatWqevPNN7V06VItX75cN910k3tcLapepzLmnuq6ABQs3N8VAPwhLS1NtWvX1rx581S9enU1b95c5cuXV6dOnTRw4EB9+eWXWrZsmUaNGuX+TIUKFSRJu3btUlpamsfydu7cqYoVK3pMO/Fq84natWundu3a6dixY1qxYoXGjRunO++8U1WqVNE111xz2jHt3r27wGl59c6r34gRI3TZZZcVuIw6deqUKAZvSkxMVJkyZXTjjTdq0KBBBZapUaNGkcsI1NgAAP4XSmP+m2++qRo1auidd97xWGdhL/48uV6nMuae6roAFIyEG6VW586d9e9//1vp6enq2bOnJKl27dqqVq2aHnjgAR09etR9VlyS+1axN9980/22bElavny51q9fr/vuu++U1l+mTBm1aNFCdevW1VtvvaWVK1fqmmuucf/c1qn+dMeUKVN09913uwfFbdu2acmSJerTp4+k4wnn2WefrW+++UajR48+pWU7qWzZsurQoYNWrVqlRo0aua+AF6SwtgnU2AAAgSFUxnyXy6XIyEiPBHj37t0lfnP4qYy5Z7ouAMeRcKPU6tSpk8aPH69ffvlFzzzzjMf0SZMmKTEx0eOZ5Dp16mjAgAEaN26cwsLCdOGFF2rr1q26//77lZ6errvuuqvYdU6YMEELFixQz549Va1aNR0+fFivvvqqpP/d8hYXF6eMjAx98MEH6tSpk5KSklSxYkVVr169yGXv3btXl156qW655Rb9/vvvysrKUnR0tEaMGOEu89JLL+nCCy9Ut27d1K9fP1WtWlW//vqr1q9fr5UrV3r8bIovPfvss2rbtq3atWun22+/XdWrV1d2drY2bdqkDz/8UAsWLJB0/Pa8mJgYvfXWW6pXr57KlSun1NRUpaamBmxsAAD/C5Uxv1evXpo+fboGDhyoK664Qtu3b9dDDz2klJQUbdy4sURtUdIx1xvrAiB+Fgyl12+//WZhYWEWGxtrf/31l3v6W2+9ZZLssssuy/eZY8eO2aOPPmq1a9e2iIgIq1ixot1www3un6bKk5mZaeecc06+zy9dutQuvfRSy8jIsKioKKtQoYJlZmbazJkzPcrNmzfPmjZtalFRUSbJ+vbtW2gceW8pf+ONN+yOO+6wSpUqWVRUlLVr187jreF5vvnmG7vqqquscuXKFhERYcnJydaxY0ebMGGCu0zem7yXL1+e7/OFvaW8oHgzMjKsZ8+e+aZLskGDBnlM27Jli910001WtWpVi4iIsEqVKlnr1q3t4Yc
f9ig3ZcoUq1u3rkVEROR7s+uZxgYACE2hMuabHf9Zs+rVq1tUVJTVq1fPXn75ZcvKyrKTD+sLGmvzlHTMLem6ABTOZWbml0wfgFcsWrRIHTp00LRp03TFFVf4uzoAAAAA/j/eUg4AAAAAgANIuAEAAAAAcAC3lAMAAAAA4ACucAMAAAAA4AASbgAAAAAAHOCX3+HOzc3Vzp07FRcXJ5fL5Y8qAAAQEMxM2dnZSk1NVViYf86DMy4DAHCct8dlvyTcO3fuVHp6uj9WDQBAQNq+fbvS0tL8sm7GZQAAPHlrXPZLwh0XFyfpeBDx8fH+qAIAAAHhwIEDSk9Pd4+N/sC4DADAcd4el/2ScOfdrhYfH8/ADgCA5NdbuRmXAQDw5K1x2S8JNxy0cEzR8zuM8E09AAAIBIyLAAA/4i3lAAAAAAA4gCvcpU1xZ/olzvYDAAAAgBdwhRsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4IBwf1cAAADgtCwc4+8aAABQJK5wAwAAAADgABJuAAAAAAAcwC3lyK+4W/Q6jPBNPQAAAAAgiJFw49SRkAMAAABAsbilHAAAAAAAB3CFO9jwRlYAALyHu7YAAA7iCjcAAAAAAA4g4QYAAAAAwAHcUg7v4/Y8AAAAACDh9qmSPH9NMgoAAAAAIYGEGwAAoDDctQUAOAMk3AAAIDDxyxwAgCDHS9MAAAAAAHAACTcAAAAAAA4g4QYAAAAAwAE8wx1oeF4NAAAAAEICV7gBAAAAAHAAV7i9iavTAAAAAID/jyvcAAAAAAA4gCvcwGl6eu4PRc6/q0ttH9UEAAAAQCDiCjcAAAAAAA4g4QYAAAAAwAHcUg7fK8nL5TqMcL4eAAD/CoWXjTKmAQCKwBVuAAAAAAAcwBVuoBDFvRQtFJQkRl7+BgAAAJwernADAAAAAOAArnAjMBX3TBzPw0kKjJ8mC4Q6AAAAAIGIhPtUhMLLXULE0onDipzf6uYnfFQTZ5WG29oBIJgt3byv2DKtOvigImeIk6cA4AwSbvhcSQ5OzhQHDgCAYMGYBQChi4QbpRJXjgEApUkwjHuceAAQiki4AT8JhoMfKXjqCQCBiv0oAJReJNyAQwLhACsY6sAVCwAIfr74mUnGEwDBqPQk3CV54RlvvgaCEgdhAJzijfeOtPzpX0XOX1ZtwBktPxBOrgIAClZ6Eu6S4C3kAAAgwJBQA0DwIuGG1/niLeQIHYFwdToQ6gCEJE5khwxvJP1OnzgIhBMTjBcATkbCjXyKS5hb1azgo5qcPqdv34PvBMNBni944/lITiwAAAD4Fgk3ThlXsFHa+CLZBeCMYDiJzEni0BEIJzYDoQ4A/scvCbeZSZIOHDjgvYV++qT3lhXEvtr6q7+rEBQabhhXbJnlaf2LnH/ejkln9HmEluL2Z4cP/hHwdSju8y8s2FRsHQZ1POuMllHc533B13XMa/e8sdEfHBmXJengYe8u73Sq8OeRIucfKKaOxX3eF4obsxhvgseYGSvPeBnF7YOK29f7og7eEAjjRSDUAb7n7XHZZX4Y4Xfs2KH09HRfrxYAgIC1fft2paWl+WXdjMsAAHjy1rjsl4Q7NzdXO3fuVFxcnFwu1xkt68CBA0pPT9f27dsVHx/vpRr6TyjFQyyBK5TiIZbAFUrxOBmLmSk7O1upqakKCwvz6rJLKm9cNjNVq1aNPgsgoRKHFDqxhEocErEEolCJQwreWLw9LvvllvKwsDCvn8WPj48Pqo4sTijFQyyBK5TiIZbAFUrxOBVLQkKC15d5KvLG5bzb6OizwBMqcUihE0uoxCERSyAKlTik4IzFm+Oyf06lAwAAAAAQ4ki4AQAAAABwQNAn3FFRUcrKylJUVJS/q+IVoRQPsQSuUIqHWAJXKMUTSrEUJZTiDJVYQiUOKXRiCZU4JGIJRKEShxRasZwJv7w0DQAAAACAUBf0V7gBAAAAAAhEJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHBA0CffIkSPlcrk8/iUnJ7vnm5lGjhyp1NRUxcTEqH379lq7dq0fa1y46tWr54vF5XJp0KBBkqR+/frlm9eyZUs/1/q4Tz/9VBdddJFSU1Plcrk0Y8YMj/kl6YcjR47o73//uypWrKjY2Fj17t1bO3bs8GEU/1NUPEePHtW9996rhg0bKjY2VqmpqerTp4927tzpsYz27dvn669rrrnGx5EU3zcl2a4CpW+Ki6Wg74/L5dLjjz/uLhMo/TJmzBidd955iouLU+XKlXXJJZdow4YNHmWC5XtTXCzB9p0pSd8E0/fmTI0fP141atRQdHS0mjVrps8++8zfVfLgjeMAf/WVr8bO3377TTfeeKMSEhKUkJCgG2+8Ufv37/dpLN76zjgdiy/3zU7G4sv9mNN98uKLL6pRo0aKj49XfHy8WrVqpf/85z/u+cHQHyWNJVj65GRjxoyRy+XSnXfe6Z4WTP3iNxYksrKy7JxzzrFdu3a5/+3du9c9f+zYsRYXF2fvvfeefffdd3b11VdbSkqKHThwwI+1LtjevXs94pg7d65JsoULF5qZWd++fa179+4eZfbt2+ffSv9/H3/8sd1333323nvvmSR7//33PeaXpB9uu+02q1q1qs2dO9dWrlxpHTp0sMaNG1tOTo6Poyk6nv3791vnzp3tnXfese+//96WLl1qLVq0sGbNmnksIzMz02655RaP/tq/f7+PIym+b0qyXQVK3xQXy4kx7Nq1y1599VVzuVz2448/ussESr9069bNJk2aZGvWrLHVq1dbz549rVq1avbHH3+4ywTL96a4WILtO1OSvgmm782ZmDp1qkVERNjLL79s69atsyFDhlhsbKxt27bN31Vz88ZxgL/6yldjZ/fu3a1Bgwa2ZMkSW7JkiTVo0MB69erl01i89Z1xOhZf7pudjMWX+zGn+2TmzJk2a9Ys27Bhg23YsMH+8Y9/WEREhK1Zs8bMgqM/ShpLsPTJib766iurXr26NWrUyIYMGeKeHkz94i9BlXA3bty4wHm5ubmWnJxsY8eOdU87fPiwJSQk2IQJE3xUw9M3ZMgQq1WrluXm5prZ8S/hxRdf7N9KlcDJA21J+mH//v0WERFhU6dOdZf573//a2FhYTZ79myf1b0gBR04nOyrr74ySR4HoZmZmR47nkBQ2EFQUdtVoPZNSfrl4osvto4dO3pMC8R+MTt+wk2SLV682MyC+3tzciwFCZbvjFnB8QTr9+ZUnX/++Xbbbbd5TKtbt64NHz7cTzXK70yPAwKlr5waO9etW2eSbNmyZe4yS5cuNUn2/fff+yQWM+98Z/wRi1P7Zl/H4tR+zB99YmaWmJhor7zyStD2R0GxmAVfn2RnZ9vZZ59tc+fO9RjDQ6FffCFobimXpI0bNyo1NVU1atTQNddco82bN0uStmzZot27d6tr167uslFRUcrMzNSSJUv8Vd0S+euvv/
Tmm2/qpptuksvlck9ftGiRKleurNq1a+uWW27R3r17/VjLkilJP3z99dc6evSoR5nU1FQ1aNAg4PtKkn7//Xe5XC6VL1/eY/pbb72lihUr6pxzztGwYcOUnZ3tnwoWo6jtKlj7Zs+ePZo1a5ZuvvnmfPMCsV9+//13SVJSUpKk4P7enBxLYWWC5TtTWDyh+L050V9//aWvv/7aIwZJ6tq1a8DFcCbHAYHaV96q+9KlS5WQkKAWLVq4y7Rs2VIJCQk+j+9MvzP+iMWpfbOvY3FqP+brOI4dO6apU6fq4MGDatWqVdD2R0Gx5AmmPhk0aJB69uypzp07e0wP5n7xpXB/V6CkWrRooddff121a9fWnj179PDDD6t169Zau3atdu/eLUmqUqWKx2eqVKmibdu2+aO6JTZjxgzt379f/fr1c0+78MILdeWVVyojI0NbtmzR/fffr44dO+rrr79WVFSU/ypbjJL0w+7duxUZGanExMR8ZfI+H6gOHz6s4cOH67rrrlN8fLx7+vXXX68aNWooOTlZa9as0YgRI/TNN99o7ty5fqxtfsVtV8HaN5MnT1ZcXJwuu+wyj+mB2C9mprvvvltt27ZVgwYNJAXv96agWE4WTN+ZwuIJ1e/NiX755RcdO3aswG0wkGI40+OAQO0rb9V99+7dqly5cr7lV65c2afxeeM74+tYnNw3+zIWJ/djvorju+++U6tWrXT48GGVK1dO77//vurXr+9OuoKpPwqLRQquPpk6dapWrlyp5cuX55sXjN8TfwiahPvCCy90/79hw4Zq1aqVatWqpcmTJ7tfMnDiFWLp+I7n5GmBZuLEibrwwguVmprqnnb11Ve7/9+gQQM1b95cGRkZmjVrVr6kIhCdTj8Eel8dPXpU11xzjXJzczV+/HiPebfccov7/w0aNNDZZ5+t5s2ba+XKlTr33HN9XdVCne52Feh98+qrr+r6669XdHS0x/RA7JfBgwfr22+/1eeff55vXrB9b4qKRQq+70xh8YTq96YggT6GOnUcEChxeqPuBZX3dXze+s74Mhan982+isXp/Zgv4qhTp45Wr16t/fv367333lPfvn21ePHiQusQyP1RWCz169cPmj7Zvn27hgwZojlz5uQ7zjpRMPWLPwTVLeUnio2NVcOGDbVx40b3W0pPPgOyd+/efGdcAsm2bds0b948/e1vfyuyXEpKijIyMrRx40Yf1ez0lKQfkpOT9ddff+m3334rtEygOXr0qK666ipt2bJFc+fO9bhSV5Bzzz1XERERAd9fJ29Xwdg3n332mTZs2FDsd0jyf7/8/e9/18yZM7Vw4UKlpaW5pwfj96awWPIE23emuHhOFArfm5NVrFhRZcqUCbox9FSPAwK1r7xV9+TkZO3Zsyff8n/++We/xnc63xlfxuL0vtlXsTi9H/NVHJGRkTrrrLPUvHlzjRkzRo0bN9azzz4bdP1RVCwFCdQ++frrr7V37141a9ZM4eHhCg8P1+LFi/Xcc88pPDzcvZ5g6hd/CNqE+8iRI1q/fr1SUlLctyaeeDviX3/9pcWLF6t169Z+rGXRJk2apMqVK6tnz55Fltu3b5+2b9+ulJQUH9Xs9JSkH5o1a6aIiAiPMrt27dKaNWsCsq/yEoeNGzdq3rx5qlChQrGfWbt2rY4ePRrw/XXydhVsfSMdv0OkWbNmaty4cbFl/dUvZqbBgwdr+vTpWrBggWrUqOExP5i+N8XFIgXXd6Yk8ZwsFL43J4uMjFSzZs3y3dI/d+7cgI7hVI8DArWvvFX3Vq1a6ffff9dXX33lLvPll1/q999/92t8p/Od8UUsvto3Ox2Lr/Zj/tq+zExHjhwJmv4oSSwFCdQ+6dSpk7777jutXr3a/a958+a6/vrrtXr1atWsWTPo+8UnnHsfm3cNHTrUFi1aZJs3b7Zly5ZZr169LC4uzrZu3Wpmx19Jn5CQYNOnT7fvvvvOrr322oD9WTAzs2PHjlm1atXs3nvv9ZienZ1tQ4cOtSVLltiWLVts4cKF1qpVK6tatWpAxJKdnW2rVq2yVatWmSR76qmnbNWqVe43EJekH2677TZLS0uzefPm2cqVK61jx45++wmdouI5evSo9e7d29LS0mz16tUeP91w5MgRMzPbtGmTjRo1ypYvX25btmyxWbNmWd26da1p06Y+j6eoWEq6XQVK3xS3nZmZ/f7771a2bFl78cUX830+kPrl9ttvt4SEBFu0aJHHNnTo0CF3mWD53hQXS7B9Z4qLJ9i+N2ci72fBJk6caOvWrbM777zTYmNj3WNsIPDGcYC/+spXY2f37t2tUaNGtnTpUlu6dKk1bNjQ6z+r46uxxulYfLlvdjIWX+7HnO6TESNG2Keffmpbtmyxb7/91v7xj39YWFiYzZkzx8yCoz9KEksw9UlBTv6lkWDqF38JmoQ77zfdIiIiLDU11S677DJbu3ate35ubq5lZWVZcnKyRUVF2QUXXGDfffedH2tctE8++cQk2YYNGzymHzp0yLp27WqVKlWyiIgIq1atmvXt29d++uknP9XU08KFC01Svn99+/Y1s5L1w59//mmDBw+2pKQki4mJsV69evktvqLi2bJlS4HzdMJvpv/00092wQUXWFJSkkVGRlqtWrXsjjvu8MvvphcVS0m3q0Dpm+K2MzOzl156yWJiYgr8/eZA6pfCtqFJkya5ywTL96a4WILtO1NcPMH2vTlTL7zwgmVkZFhkZKSde+65Rf7cmz944zjAX33lq7Fz3759dv3111tcXJzFxcXZ9ddfb7/99pvPYvHmd8bpWHy5b3YyFl/ux5zuk5tuusm9D6pUqZJ16tTJnWybBUd/lCSWYOqTgpyccAdTv/iLy8zsNC6MAwAAAACAIgTtM9wAAAAAAAQyEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBnzo0KFDGjlypBYtWuTvqjhq9OjRmjFjRr7pr732mlwul1asWOH7SgEASr3SMg5729tvv61nnnnG39UAghIJN+BDhw4d0qhRo0J+oC8s4QYAwJ9KyzjsbSTcwOkj4Qa84NChQ/6uAgAApRbjMIBARcKNkLJ27Vq5XC5NmzbNPe3rr7+Wy+XSOeec41G2d+/eatasmfvv3NxcPfbYY6pbt66ioqJUuXJl9enTRzt27PD4XPv27
dWgQQN9+umnat26tcqWLaubbrpJkrRgwQK1b99eFSpUUExMjKpVq6bLL79chw4d0tatW1WpUiVJ0qhRo+RyueRyudSvX79C48nNzdXDDz+sOnXqKCYmRuXLl1ejRo307LPPusuMHDlSLpdL3377ra688kolJCQoKSlJd999t3JycrRhwwZ1795dcXFxql69uh577LF86/npp590ww03qHLlyoqKilK9evX05JNPKjc316Pcr7/+qoEDB6pq1aqKjIxUzZo1dd999+nIkSPuMi6XSwcPHtTkyZPdMbZv395jOdnZ2br99ttVsWJFVahQQZdddpl27tzpUaZ69erq1auXZs+erXPPPVcxMTGqW7euXn311Xz13717t2699ValpaUpMjJSNWrU0KhRo5STk+NR7sUXX1Tjxo1Vrlw5xcXFqW7duvrHP/7hnn/o0CENGzZMNWrUUHR0tJKSktS8eXNNmTKl0D4CAPxPqI3DkrR//34NHTpUNWvWdNerR48e+v77791lSjI+SsfHyMGDB2vSpEnusb158+ZatmyZzEyPP/64atSooXLlyqljx47atGlTgbF/9tlnatmypWJiYlS1alXdf//9OnbsmEfZUaNGqUWLFkpKSlJ8fLzOPfdcTZw4UWaWL8a3335brVq1Urly5VSuXDk1adJEEydOdK9z1qxZ2rZtm7vNXC6XJGnr1q1yuVx64okn9NRTT7nr3qpVKy1btizfelasWKHevXsrKSlJ0dHRatq0qf797397lCnJWLx582Zdc801Sk1NVVRUlKpUqaJOnTpp9erVRfYl4BcGhJiUlBQbMGCA+++xY8daTEyMSbL//ve/ZmZ29OhRi4+Pt3vuucddbsCAASbJBg8ebLNnz7YJEyZYpUqVLD093X7++Wd3uczMTEtKSrL09HQbN26cLVy40BYvXmxbtmyx6Oho69Kli82YMcMWLVpkb731lt14443222+/2eHDh2327NkmyW6++WZbunSpLV261DZt2lRoLGPGjLEyZcpYVlaWzZ8/32bPnm3PPPOMjRw50l0mKyvLJFmdOnXsoYcesrlz59o999zjjqVu3br23HPP2dy5c61///4myd577z335/fu3WtVq1a1SpUq2YQJE2z27Nk2ePBgk2S33367u9yff/5pjRo1stjYWHviiSdszpw5dv/991t4eLj16NHDXW7p0qUWExNjPXr0cMe4du1aMzObNGmSSbKaNWva3//+d/vkk0/slVdescTEROvQoYNH7BkZGZaWlmb169e3119/3T755BO78sorTZItXrzYXW7Xrl2Wnp5uGRkZ9tJLL9m8efPsoYcesqioKOvXr5+73JQpU0yS/f3vf7c5c+bYvHnzbMKECXbHHXe4y9x6661WtmxZe+qpp2zhwoX20Ucf2dixY23cuHGF9hEAwFMojcMHDhywc845x2JjY+3BBx+0Tz75xN577z0bMmSILViwwMxKPj6amUmyjIwMa926tU2fPt3ef/99q127tiUlJdldd91lF198sX300Uf21ltvWZUqVaxRo0aWm5v7/9q78/CoyvP/45/JHgJhCQQCgSCUhEUUJLJqw64Ibi2ligtYFBWogNoWpQpYi4JKUb8slSKiAlpAERGFIIvsooBWVpWlUBYBRcImCdy/P/hldMiezJmZTN6v68rFNWeeOed+7jmcZ+5zzjzj0fe4uDirWbOmvfTSS7Zo0SJ76KGHTJINHDjQY1t9+/a1qVOnWnp6uqWnp9vf/vY3i46OtlGjRnm0e+KJJ0yS/eY3v7HZs2fb4sWLbdy4cfbEE0+YmdmWLVusXbt2VqNGDXfO1q5da2Zmu3fvNklWt25du/76623evHk2b948a9q0qVWuXNmOHz/u3s7SpUstIiLCrr32Wnv77bfto48+sr59+5okmzZtmrtdYcbilJQU+9WvfmVvvPGGrVixwubOnWuPPPKILVu2LM/3EvAXCm4EnTvvvNPq1avnfty5c2e77777rHLlyjZ9+nQzM1u9erVJssWLF5uZ2bZt20ySDRgwwGNd69evN0n2+OOPu5elpaWZJPv444892s6ZM8ck2ebNm/OM7ciRIybJRowYUai+9OjRw5o1a5Zvm+yC+4UXXvBY3qxZM5Nk77zzjntZZmamVatWzX7zm9+4lw0bNswk2fr16z1e/+CDD5rL5bIdO3aYmdnkyZNNkv373//2aDdmzBiPXJqZxcTEWJ8+fXLEml1wX5rnsWPHmiQ7ePCge1lSUpJFRUXZ3r173cvOnDljVapUsfvvv9+97P7777fy5ct7tDMze/75502Su9gfNGiQVapUKUdMv3T55ZfbLbfckm8bAED+gmkcfuqpp0ySpaen59mmKOOjJKtRo4adPHnSvWzevHkmyZo1a+ZRXI8fP94k2Zdffulelt339957z2Nb9913n4WEhOQYC7OdP3/eMjMz7amnnrK4uDj3dnbt2mWhoaF2xx135JuH7t27W1JSUo7l2QV306ZNLSsry738008/NUk2a9Ys97KGDRta8+bNLTMz02MdPXr0sISEBDt//ryZFTwWHz161CTZ+PHj840ZCBTcUo6g06lTJ+3atUu7d+/W2bNntWrVKl1//fXq0KGD0tPTJUlLlixRZGSkrrnmGknSsmXLJCnHbWUtW7ZUo0aN9PHHH3ssr1y5sjp27OixrFmzZoqIiFD//v01ffp07dq1q8R9admypb744gsNGDBAixYt0okTJ/Js26NHD4/HjRo1ksvlUrdu3dzLwsLC9Ktf/Up79+51L1u6dKkaN26sli1bery+b9++MjMtXbrU3S4mJkY9e/bM0U5Sjhzl56abbvJ4fMUVV0iSR1zSxZzWqVPH/TgqKkrJycke7RYsWKAOHTqoZs2aysrKcv9l93vFihWSLuby+PHjuv322/Xee+/p6NGjOeJq2bKlPvzwQw0bNkzLly/XmTNnCt0nAMBFwTQOf/jhh0pOTlbnzp3zbFPU8bFDhw6KiYlxP27UqJEkqVu3bu5btX+5/NKxsUKFCjnG0d69e+vChQv65JNPPOLq3LmzKlasqNDQUIWHh+vJJ5/UsWPH9N1330mS0tPTdf78eQ0cODDfPBSke/fuCg0NdT++dFz/5ptvtH37dt1xxx2S5DFe33DDDTp48KB27NghqeCxuEqVKqpfv76ee+45jRs3Tps2bcrxFTggkFBwI+hkD4pLlizRqlWrlJmZqY4dO6pz587uQW/JkiVq166doqOjJUnHjh2TJCUkJORYX82aNd3PZ8utXf369bVkyRLFx8dr4MCBql+/vurXr+/xfeuieuyxx/T8889r3bp16tatm+Li4tSpU6dcf1arSpUqHo8jIiJUrlw5RUVF5Vh+9uxZ9+Njx47l2e/s57P/rVGjhseHAUmKj49XWFhYjhzlJy4uzuNxZGSkJOUYVC9tl932l+0OHz6s999/X+Hh4R5/2d8VzC6s77rrLr366qvau3evfvvb3yo+Pl6tWrVyf/iTpJdeekl/+ctfNG/ePHXo0EFVqlTRLbfcoq+//rrQ
fQOAsi6YxuEjR44oMTEx3zZFHR9zG6/zW/7LMVuSqlevniOGGjVquGORpE8//VRdu3aVJE2ZMkWrV6/Whg0bNHz4cEk/j7dHjhyRpAL7WJCCxvXDhw9Lkh599NEc4/WAAQMk/TxeFzQWu1wuffzxx7ruuus0duxYXXXVVapWrZoeeughZWRklKgfgBMouBF0EhMTlZycrCVLlig9PV2pqamqVKmSOnXqpIMHD2r9+vVat26dx9nq7IHi4MGDOdZ34MABVa1a1WPZpYNqtmuvvVbvv/++fvzxR61bt05t2rTRkCFD9NZbbxWrL2FhYXr44Ye1ceNGff/995o1a5b27dun6667zmszssbFxeXZb0nuvsfFxenw4cM5Jlv57rvvlJWVlSNHvlK1alV17dpVGzZsyPWvX79+7rb33HOP1qxZox9//FEffPCBzEw9evRwn4GPiYnRqFGjtH37dh06dEiTJk3SunXrdOONN/qlbwBQGgXTOFytWrUck7ZdytfjY3bx+kuHDh1yxyJJb731lsLDw7VgwQL16tVLbdu2VWpqao7XZU8iV1AfSyo7B4899lie43WzZs0kFW4sTkpK0tSpU3Xo0CHt2LFDQ4cO1cSJE/WnP/3J0X4AxUHBjaDUuXNnLV26VOnp6erSpYskKTk5WXXq1NGTTz6pzMxMj4E++7a0N99802M9GzZs0LZt29SpU6cibT80NFStWrXShAkTJEkbN26UlPeV3MKoVKmSevbsqYEDB+r777/Xnj17iryO3HTq1Elbt251x5jt9ddfl8vlUocOHdztTp48meP3tV9//XX389kuvQrtpB49euirr75S/fr1lZqamuMv+0r9L8XExKhbt24aPny4zp07py1btuRoU716dfXt21e33367duzYwU/OAEARBMs43K1bN+3cudP99arcFGV89IaMjAzNnz/fY9nMmTMVEhKiX//615IunpAICwvzuM37zJkzeuONNzxe17VrV4WGhmrSpEn5brOk43pKSooaNGigL774ItexOjU1VRUqVMjxusKMxcnJyfrrX/+qpk2b5vgsAwSCMH8HADihU6dOmjhxoo4eParx48d7LJ82bZoqV67s8VMkKSkp6t+/v15++WWFhISoW7du2rNnj5544gnVrl1bQ4cOLXCbkydP1tKlS9W9e3fVqVNHZ8+edf+EVfaHigoVKigpKUnvvfeeOnXqpCpVqqhq1aqqW7duruu88cYbdfnllys1NVXVqlXT3r17NX78eCUlJalBgwbFT9AvDB06VK+//rq6d++up556SklJSfrggw80ceJEPfjgg0pOTpYk3X333ZowYYL69OmjPXv2qGnTplq1apVGjx6tG264weODU9OmTbV8+XK9//77SkhIUIUKFZSSkuKVeC/11FNPKT09XW3bttVDDz2klJQUnT17Vnv27NHChQs1efJkJSYm6r777lN0dLTatWunhIQEHTp0SM8884wqVqyoq6++WpLUqlUr9ejRQ1dccYUqV66sbdu26Y033lCbNm1Urlw5R+IHgGAULOPwkCFD9Pbbb+vmm2/WsGHD1LJlS505c0YrVqxQjx491KFDhyKNj94QFxenBx98UP/973+VnJyshQsXasqUKXrwwQfd8550795d48aNU+/evdW/f38dO3ZMzz//vPuEQ7a6devq8ccf19/+9jedOXNGt99+uypWrKitW7fq6NGjGjVqlKSL4/o777yjSZMmqUWLFgoJCcn1inl+/vnPf6pbt2667rrr1LdvX9WqVUvff/+9tm3bpo0bN7p/Sq6gsfjLL7/UoEGD9Lvf/U4NGjRQRESEli5dqi+//FLDhg3zQoYBL/PvnG2AM3744QcLCQmxmJgYO3funHv5jBkz3D99canz58/bmDFjLDk52cLDw61q1ap255132r59+zzapaWlWZMmTXK8fu3atXbrrbdaUlKSRUZGWlxcnKWlpdn8+fM92i1ZssSaN29ukZGRJinX2byzvfDCC9a2bVurWrWqRUREWJ06daxfv362Z88ed5vsWcp/+ZMpZmZ9+vSxmJiYHOvMLf69e/da7969LS4uzsLDwy0lJcWee+4594yh2Y4dO2YPPPCAJSQkWFhYmCUlJdljjz1mZ8+e9Wi3efNma9eunZUrV84kWVpampn9PEv5hg0bPNovW7bMJHn8nEdSUpJ179491/iz15ftyJEj9tBDD9lll11m4eHhVqVKFWvRooUNHz7cPRPs9OnTrUOHDla9enWLiIiwmjVrWq9evTxmfx02bJilpqZa5cqVLTIy0urVq2dDhw61o0eP5ogDAJC3YBmHs/syePBgq1OnjoWHh1t8fLx1797dtm/f7m5T2PFRufx8V/ZM388995zH8uyxcfbs2Tn6vnz5cktNTbXIyEhLSEiwxx9/PMfs36+++qqlpKS4x7NnnnnGpk6dapJs9+7dHm1ff/11u/rqqy0qKsrKly9vzZs39/ipru+//9569uxplSpVMpfLZdklRF6xZ/f10tngv/jiC+vVq5fFx8dbeHi41ahRwzp27GiTJ092tyloLD58+LD17dvXGjZsaDExMVa+fHm74oor7B//+IfHTOlAoHCZXfKFEwAAAAABp3379jp69Ki++uorf4cCoJD4DjcAAAAAAA6g4AYAAAAAwAHcUg4AAAAAgAO4wg0AAAAAgAMouAEAAAAAcIBffof7woULOnDggCpUqCCXy+WPEAAACAhmpoyMDNWsWVMhIf45D864DADARd4el/1ScB84cEC1a9f2x6YBAAhI+/btU2Jiol+2zbgMAIAnb43Lfim4K1SoIOliJ2JjY/0RAgAAAeHEiROqXbu2e2z0B8ZlAAAu8va47JeCO/t2tdjYWAZ2AAAkv97KzbgMAIAnb43Lfim4gaCw7Jn8n+/wmG/iAAD4T0FjgcR4AABlGAU3AABAXgpTUAMAkAd+FgwAAAAAAAdwhRvBidu9AQAAAPgZV7gBAAAAAHAAV7gBAEDpxIRlAIAAR8ENAADKLiZFAwA4iFvKAQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AB+FgwAAAQvfvYLAOBHXOEGAAAAAMABXOEGAACBKViuThfUjw6P+SYOAIDPcYUbAAAAAAAHUHADAAAAAOAACm4AAAAAABzAd7jhfcHyXbWSfnfQB3lYO/XRfJ9v0+/5Em8DAOCwYBk3AQA5cIUbAAAAAAAHcIUb8JfCXEHnqgYAAABQalFwIzA5fXtdsPzUDAAAAICARcENAAAQyPiONwCUWhTcQAAraFI0AAAAAIGLSdMAAAAAAHAAV7jhe974/jTfwfaZf6TvzPf5oV2SfRQJAAAAULpQcAMAAP/g5CkAIMhRcJc2vpg4hclZAAAAAKDEKLjLGm7nhpdxyzkA5G/trmP5Pt+mXpyPIgEA+BqTpgEAAAAA4ACucKNMKuhqg1Q2rjgUdHUaAIBAwR1VAEojrnADAAAAAOAArnADwazA79v/1idhAAAcVJi5VZjwFAD8goI
bAAAAJVKYrygVdMs3X3MCEIwouAE/Kcz3yB3fRp2Sb6P1f18poMXzJd8IAKBEAuH7zxTUAMoiCm4gD/yMS2DwxlUTAAAAwB+YNA0AAAAAAAdwhRsoJq6AAwB8wRfjzdqpj+a/jX6B//Ug7ogCEIi4wg0AAAAAgAO4wo1SiavL3lHwhGcorECYkAgA8sLxHgD8g4IbAADAj0r6qxW++NULZhgHgOKh4A42y57xdwSAz5X0gyBXnwHkxRfFbDAozBX0dXX6+yCSkuFuJQDeRsGNoBQIH5ACIYZAwFURAAAAlFUU3AAAAMiXN74DXtA6fHEFnJPAAHyNgjvQlIFbwrnyi6IIltsUC8JtjAhKZWBMAwAgPxTcAAAAKPUC4QStN66gF3SC1RdX6UsaAyeJgZ9RcHsTZ/IlcQUbcEJhPmDxAQcAACCwUHADAAAAheCLq8t8zxwILhTcyIEr1PAmb0y043QMa6fm//o2/Z73YjQAfIkxrfQIhPECpUcw3NbO3WtlQ/AU3AXdzt3hMee3UYDCDPpt6sWVaB0lfT1QGjn9IW3t1EcLjqGA5/+Rnv/3Br0xe29BcZb4xEFhjoElPNb64gMUv9sOAAB8xS8Ft5lJkk6cOOG9lZ46m//z3thWQdso6OVnfiqwzYkCtlHQOkr6egDOOHvqZL7PF/R/s6DXF2YdJT7mFuYYWMJtFNRPb4wbhcml0zHktr7ssdEfHBmXpQL3mU/3fO/d7QEFaLrjZUfXvyHxHkfXL0lX759WotcXJsaCjgUlPVZPWPpNgTGU1DPzNjq+jYEdf1Wi1xdmPCrpcdkbuS5pP72hoH54M0Zvj8su88MIv3//ftWuXdvXmwUAIGDt27dPiYmJftk24zIAAJ68NS77peC+cOGCDhw4oAoVKsjlcvl680V24sQJ1a5dW/v27VNsbKy/w/GqYO1bsPZLCt6+0a/SJ1j75ut+mZkyMjJUs2ZNhYSEOL693BR2XA7W99xXyF/JkL+SIX8lQ/5KpjTlz9vjsl9uKQ8JCfHbWfySiI2NDfgdpLiCtW/B2i8pePtGv0qfYO2bL/tVsWJFn2wnL0Udl4P1PfcV8lcy5K9kyF/JkL+SKS358+a47J9T6QAAAAAABDkKbgAAAAAAHEDBXQiRkZEaMWKEIiMj/R2K1wVr34K1X1Lw9o1+lT7B2rdg7Zc3kJuSIX8lQ/5KhvyVDPkrmbKcP79MmgYAAAAAQLDjCjcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADggDJbcE+cOFGXXXaZoqKi1KJFC61cuTLf9itWrFCLFi0UFRWlevXqafLkyTnajB8/XikpKYqOjlbt2rU1dOhQnT171qku5Koo/Tp48KB69+6tlJQUhYSEaMiQIbm2mzt3rho3bqzIyEg1btxY7777rkPR583b/ZoyZYquvfZaVa5cWZUrV1bnzp316aefOtiDvDnxnmV766235HK5dMstt3g36EJwol/Hjx/XwIEDlZCQoKioKDVq1EgLFy50qAd5c6Jvpe348c4776hLly6qVq2aYmNj1aZNGy1atChHu0A4fkje71sgHUO8zYnxsSxx4v9RWVLU/S/b6tWrFRYWpmbNmjkbYAArau5++uknDR8+XElJSYqMjFT9+vX16quv+ijawFPU/M2YMUNXXnmlypUrp4SEBN1zzz06duyYj6INLJ988oluvPFG1axZUy6XS/PmzSvwNWVq7LAy6K233rLw8HCbMmWKbd261QYPHmwxMTG2d+/eXNvv2rXLypUrZ4MHD7atW7falClTLDw83ObMmeNu8+abb1pkZKTNmDHDdu/ebYsWLbKEhAQbMmSIr7pV5H7t3r3bHnroIZs+fbo1a9bMBg8enKPNmjVrLDQ01EaPHm3btm2z0aNHW1hYmK1bt87h3vzMiX717t3bJkyYYJs2bbJt27bZPffcYxUrVrT9+/c73BtPTvQt2549e6xWrVp27bXX2s033+xMB/LgRL9++uknS01NtRtuuMFWrVple/bssZUrV9rmzZsd7o0nJ/pWGo8fgwcPtjFjxtinn35qO3futMcee8zCw8Nt48aN7jaBcPwwc6ZvgXIM8TYnxseyxIl9rSwpav6yHT9+3OrVq2ddu3a1K6+80jfBBpji5O6mm26yVq1aWXp6uu3evdvWr19vq1ev9mHUgaOo+Vu5cqWFhITYiy++aLt27bKVK1dakyZN7JZbbvFx5IFh4cKFNnz4cJs7d65JsnfffTff9mVt7CiTBXfLli3tgQce8FjWsGFDGzZsWK7t//znP1vDhg09lt1///3WunVr9+OBAwdax44dPdo8/PDDds0113gp6oIVtV+/lJaWlmsh0KtXL7v++us9ll133XV22223lSjWonCiX5fKysqyChUq2PTp04sbZrE41besrCxr166d/etf/7I+ffr4vOB2ol+TJk2yevXq2blz57wVZrE40bfSfvzI1rhxYxs1apT7cSAcP8yc6dul/HUM8TYnxseyxBf7WjArbv5+//vf21//+lcbMWJEmS24i5q7Dz/80CpWrGjHjh3zRXgBr6j5e+6556xevXoey1566SVLTEx0LMbSojAFd1kbO8rcLeXnzp3T559/rq5du3os79q1q9asWZPra9auXZuj/XXXXafPPvtMmZmZkqRrrrlGn3/+ufuWwl27dmnhwoXq3r27A73IqTj9Koy8+l6SdRaFU/261OnTp5WZmakqVap4bZ0FcbJvTz31lKpVq6Z+/fqVaD3F4VS/5s+frzZt2mjgwIGqXr26Lr/8co0ePVrnz58vaciF5lTfguH4ceHCBWVkZHj8H/L38UNyrm+X8scxxNucGh/LCl/ta8GquPmbNm2avv32W40YMcLpEANWcXI3f/58paamauzYsapVq5aSk5P16KOP6syZM74IOaAUJ39t27bV/v37tXDhQpmZDh8+rDlz5vhs3C7tytrYEebvAHzt6NGjOn/+vKpXr+6xvHr16jp06FCurzl06FCu7bOysnT06FElJCTotttu05EjR3TNNdfIzJSVlaUHH3xQw4YNc6wvv1ScfhVGXn0vyTqLwql+XWrYsGGqVauWOnfu7LV1FsSpvq1evVpTp07V5s2bSxhh8TjVr127dmnp0qW64447tHDhQn399dcaOHCgsrKy9OSTT5Y07EJxqm/BcPx44YUXdOrUKfXq1cu9zN/HD8m5vl3KH8cQb3NqfCwrfLWvBavi5O/rr7/WsGHDtHLlSoWFlbmPtG7Fyd2uXbu0atUqRUVF6d1339XRo0c1YMAAff/992Xue9zFyV/btm01Y8YM/f73v9fZs2eVlZWlm266SS+//LIvQi71ytrYUeaucGdzuVwej80sx7KC2v9y+fLly/X3v/9dEydO1MaNG/XOO+9owYIF+tvf/ublyPNX1H75a52BFMPYsWM1a9YsvfPOO4qKivLKOovCm33LyMjQnXfeqSlTpqhq1areCK/YvP2eXbhwQfHx8XrllVfUokUL3XbbbRo+fLgmTZpU0lCLzN
t9K+3Hj1mzZmnkyJF6++23FR8f75V1epsTfcvm72OIt3l7fCxrnNzXyoLC5u/8+fPq3bu3Ro0apeTkZF+FF9CKsu9duHBBLpdLM2bMUMuWLXXDDTdo3Lhxeu2118rkVW6paPnbunWrHnroIT355JP6/PPP9dFHH2n37t164IEHfBFqUChLY0eZOx1YtWpVhYaG5jhj9d133+U405KtRo0aubYPCwtTXFycJOmJJ57QXXfdpXvvvVeS1LRpU506dUr9+/fX8OHDFRLi7LmN4vSrMPLqe0nWWRRO9Svb888/r9GjR2vJkiW64oorSry+onCib99++6327NmjG2+80b3swoULkqSwsDDt2LFD9evXL37QheDUe5aQkKDw8HCFhoa6lzVq1EiHDh3SuXPnFBERUex1F5ZTfSvNx4+3335b/fr10+zZs3Nc3fX38UNyrm/Z/HkM8Tanxseywul9LdgVNX8ZGRn67LPPtGnTJg0aNEjSxfHOzBQWFqbFixerY8eOPond34qz7yUkJKhWrVqqWLGie1mjRo1kZtq/f78aNGjgaMyBpDj5e+aZZ9SuXTv96U9/kiRdccUViomJ0bXXXqunn3466K7QeltZGzvK3BXuiIgItWjRQunp6R7L09PT1bZt21xf06ZNmxztFy9erNTUVIWHh0u6+P29Sz8Uh4aGyi5OTOfFHuSuOP0qjLz6XpJ1FoVT/ZKk5557Tn/729/00UcfKTU1tUTrKg4n+tawYUP95z//0ebNm91/N910kzp06KDNmzerdu3a3gg9X069Z+3atdM333zjPoEgSTt37lRCQoJPim3Jub6V1uPHrFmz1LdvX82cOTPX7635+/ghOdc3yf/HEG9zanwsK5zc18qCouYvNjY2x3j3wAMPKCUlRZs3b1arVq18FbrfFWffa9eunQ4cOKCTJ0+6l+3cuVMhISFKTEx0NN5AU5z85TVuS/LJuF3albmxwzdzswWW7Kn/p06dalu3brUhQ4ZYTEyM7dmzx8zMhg0bZnfddZe7ffbU9UOHDrWtW7fa1KlTc0xdP2LECKtQoYLNmjXLdu3aZYsXL7b69etbr169ArZfZmabNm2yTZs2WYsWLax37962adMm27Jli/v51atXW2hoqD377LO2bds2e/bZZ/32s2De7NeYMWMsIiLC5syZYwcPHnT/ZWRk+KxfTvXtUv6YpdyJfv33v/+18uXL26BBg2zHjh22YMECi4+Pt6effrrU9600Hj9mzpxpYWFhNmHCBI//Q8ePH3e3CYTjh1N9C5RjiLc5MT6WJU7sa2VJcY6vv1SWZykvau4yMjIsMTHRevbsaVu2bLEVK1ZYgwYN7N577/VXF/yqqPmbNm2ahYWF2cSJE+3bb7+1VatWWWpqqrVs2dJfXfCrjIwM92cdSTZu3DjbtGmT+2fVyvrYUSYLbjOzCRMmWFJSkkVERNhVV11lK1ascD/Xp08fS0tL82i/fPlya968uUVERFjdunVt0qRJHs9nZmbayJEjrX79+hYVFWW1a9e2AQMG2A8//OCD3vysqP2SlOMvKSnJo83s2bMtJSXFwsPDrWHDhjZ37lwf9MSTt/uVlJSUa5sRI0b4pkO/4MR79kv+KLjNnOnXmjVrrFWrVhYZGWn16tWzv//975aVleWD3njydt9K4/EjLS0t13716dPHY52BcPww837fAukY4m3eHh/LGif+H5UlRd3/fqksF9xmRc/dtm3brHPnzhYdHW2JiYn28MMP2+nTp30cdeAoav5eeukla9y4sUVHR1tCQoLdcccdtn//fh9HHRiWLVuW77GsrI8dLjPuewAAAAAAwNvK3He4AQAAAADwBQpuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcQoE6fPq2RI0dq+fLl/g4l4OzZs0cul0uvvfaae9maNWs0cuRIHT9+3G9xAQD8i7GzdKlbt6769u1brNfOnDlT48eP92o8gBPC/B0AgNydPn1ao0aNkiS1b9/ev8EEmISEBK1du1b169d3L1uzZo1GjRqlvn37qlKlSv4LDgDgN4ydpcu7776r2NjYYr125syZ+uqrrzRkyBDvBgV4GQU34GOnT59WuXLl/B1GqRYZGanWrVv7OwwAgI8wdgan5s2b+zsEwHHcUo4ya8uWLXK5XJo9e7Z72eeffy6Xy6UmTZp4tL3pppvUokUL9+MLFy5o7NixatiwoSIjIxUfH6+7775b+/fv93hd+/btdfnll+uTTz5R27ZtVa5cOf3hD3+QJC1dulTt27dXXFycoqOjVadOHf32t7/V6dOntWfPHlWrVk2SNGrUKLlcLrlcrgJvuzp+/LgeeeQR1atXzx3XDTfcoO3bt7vbfP/99xowYIBq1aqliIgI1atXT8OHD9dPP/3ksS6Xy6VBgwbpjTfeUKNGjVSuXDldeeWVWrBgQY7tbt++XbfffruqV6+uyMhI1alTR3fffbd7nUeOHNGAAQPUuHFjlS9fXvHx8erYsaNWrlzpXkdmZqbi4+N111135dqv6OhoPfzww5Jy3lI+cuRI/elPf5IkXXbZZe58LV++XP369VOVKlV0+vTpHOvt2LFjjvcaAJA3xs7SO3ZK0okTJ/Too4/qsssuU0REhGrVqqUhQ4bo1KlT+eZI+vl9WblypVq3bq3o6GjVqlVLTzzxhM6fP+/RtrD5uvSW8uXLl8vlcmnWrFkaPny4atasqdjYWHXu3Fk7duzwiOWDDz7Q3r173e+zy+VyPz9p0iRdeeWVKl++vCpUqKCGDRvq8ccfL7CPgCMMKMMSEhKsf//+7sfPPvusRUdHmyT73//+Z2ZmmZmZFhsba3/+85/d7fr372+SbNCgQfbRRx/Z5MmTrVq1ala7dm07cuSIu11aWppVqVLFateubS+//LItW7bMVqxYYbt377aoqCjr0qWLzZs3z5YvX24zZsywu+66y3744Qc7e/asffTRRybJ+vXrZ2vXrrW1a9faN998k2dfTpw4YU2aNLGYmBh76qmnbNGiRTZ37lwbPHiwLV261MzMzpw5Y1dccYXFxMTY888/b4sXL7YnnnjCwsLC7IYbbvBYnySrW7eutWzZ0v7973/bwoULrX379hYWFmbffvutu93mzZutfPnyVrduXZs8ebJ9/PHH9uabb1qvXr3sxIkTZma2fft2e/DBB
+2tt96y5cuX24IFC6xfv34WEhJiy5Ytc69r6NChFh0dbT/++KNHLBMnTjRJ9uWXX5qZ2e7du02STZs2zczM9u3bZ3/84x9Nkr3zzjvufP3444/2xRdfmCSbMmWKxzq3bNlikmzChAl55hQAkBNjZ+kcO0+dOmXNmjWzqlWr2rhx42zJkiX24osvWsWKFa1jx4524cKFfN/3tLQ0i4uLs5o1a9pLL71kixYtsoceesgk2cCBA93tipKvpKQk69Onj/vxsmXL3Dm844477IMPPrBZs2ZZnTp1rEGDBpaVlWVmF8fwdu3aWY0aNdzv89q1a83MbNasWSbJ/vjHP9rixYttyZIlNnnyZHvooYfy7R/gFApulGl33nmn1atXz/24c+fOdt9991nlypVt+vTpZma2evVqk2SLFy82M7Nt27aZJBswYIDHutavX2+S7PHHH3cvS0tLM0n28ccfe7SdM2eOSbLNmzfnGduRI0dMko0YMaJQfXnqqadMkqWnp+fZZvLkySbJ/v3vf3ssHzNmjEcfzS5+aKhevbp74DczO3TokIWEhNgzzzzjXtaxY0erVKmSfffdd4WK08wsKyvLMjMzrVOnTnbrrbe6l3/55ZcmyV555RWP9i1btrQWLVq4H19acJuZPffccybJdu/enWN7aWlp1qxZM49lDz74oMXGxlpGRkah4wYAMHZmK21j5zPPPGMhISG2YcMGj3bZeV24cGG+289+X9577z2P5ffdd5+FhITY3r17zaxo+cqr4L60MP/3v/9tktxFtZlZ9+7dLSkpKUecgwYNskqVKuXbF8CXuKUcZVqnTp20a9cu7d69W2fPntWqVat0/fXXq0OHDkpPT5ckLVmyRJGRkbrmmmskScuWLZOkHLeotWzZUo0aNdLHH3/ssbxy5crq2LGjx7JmzZopIiJC/fv31/Tp07Vr164S9+XDDz9UcnKyOnfunGebpUuXKiYmRj179vRYnt2XS2Pv0KGDKlSo4H5cvXp1xcfHa+/evZIufqduxYoV6tWrl/s2vrxMnjxZV111laKiohQWFqbw8HB9/PHH2rZtm7tN06ZN1aJFC02bNs29bNu2bfr000/dtxMWx+DBg7V582atXr1a0sVb6t544w316dNH5cuXL/Z6AaAsYuy8qLSNnQsWLNDll1+uZs2aKSsry/133XXXub+GVZAKFSropptu8ljWu3dvXbhwQZ988omkoucrN5du44orrpAkdw7z07JlSx0/fly333673nvvPR09erTA1wBOouBGmZY9wC5ZskSrVq1SZmamOnbsqM6dO7sHhCVLlqhdu3aKjo6WJB07dkzSxZmyL1WzZk3389lya1e/fn0tWbJE8fHxGjhwoOrXr6/69evrxRdfLHZfjhw5osTExHzbHDt2TDVq1PD4npMkxcfHKywsLEfscXFxOdYRGRmpM2fOSJJ++OEHnT9/vsDtjhs3Tg8++KBatWqluXPnat26ddqwYYOuv/5697qy/eEPf9DatWvd352bNm2aIiMjdfvtt+e7jfzcfPPNqlu3riZMmCBJeu2113Tq1CkNHDiw2OsEgLKKsfOi0jZ2Hj58WF9++aXCw8M9/ipUqCAzK1RhWr169RzLatSoIenn97io+crNpTmMjIyUpBz9zs1dd92lV199VXv37tVvf/tbxcfHq1WrVu6TQYCvUXCjTEtMTFRycrKWLFmi9PR0paamqlKlSurUqZMOHjyo9evXa926dR5nvrMHgYMHD+ZY34EDB1S1alWPZZcOONmuvfZavf/++/rxxx+1bt06tWnTRkOGDNFbb71VrL5Uq1Ytx8Qzl4qLi9Phw4dlZh7Lv/vuO2VlZeWIvSBVqlRRaGhogdt988031b59e02aNEndu3dXq1atlJqaqoyMjBxtb7/9dkVGRuq1117T+fPn9cYbb+iWW25R5cqVixTbL4WEhGjgwIGaM2eODh48qIkTJ6pTp05KSUkp9joBoKxi7LyotI2dVatWVdOmTbVhw4Zc/5544okCYz98+HCOZYcOHZL083vs7XwVxz333KM1a9boxx9/1AcffCAzU48ePQp1hRzwNgpulHmdO3fW0qVLlZ6eri5dukiSkpOTVadOHT355JPKzMz0+NCQfYvbm2++6bGeDRs2aNu2berUqVORth8aGqpWrVq5r75u3LhRUtHO5kpSt27dtHPnTi1dujTPNp06ddLJkyc1b948j+Wvv/66+/miiI6OVlpammbPnp3vmXGXy+XuT7Yvv/xSa9euzdG2cuXKuuWWW/T6669rwYIFOnToUKFuJy8oX/fee68iIiJ0xx13aMeOHRo0aFCB6wQA5I6xs/SNnT169NC3336ruLg4paam5virW7dugbFnZGRo/vz5HstmzpypkJAQ/frXv5bk/Xzl5Zd3DeQlJiZG3bp10/Dhw3Xu3Dlt2bLFK9sGisSv3yAHAsDcuXNNkkmyFStWuJffc889JskqV65s58+f93hN//79zeVy2ZAhQ2zRokX2z3/+0+Lj46127dp29OhRd7u0tDRr0qRJjm1OmjTJfve739lrr71mS5cutYULF1rPnj1Nki1atMjdLikpyVJSUmzRokW2YcOGXCcEy5Y902r58uXt6aeftsWLF9t7771nDz/8cI6ZVitUqGDjxo2z9PR0GzFihIWHh+c60+ovZx39ZUy/nOAke6bVevXq2SuvvGJLly61WbNm2e233+6eNObJJ580l8tlTz75pH388cc2ceJEq1GjhtWvXz/XCU8WLVpkkiwxMdESExNz5D+3SdOyJ1q5//77bc2aNbZhwwaPSWvMLk6UJsmSkpJyrBMAUHiMnaVv7Dx58qQ1b97cEhMT7YUXXrD09HRbtGiRTZkyxX73u9/ZunXr8syTmecs5S+//LItWrTIBg8ebJLswQcfdLcrSr7ymjRt9uzZHu1yG/dHjBhhkmzixIm2fv1692Rw9957r/3xj3+0t956y1asWGFvv/22NWvWzCpWrFikSeoAb6HgRpn3ww8/WEhIiMXExNi5c+fcy2fMmGGS7De/+U2O15w/f97GjBljycnJFh4eblWrVrU777zT9u3b59Eurw8Na9eutVtvvdWSkpIsMjLS4uLiLC0tzebPn+/RbsmSJda8eXOLjIw0SR6DUl59GTx4sNWpU8fCw8MtPj7eunfvbtu3b3e3OXbsmD3wwAOWkJBgYWFhlpSUZI899pidPXvWY12F/dBgZrZ161b73e9+Z3FxcRYREWF16tSxvn37utf5008/2aOPPmq1atWyqKgou+qqq2zevHnWp0+fXD80nD9/3mrXrm2SbPjw4Tmez23gNTN77LHHrGbNmhYSEmKSPH42xcxs+fLlJsmeffbZfLIIACgIY2fpGzvNLhbdf/3rXy0lJcUiIiKsYsWK1rRpUxs6dKgdOnQo3zxlvy/Lly+31NRUi4yMtISEBHv88cctMzPTo21h81WSgvv777+3nj17WqVKlczlcln2dcTp06dbhw4drHr16hYREWE1a9a0Xr16uX8eDfA1l9klX7AAgCD1yCOP
[diff hunk collapsed: two base64-encoded matplotlib PNG "display_data" outputs ("image/png" with accompanying "text/plain" entries and empty "metadata") removed from the notebook; the raw image data is omitted here]
AbfJwuL/TxwiysREREREREREdUeD4moM6/nX/8TERERERERERG9JLiCjoiIiIiIiIiIak3JFXR1hivoiIiIiIiIiIiIBGKC7jny8fHBpEmTXpl+iYiIiIiIiIiemULgVc8wQfcaKy0tfWGxSkpK6nQML3LsRERERERERETPExN0z0lAQACSk5MRHR0NmUwGmUyG/Px8AEBWVhZ69eoFfX19mJubY9iwYbhx4wYAICkpCVpaWkhJSZH6ioqKQsOGDVFYWFhtv3FxcTA2NlYZw65duyCTyaTP4eHhcHNzw4YNG2BnZwe5XA6lUonbt29j9OjRaNSoEQwNDdG1a1ecOnWqxvlduXIF/v7+MDExgampKfz8/KT5Vcy/b9++iIiIQOPGjeHo6Ij8/HzIZDLEx8fDx8cH2tra2Lx5MxQKBebNm4emTZtCLpfDzc0NCQkJUl/VtSMiIiIiIiIiqg+YoHtOoqOj0aFDBwQFBaGwsBCFhYWwsrJCYWEhvL294ebmhhMnTiAhIQG///47Bg4cCOB/21eHDRuG27dv49SpU5g1axZiYmJgaWlZbb/P6sKFC4iPj8eOHTuQmZkJAOjduzeuXbuGvXv3IiMjA+7u7ujWrRtu3rxZZR/3799Hly5doK+vj8OHDyM1NRX6+vro0aOHykq5Q4cOITs7GwcOHMCePXuk8pkzZyIkJATZ2dnw9fVFdHQ0oqKisHTpUpw+fRq+vr547733kJOToxL3yXZEREREREREJI5SIe6qb3iK63NiZGQELS0t6OrqwsLCQipfs2YN3N3dsWjRIqlsw4YNsLKywvnz5+Ho6IgFCxbg4MGDGD16NM6ePYthw4bh/fffr7HfZ1VSUoJNmzbBzMwMAPDzzz/jzJkzuH79OuRyOQBg6dKl2LVrF7Zv347Ro0dX6mPr1q1QU1PD+vXrpRV6sbGxMDY2RlJSEt555x0AgJ6eHtavXw8tLS0AkFbYTZo0Cf369ZP6W7p0KWbOnIlBgwYBACIjI5GYmIgVK1bgiy++kOo92Y6IiIiIiIiIqD5ggu4Fy8jIQGJiIvT19Svdy83NhaOjI7S0tLB582a0bt0a1tbWWLFiRZ3Ft7a2lpJzFeO5d+8eTE1NVeo9ePAAubm51c7hwoULMDAwUCl/+PChShtXV1cpOfc4Dw8P6c937tzB1atX4enpqVLH09Oz0jbbx9tVp7i4GMXFxSplCqUCajIuFiUiIiIiIiKqU/VwJZsoTNC9YAqFAn369EFkZGSle5aWltKf09PTAQA3b97EzZs3oaenV2O/ampqUCqVKmVVHaTwZD8KhQKWlpZISkqqVPfJd9o93qZdu3bYsmVLpXuPJ/+qG3NV5Y+/Kw8AlEplpbKnPQMAiIiIwNy5c1XKmurboJmh7VPbEhERERERERGJwATdc6SlpYXy8nKVMnd3d+zYsQM2NjbQ0Kj68efm5mLy5MmIiYlBfHw8hg8fjkOHDkFNTa3afs3MzHD37l0UFRVJiayKd8zVxN3dHdeuXYOGhgZsbGyeaV7u7u7Ytm2bdKjEP2FoaIjGjRsjNTUVXl5eUnl6ejreeOONWvcXGhqKKVOmqJT5tnzvH42RiIiIiIiIiOh54r6/58jGxgbHjh1Dfn4+bty4AYVCgfHjx+PmzZsYPHgwfvnlF1y8eBH79+9HYGAgysvLUV5ejmHDhuGdd97BRx99hNjYWPz222+Iioqqsd8333wTurq6+OSTT3DhwgV88803iIuLe+oYu3fvjg4dOqBv377Yt28f8vPzkZ6ejtmzZ+PEiRNVthk6dCgaNmwIPz8/pKSkIC8vD8nJyZg4cSL++9//1vo5TZ8+HZGRkdi2bRvOnTuHjz/+GJmZmZg4cWKt+5LL5TA0NFS5uL2ViIiIiIiIqO7xkIi6w8zFczRt2jSoq6vD2dkZZmZmKCgoQOPGjZGWloby8nL4+vrCxcUFEydOhJGREdTU1LBw4ULk5+fjyy+/BABYWFhg/fr1mD17trQirqp+GzRogM2bN2Pv3r1wdXXFt99+i/Dw8KeOUSaTYe/evfDy8kJgYCAcHR0xaNAg5Ofnw9zcvMo2urq6OHz4MJo1a4Z+/frByckJgYGBePDgwd9aURcSEoKpU6di6tSpcHV1RUJCAnbv3g0HB4da90VERERERERE9KqRKZ98cRlRPePZpKuw2F/paQuLPbLoobDYInlp1v5047pySflAWGxrmY6w2K/rvO3KXs+/4wrInCcsdpzbHGGxL2qI+2vaEVp/CYudcr+BsNivq866N4XFvnLb4OmVnpM5Gn8Ii83f14jqL2sNI2Gxv7n0vbDYL9L1bt7CYjc6lCws9vPwev6/CyIiIiIiIiIiopcED4kgIiIiIiIiIqJaq4/vghOFK+iIiIiIiIiIiIgEYoKOiIiIiIiIiIhIIG5xJSIiIiIiIiKi2lPKRI+g3mCCjuq9A9MdhMU+EvGXsNgHQq2ExRZJ5DNv4ybu5D3dni2FxQbEnfon8uv9lba402s3ZkQJiy3yJFWRJ8iW/zdLWOxN724VFlvkiaIiNekhbpPJL9vE/Uy9qKkpLPaBaeJ+X9uytEhY7APT9YTFFmn+irvCYnd/UC4sdhMjcfMW+XNt0W5DYbGJXiVM0BERERERERERUa3xkIi6w3fQERERERERERERCcQE3QsWEBCAvn37ih4GERERERERERG9JLjF9QWLjo6GUql87nF8fHzg5uaGFStWPPdYRERERERERPT6USp4SERd4Qq6F6S8vBwKhQJGRkYwNjYWPZxnVlJSUmd9lZaW1ln/dTkuIiIiIiIiIiKRmKCrgo+PD4KDgxEcHAxjY2OYmppi9uzZKivfSkpKMGPGDDRp0gR6enp48803kZSUJN2Pi4uDsbEx9uzZA2dnZ8jlcly6dKnSFlcfHx9MmDABkyZNgomJCczNzfHll1+iqKgIH330EQwMDNC8eXP89NNPKmPMyspCr169oK+vD3NzcwwbNgw3btwA8GgbbXJyMqKjoyGTySCTyZCfn//Udo/PfcqUKWjYsCHefvvtap9TbGwsnJycoK2tjZYtW2L16tXSvfz8fMhkMsTHx8PHxwfa2trYvHmzNP+IiAg0btwYjo6OAIAzZ86ga9eu0NHRgampKUaPHo179+5J/VXXjoiIiIiIiIjEUCrEXfUNE3TV2LhxIzQ0NHDs2DGsXLkSy5cvx/r166X7H330EdLS0rB161acPn0aAwYMQI8ePZCTkyPVuX//PiIiIrB+/XqcPXsWjRo1qjZWw4YN8csvv2DChAn417/+hQEDBqBjx4749ddf4evri2HDhuH+/fsAgMLCQnh7e8PNzQ0nTpxAQkICfv/9dwwcOBDAo220HTp0QFBQEAoLC1FYWAgrK6untnty7mlpaVi3bl2VY46JicGsWbOwcOFCZGdnY9GiRQgLC8PGjRtV6s2cORMhISHIzs6Gr68vAODQoUPIzs7GgQMHsGfPHty/fx89evSAiYkJjh8/ju+++w4HDx5EcHCwSl9PtiMiIiIiIiIiqg/4DrpqWFlZYfny
5ZDJZGjRogXOnDmD5cuXIygoCLm5ufj222/x3//+F40bNwYATJs2DQkJCYiNjcWiRYsAPNrSuXr1arRp06bGWG3atMHs2bMBAKGhofjss8/QsGFDBAUFAQDmzJmDNWvW4PTp03jrrbewZs0auLu7S3EAYMOGDbCyssL58+fh6OgILS0t6OrqwsLCQqrzLO0AwN7eHosXL65xzPPnz0dUVBT69esHALC1tUVWVhbWrVuHESNGSPUmTZok1amgp6eH9evXQ0tLC8CjZN+DBw/w9ddfQ09PDwCwatUq9OnTB5GRkTA3N6+yHRERERERERGJo1TyHXR1hQm6arz11luQyf73jdahQwdERUWhvLwcv/76K5RKZaVtlsXFxTA1NZU+a2lpoXXr1k+N9XgddXV1mJqawtXVVSqrSFBdv34dAJCRkYHExETo6+tX6is3N7fa7Z/P2s7Dw6PG8f7xxx+4fPkyRo4cKSURAaCsrAxGRkYqdavqy9XVVSXJlp2djTZt2kjJOQDw9PSEQqHAuXPnpPk/2a4qxcXFKC4uVikrLyuHXEO9xnZERERERERERKIwQfc3KBQKqKurIyMjA+rqqomfx5NfOjo6Kkm+6mhqaqp8lslkKmUVfSgUCul/K1aXPcnS0rLGcT9Lu8cTZdX1Azxa+fbmm2+q3HvyeVTV15NlSqWy2uf0ePnTxgUAERERmDt3rkrZJ77umNWj5qQjEREREREREZEoTNBV4+jRo5U+Ozg4QF1dHW3btkV5eTmuX7+Ozp07v/Cxubu7Y8eOHbCxsYGGRtVfQi0tLZSXl9e63bMwNzdHkyZNcPHiRQwdOvRv91PB2dkZGzduRFFRkZSES0tLg5qaWq0PgwgNDcWUKVNUysq/nFJNbSIiIiIiIiL6u+rjYQ2i8JCIaly+fBlTpkzBuXPn8O233+Lzzz/HxIkTAQCOjo4YOnQohg8fjp07dyIvLw/Hjx9HZGQk9u7d+9zHNn78eNy8eRODBw/GL7/8gosXL2L//v0IDAyUknI2NjY4duwY8vPzcePGDSgUimdq96zCw8MRERGB6OhonD9/HmfOnEFsbCyWLVtW6/kMHToU2traGDFiBH777TckJiZiwoQJGDZsmLS99VnJ5XIYGhqqXNzeSkREREREREQvMyboqjF8+HA8ePAAb7zxBsaPH48JEyZg9OjR0v3Y2FgMHz4cU6dORYsWLfDee+/h2LFjsLKyeu5ja9y4MdLS0lBeXg5fX1+4uLhg4sSJMDIygpraoy/ptGnToK6uDmdnZ5iZmaGgoOCZ2j2rUaNGYf369YiLi4Orqyu8vb0RFxcHW1vbWs9HV1cX+/btw82bN9G+fXv0798f3bp1w6pVq2rdFxERERERERG9GEqFTNhV33CLazU0NTWxYsUKrFmzptr7c+fOrfS+swoBAQEICAioVB4XF6fyOSkpqVKd/Pz8SmVKpVLls4ODA3bu3FllbODRKr8jR45UKn9au6rGU50hQ4ZgyJAhVd6zsbGpNGag8vwruLq64ueff642VnXtiIiIiIiIiIhedVxBR0REREREREREJBBX0BERERERERERUa1VsXGO/iYm6KpQm22eRERERERERERE/wQTdEREREREREREVGv18bAGUfgOOiIiIiIiIiIiIoG4go6I6pzMtrmw2Bc1TwuL3UZYZKD84hVhsdXtmgiLfVFTU1jsS2V/CIst0kUNhbDY5f/NEhZbvamzsNgin7ndbQNhsYVKuCss9EEddWGxAXHfayJ/dwDE/e4gdt4iZYoewGun+Jy4n2uAobDIl5QPhMV+XXAFXd3hCjoiIiIiIiIiIiKBmKAjIiIiIiIiIiISiFtciYiIiIiIiIio1pRK0SOoP7iCjoiIiIiIiIiISCCuoCMiIiIiIiIiolrjIRF1hyvo6rnS0lLRQ5CUlJRUWf53x/gyzY2IiIiIiIiI6O9igu4VkpCQgE6dOsHY2BimpqZ49913kZubK93Pz8+HTCZDfHw8fHx8oK2tjc2bNwMAYmNj4eTkBG1tbbRs2RKrV69W6XvmzJlwdHSErq4u7OzsEBYW9tQE2JUrV+Dv7w8TExOYmprCz88P+fn50v2AgAD07dsXERERaNy4MRwdHasdo0KhwLx589C0aVPI5XK4ubkhISHhmeZGRERERERERPQqY4LuFVJUVIQpU6bg+PHjOHToENTU1PD+++9DoVCo1Js5cyZCQkKQnZ0NX19fxMTEYNasWVi4cCGys7OxaNEihIWFYePGjVIbAwMDxMXFISsrC9HR0YiJicHy5curHcv9+/fRpUsX6Ovr4/Dhw0hNTYW+vj569OihslLu0KFDyM7OxoEDB7Bnz55qxxgdHY2oqCgsXboUp0+fhq+vL9577z3k5OTUODciIiIiIiIiEkOplAm76hu+g+4V8sEHH6h8/uqrr9CoUSNkZWXBxcVFKp80aRL69esnfZ4/fz6ioqKkMltbW2RlZWHdunUYMWIEAGD27NlSfRsbG0ydOhXbtm3DjBkzqhzL1q1boaamhvXr10Mme/QvRmxsLIyNjZGUlIR33nkHAKCnp4f169dDS0sLAKQVdk+OcenSpZg5cyYGDRoEAIiMjERiYiJWrFiBL774otq5ERERERERERG96pige4Xk5uYiLCwMR48exY0bN6SVcwUFBSoJOg8PD+nPf/zxBy5fvoyRI0ciKChIKi8rK4ORkZH0efv27VixYgUuXLiAe/fuoaysDIaGhtWOJSMjAxcuXICBgYFK+cOHD1W23bq6ukrJucc9PsY7d+7g6tWr8PT0VKnj6emJU6dOVduuKsXFxSguLlYpKy8rh1xDvcZ2RERERERERFQ7SsXT69CzYYLuFdKnTx9YWVkhJiYGjRs3hkKhgIuLS6XDF/T09KQ/VyTxYmJi8Oabb6rUU1d/lLQ6evQoBg0ahLlz58LX1xdGRkbYunUroqKiqh2LQqFAu3btsGXLlkr3zMzMqhxLdWOsULESr4JSqaxUVl1/FSIiIjB37lyVsk983TGrR82JPSIiIiIiIiIiUZige0X8+eefyM7Oxrp169C5c2cAQGpq6lPbmZubo0mTJrh48SKGDh1aZZ20tDRYW1tj1qxZUtmlS5dq7Nfd3R3btm1Do0aNalxp9ywMDQ3RuHFjpKamwsvLSypPT0/HG2+8Uau+QkNDMWXKFJWy8i+nVFObiIiIiIiIiEg8HhLxiqg4KfXLL7/EhQsX8PPPP1dKRFUnPDwcERERiI6Oxvnz53HmzBnExsZi2bJlAAB7e3sUFBRg69atyM3NxcqVK/H999/X2OfQoUPRsGFD+Pn5ISUlBXl5eUhOTsbEiRPx3//+t9bzmz59OiIjI7Ft2zacO3cOH3/8MTIzMzFx4sRa9SOXy2FoaKhycXsrERERERERUd1TKGXCrtpavXo1bG1toa2tjXbt2iElJeWZ2qWlpUFDQwNubm61jlkbTNC9ItTU1LB161ZkZGTAxcUFkydPxpIlS56p7ahRo7B+/XrExcXB1dUV3t7eiIuLg62tLQDAz88PkydPRnB
wMNzc3JCeno6wsLAa+9TV1cXhw4fRrFkz9OvXD05OTggMDMSDBw/+1oq6kJAQTJ06FVOnToWrqysSEhKwe/duODg41LovIiIiIiIiIqIK27Ztw6RJkzBr1iycPHkSnTt3Rs+ePVFQUFBju9u3b2P48OHo1q3bcx8jt7i+Qrp3746srCyVMqVSKf3ZxsZG5fPjhgwZgiFDhlTb9+LFi7F48WKVskmTJtU4HgsLC2zcuLHa+3FxcZXKqhujmpoa5syZgzlz5lTZV01zIyIiIiIiIqIXT/k3VrKJsGzZMowcORKjRo0CAKxYsQL79u3DmjVrEBERUW27MWPGYMiQIVBXV8euXbue6xi5go6IiIiIiIiIiOqlkpISZGRk4J133lEpf+edd5Cenl5tu9jYWOTm5uLTTz993kMEwBV0RERERERERET0NygV4lbQFRcXo7i4WKVMLpdDLperlN24cQPl5eUwNzdXKTc3N8e1a9eq7DsnJwcff/wxUlJSoKHxYlJnXEFHRERERERERESvlIiICBgZGalcNW1XlclUk4lKpbJSGQCUl5djyJAhmDt3LhwdHet83NXhCjoiIiIiIiIiInqlhIaGYsqUKSplT66eA4CGDRtCXV290mq569evV1pVBwB3797FiRMncPLkSQQHBwMAFAoFlEolNDQ0sH//fnTt2rUOZ/IIE3RERERERERERFRrIs9yrGo7a1W0tLTQrl07HDhwAO+//75UfuDAAfj5+VWqb2hoiDNnzqiUrV69Gj///DO2b98OW1vbfz74KsiUPBqT6rlmDVyFxe5kYC8sdurdC8JiixSr6Sws9kEddWGxRbqkfCAsdpdyPWGxO+veFBY75X4DYbFf13lf1FAIiz3vxAJhsQ+3ChUW+3XVxOiu6CEI8c6Ny8JiW+mYCYt9+cEfwmKLJPL3tTkar+czf115aVoIix2R/42w2C9StkMvYbGdcvY+c91t27Zh2LBhWLt2LTp06IAvv/wSMTExOHv2LKytrREaGoorV67g66+/rrJ9eHg4du3ahczMzDoafWVcQUdERERERERERLUm8pCI2vD398eff/6JefPmobCwEC4uLti7dy+sra0BAIWFhSgoKBA6RiboiIiIiIiIiIioXhs3bhzGjRtX5b24uLga24aHhyM8PLzuB/UYnuL6EggICEDfvn1FD4OIiIiIiIiIiARggu4Fys/Ph0wme657lomIiIiIiIiIXgSFUibsqm+YoKMXorS0tFblf7c/IiIiIiIiIqJXTb1N0G3fvh2urq7Q0dGBqakpunfvjqKiIgD/21K6aNEimJubw9jYGHPnzkVZWRmmT5+OBg0aoGnTptiwYYNKn2fOnEHXrl2lPkePHo179+5J9xUKBebNm4emTZtCLpfDzc0NCQkJ0v2Ko3jbtm0LmUwGHx8flf6XLl0KS0tLmJqaYvz48SpJKBsbGyxatAiBgYEwMDBAs2bN8OWXX6q0v3LlCvz9/WFiYgJTU1P4+fkhPz9fup+UlIQ33ngDenp6MDY2hqenJy5dugQAOHXqFLp06QIDAwMYGhqiXbt2OHHiRLXP9/bt2xg9ejQaNWoEQ0NDdO3aFadOnZLuh4eHw83NDRs2bICdnR3kcjmUSiVkMhnWrl0LPz8/6OnpYcGCRyfTrVmzBs2bN4eWlhZatGiBTZs2qcSrrh0RERERERERiaFUyoRd9U29TNAVFhZi8ODBCAwMRHZ2NpKSktCvXz8olUqpzs8//4yrV6/i8OHDWLZsGcLDw/Huu+/CxMQEx44dw9ixYzF27FhcvvzoyPf79++jR48eMDExwfHjx/Hdd9/h4MGDCA4OlvqMjo5GVFQUli5ditOnT8PX1xfvvfcecnJyAAC//PILAODgwYMoLCzEzp07pbaJiYnIzc1FYmIiNm7ciLi4uEovKYyKioKHhwdOnjyJcePG4V//+hf+85//SOPr0qUL9PX1cfjwYaSmpkJfXx89evRASUkJysrK0LdvX3h7e+P06dM4cuQIRo8eDZns0Tf10KFD0bRpUxw/fhwZGRn4+OOPoampWeXzVSqV6N27N65du4a9e/ciIyMD7u7u6NatG27evCnVu3DhAuLj47Fjxw6Vbb2ffvop/Pz8cObMGQQGBuL777/HxIkTMXXqVPz2228YM2YMPvroIyQmJqrEfbIdEREREREREVF9UC9PcS0sLERZWRn69esnHZnr6uqqUqdBgwZYuXIl1NTU0KJFCyxevBj379/HJ598AgAIDQ3FZ599hrS0NAwaNAhbtmzBgwcP8PXXX0NPTw8AsGrVKvTp0weRkZEwNzfH0qVLMXPmTAwaNAgAEBkZicTERKxYsQJffPEFzMzMAACmpqawsLBQGY+JiQlWrVoFdXV1tGzZEr1798ahQ4cQFBQk1enVq5d04sjMmTOxfPlyJCUloWXLlti6dSvU1NSwfv16KekWGxsLY2NjJCUlwcPDA7dv38a7776L5s2bAwCcnJykvgsKCjB9+nS0bNkSAODg4FDt801MTMSZM2dw/fp1yOVyAI9W/+3atQvbt2/H6NGjAQAlJSXYtGmTNO8KQ4YMUUmwDRkyBAEBAdLcpkyZgqNHj2Lp0qXo0qVLte2IiIiIiIiISJzH1kHRP1QvV9C1adMG3bp1g6urKwYMGICYmBjcunVLpU6rVq2gpva/6Zubm6sk8dTV1WFqaorr168DALKzs9GmTRspOQcAnp6eUCgUOHfuHO7cuYOrV6/C09NTJY6npyeys7OfOuZWrVpBXV1d+mxpaSnFrtC6dWvpzzKZDBYWFlKdjIwMXLhwAQYGBtDX14e+vj4aNGiAhw8fIjc3Fw0aNEBAQAB8fX3Rp08fREdHo7CwUOpvypQpGDVqFLp3747PPvsMubm51Y41IyMD9+7dg6mpqRRLX18feXl5Ku2sra0rJecAwMPDQ+Vzdnb2Mz23J9tVpbi4GHfu3FG5lErFU9sREREREREREYlSLxN06urqOHDgAH766Sc4Ozvj888/R4sWLZCXlyfVeXL7pkwmq7JMoXiU3Kl4f1pVHi9/sk5N7R5XU+xnqaNQKNCuXTtkZmaqXOfPn8eQIUMAPFpRd+TIEXTs2BHbtm2Do6Mjjh49CuDRO+POnj2L3r174+eff4azszO+//77KseqUChgaWlZKda5c+cwffp0qd7jyczHVVX+LM+tuv4eFxERASMjI5XrzsM/ntqOiIiIiIiIiEiUepmgAx4lfDw9PTF37lycPHkSWlpa1SacnoWzszMyMzOlgyYAIC0tDWpqanB0dIShoSEaN26M1NRUlXbp6enSVlItLS0AQHl5+d8eR3Xc3d2Rk5ODRo0awd7eXuUyMjKS6rVt2xahoaFIT0+Hi4sLvvnmG+meo6MjJk+ejP3796Nfv36IjY2tNta1a9egoaFRKVbDhg1rPXYnJ6can1tthIaG4vbt2yqXoXblVXxERERERERE9M8olDJhV31TLxN0x44dw6JFi3DixAkUFBRg586d+OOPP/5WwqfC0KFDoa2tjREjRuC3335DYmIiJk
yYgGHDhsHc3BwAMH36dERGRmLbtm04d+4cPv74Y2RmZmLixIkAgEaNGkFHRwcJCQn4/fffcfv27TqZb8X4GjZsCD8/P6SkpCAvLw/JycmYOHEi/vvf/yIvLw+hoaE4cuQILl26hP379+P8+fNwcnLCgwcPEBwcjKSkJFy6dAlpaWk4fvx4tc+re/fu6NChA/r27Yt9+/YhPz8f6enpmD17do0nv1Zn+vTpiIuLw9q1a5GTk4Nly5Zh586dmDZtWq37ksvlMDQ0VLlksnr5bU5ERERERERE9US9PCTC0NAQhw8fxooVK3Dnzh1YW1sjKioKPXv2/Nt96urqYt++fZg4cSLat28PXV1dfPDBB1i2bJlUJyQkBHfu3MHUqVNx/fp1ODs7Y/fu3dKBCxoaGli5ciXmzZuHOXPmoHPnzkhKSvqn05XGd/jwYcycORP9+vXD3bt30aRJE3Tr1g2GhoZ48OAB/vOf/2Djxo34888/YWlpieDgYIwZMwZlZWX4888/MXz4cPz+++9o2LAh+vXrh7lz51YZSyaTYe/evZg1axYCAwPxxx9/wMLCAl5eXlKysjb69u2L6OhoLFmyBCEhIbC1tUVsbCx8fHz+4VMhIiIiIiIioudFWQ9XsokiUyp55gbVb80auD690nPSycBeWOzUuxeExRYpVtNZWOyDOupPr1QPXVI+EBa7S/nT3035vHTWvSksdsr9BsJiv67zvqgh7sCheScWCIt9uFWosNivqyZGd0UPQYh3blwWFttKR9zrUC4/eD3flSzy97U5Gq/nM39deWlaCIsdkf/N0yvVAyeb+QmL3bbg38JiPw/c+0dERERERERERCRQvdziSkREREREREREzxf3ZNYdrqAjIiIiIiIiIiISiCvoiIiIiIiIiIio1hQ8JKLOcAUdERERERERERGRQFxBR/VemG4bccHLxYUeKfB0rIuamsJio7RUWOiwSQbCYot0/6crwmLfyCsWFrtJD3F/x9U54fU8SVXkCbJ2t8X9+y3yJFWvsxHCYpelbRcWW5mXKyx2wWpxp7haL/ERFjts/GlhsUWe0ty9TNwJsiJ/X7soLDJwYJqDsNjlF8X9zlR8TtzPllOZ4k5SfeODO8Jivy6UXEFXZ7iCjoiIiIiIiIiISCAm6IiIiIiIiIiIiATiFlciIiIiIiIiIqo1HhJRd7iCjoiIiIiIiIiISCAm6F5CMpkMu3btEj0MIiIiIiIiIqJqKQVe9Q0TdPTClFZxumZJScnf6uvvtiMiIiIiIiIietm8sgk6hUKByMhI2NvbQy6Xo1mzZli4cKF0/8yZM+jatSt0dHRgamqK0aNH4969e9L9gIAA9O3bF4sWLYK5uTmMjY0xd+5clJWVYfr06WjQoAGaNm2KDRs2SG3y8/Mhk8mwdetWdOzYEdra2mjVqhWSkpKkOuXl5Rg5ciRsbW2ho6ODFi1aIDo6utL4N2zYgFatWkEul8PS0hLBwcEAABsbGwDA+++/D5lMJn0ODw+Hm5sbNm3aBBsbGxgZGWHQoEG4e/d/x2UrlUosXrwYdnZ20NHRQZs2bbB9+3bp/q1btzB06FCYmZlBR0cHDg4OiI2NBfAo4RUcHAxLS0toa2vDxsYGERERNX4NYmNj4eTkBG1tbbRs2RKrV6+u9Kzi4+Ph4+MDbW1tbN68WXruERERaNy4MRwdHWv19XqyHRERERERERHRq+6VPSQiNDQUMTExWL58OTp16oTCwkL85z//AQDcv38fPXr0wFtvvYXjx4/j+vXrGDVqFIKDgxEXFyf18fPPP6Np06Y4fPgw0tLSMHLkSBw5cgReXl44duwYtm3bhrFjx+Ltt9+GlZWV1G769OlYsWIFnJ2dsWzZMrz33nvIy8uDqakpFAoFmjZtivj4eDRs2BDp6ekYPXo0LC0tMXDgQADAmjVrMGXKFHz22Wfo2bMnbt++jbS0NADA8ePH0ahRI8TGxqJHjx5QV1eX4ubm5mLXrl3Ys2cPbt26hYEDB+Kzzz6TEpOzZ8/Gzp07sWbNGjg4OODw4cP48MMPYWZmBm9vb4SFhSErKws//fQTGjZsiAsXLuDBgwcAgJUrV2L37t2Ij49Hs2bNcPnyZVy+fLna5x8TE4NPP/0Uq1atQtu2bXHy5EkEBQVBT08PI0aMkOrNnDkTUVFRiI2NhVwuR3JyMg4dOgRDQ0McOHAASqXymb9eT7YjIiIiIiIiInF4SETdeSUTdHfv3kV0dDRWrVolJYOaN2+OTp06AQC2bNmCBw8e4Ouvv4aenh4AYNWqVejTpw8iIyNhbm4OAGjQoAFWrlwJNTU1tGjRAosXL8b9+/fxySefAHiUBPzss8+QlpaGQYMGSfGDg4PxwQcfAHiUbEtISMBXX32FGTNmQFNTE3PnzpXq2traIj09HfHx8VKCbsGCBZg6dSomTpwo1Wvfvj0AwMzMDABgbGwMCwsLlXkrFArExcXBwMAAADBs2DAcOnQICxcuRFFREZYtW4aff/4ZHTp0AADY2dkhNTUV69atg7e3NwoKCtC2bVt4eHgA+N9qPQAoKCiAg4MDOnXqBJlMBmtr6xq/BvPnz0dUVBT69esnzTMrKwvr1q1TSdBNmjRJqlNBT08P69evh5aWFoBHyb5n+Xo92Y6IiIiIiIiIqD54JRN02dnZKC4uRrdu3aq936ZNGynZAwCenp5QKBQ4d+6clPBp1aoV1NT+t8vX3NwcLi4u0md1dXWYmpri+vXrKv1XJMAAQENDAx4eHsjOzpbK1q5di/Xr1+PSpUt48OABSkpK4ObmBgC4fv06rl69Wu3Ya2JjYyMl5wDA0tJSGltWVhYePnyIt99+W6VNSUkJ2rZtCwD417/+hQ8++AC//vor3nnnHfTt2xcdO3YE8GgL6dtvv40WLVqgR48eePfdd/HOO+9UOY4//vgDly9fxsiRIxEUFCSVl5WVwcjISKVuRTLwca6uripJtmf9ej3ZrirFxcUoLi5WKStVlkNTpl5NCyIiIiIiIiL6O5RcQVdnXskEnY6OTo33lUolZLKqv0keL9fU1Kx0r6oyhULx1DFV9BsfH4/JkycjKioKHTp0gIGBAZYsWYJjx44909hrUtPYKv73xx9/RJMmTVTqyeVyAEDPnj1x6dIl/Pjjjzh48CC6deuG8ePHY+nSpXB3d0deXh5++uknHDx4EAMHDkT37t1V3mFXoSJWTEwM3nzzTZV7j2/JBaCSdKuu7Fm/XlX19aSIiAiVFYwA8K6BK94zbP3UtkREREREREREIrySh0Q4ODhAR0cHhw4dqvK+s7MzMjMzUVRUJJWlpaVBTU2tTg4XOHr0qPTnsrIyZGRkoGXLlgCAlJQUdOzYEePGjUPbtm1hb2+P3Nxcqb6BgQFsbGyqHTvwKBFXXl5eqzE5OztDLpejoKAA9vb2Ktfj788zMzNDQEAANm/ejBUrVuDLL7+U7hkaGsLf3x8xMTHYtm0bduzYgZs3b1aKZW5ujiZNmuDixYuVYtna2tZq3BVjr6uvV2hoKG7fvq1y9TRoVesxEREREREREVHNF
AKv+uaVXEGnra2NmTNnYsaMGdDS0oKnpyf++OMPnD17FiNHjsTQoUPx6aefYsSIEQgPD8cff/yBCRMmYNiwYdJ2yX/iiy++gIODA5ycnLB8+XLcunULgYGBAAB7e3t8/fXX2LdvH2xtbbFp0yYcP35cJXEVHh6OsWPHolGjRujZsyfu3r2LtLQ0TJgwAQCkBJ6npyfkcjlMTEyeOiYDAwNMmzYNkydPhkKhQKdOnXDnzh2kp6dDX18fI0aMwJw5c9CuXTu0atUKxcXF2LNnD5ycnAAAy5cvh6WlJdzc3KCmpobvvvsOFhYWMDY2rjJeeHg4QkJCYGhoiJ49e6K4uBgnTpzArVu3MGXKlFo9z7r8esnlcmnFYAVubyUiIiIiIiKil9krmaADgLCwMGhoaGDOnDm4evUqLC0tMXbsWACArq4u9u3bh4kTJ6J9+/bQ1dXFBx98gGXLltVJ7M8++wyRkZE4efIkmjdvjn//+99o2LAhAGDs2LHIzMyEv78/ZDIZBg8ejHHjxuGnn36S2o8YMQIPHz7E8uXLMW3aNDRs2BD9+/eX7kdFRWHKlCmIiYlBkyZNkJ+f/0zjmj9/Pho1aoSIiAhcvHgRxsbGcHd3lw690NLSQmhoKPLz86Gjo4POnTtj69atAAB9fX1ERkYiJycH6urqaN++Pfbu3avyjr7HjRo1Crq6uliyZAlmzJgBPT09uLq6YtKkSbV+ns/760VERERERERE9DKTKZVKpehBvCry8/Nha2uLkydPSoc+0MsvpumHoocghF1pqbDYF594X+KLJHLeHUKNhcUW6f5P/xEW+0be099N+bw06SHuLRFXEsQt6k+530BY7M66lV+78KJcuW3w9Er1kNfZCGGxy9Iqvwf3RVHm5T690nNSsPqysNjWS3yExd48/rSw2Bc1xP1M7f6gdq+1qUsif18Taeg0cb87lF+8Iix28bm7wmKfyrQQFvsN/6KnV3pODFbuERb7RTpsMUBYbK9r3wmL/Ty8ku+gIyIiIiIiIiIiqi9e2S2uREREREREREQkjoJ7MusME3S1YGNjA+4IJiIiIiIiIiKiusQtrkRERERERERERAJxBR0REREREREREdWaAjLRQ6g3mKAjeo5EngoGvJ4nc4k8kewNgSdzqds1ERZb5EmqQk/WTBB3GprQeb+eP1peWyJPUtXw7C8sdhnEzfvK7b+ExW4m8PTa19XrepKqSCJPn9ft2VJY7BsJ4k6IFvl93kTgyffivtr0qmKCjoiIiIiIiIiIak3JFXR1hu+gIyIiIiIiIiIiEogJunouLi4OxsbGoodBRERERERERPWMQuBV3zBBV8/5+/vj/Pnz0ufw8HC4ubmJGxAREREREREREangO+jqOR0dHejo6IgeBgCgpKQEWlpaKmXl5eWQyWRQU6tdrvjvtiMiIiIiIiIietkwu1EDhUKByMhI2NvbQy6Xo1mzZli4cKF0/8yZM+jatSt0dHRgamqK0aNH4969e9L9gIAA9O3bF0uXLoWlpSVMTU0xfvx4lJaWSnWKi4sxY8YMWFlZQS6Xw8HBAV999RWAR0mokSNHwtbWFjo6OmjRogWio6Oltvv27YO2tjb++usvlXGHhITA29sbgOoW17i4OMydOxenTp2CTCaDTCZDXFwcAgMD8e6776r0UVZWBgsLC2zYsKHa55Oeng4vLy/o6OjAysoKISEhKCoqku7b2NhgwYIFCAgIgJGREYKCgqTx7NmzB87OzpDL5bh06RJu3bqF4cOHw8TEBLq6uujZsydycnKkvqprR0RERERERERiKCETdtU3TNDVIDQ0FJGRkQgLC0NWVha++eYbmJubAwDu37+PHj16wMTEBMePH8d3332HgwcPIjg4WKWPxMRE5ObmIjExERs3bkRcXBzi4uKk+8OHD8fWrVuxcuVKZGdnY+3atdDX1wfwKEHYtGlTxMfHIysrC3PmzMEnn3yC+Ph4AED37t1hbGyMHTt2SP2Vl5cjPj4eQ4cOrTQff39/TJ06Fa1atUJhYSEKCwvh7++PUaNGISEhAYWFhVLdvXv34t69exg4cGCVz+bMmTPw9fVFv379cPr0aWzbtg2pqamV5r9kyRK4uLggIyMDYWFh0rOLiIjA+vXrcfbsWTRq1AgBAQE4ceIEdu/ejSNHjkCpVKJXr14qycyq2hERERERERERveq4xbUad+/eRXR0NFatWoURI0YAAJo3b45OnToBALZs2YIHDx7g66+/hp6eHgBg1apV6NOnDyIjI6VEnomJCVatWgV1dXW0bNkSvXv3xqFDhxAUFITz588jPj4eBw4cQPfu3QEAdnZ20hg0NTUxd+5c6bOtrS3S09MRHx+PgQMHQl1dHf7+/vjmm28wcuRIAMChQ4dw69YtDBgwoNKcdHR0oK+vDw0NDVhYWEjlHTt2RIsWLbBp0ybMmDEDABAbG4sBAwZIycInLVmyBEOGDMGkSZMAAA4ODli5ciW8vb2xZs0aaGtrAwC6du2KadOmSe1SU1NRWlqK1atXo02bNgCAnJwc7N69G2lpaejYsaP0fK2srLBr1y5pLk+2IyIiIiIiIiJx6uNhDaJwBV01srOzUVxcjG7dulV7v02bNlJyDgA8PT2hUChw7tw5qaxVq1ZQV1eXPltaWuL69esAgMzMTKirq0vbUauydu1aeHh4wMzMDPr6+oiJiUFBQYF0f+jQoUhKSsLVq1cBPEps9erVCyYmJrWa76hRoxAbGwsAuH79On788UcEBgZWWz8jIwNxcXHQ19eXLl9fXygUCuTl5Un1PDw8KrXV0tJC69atpc/Z2dnQ0NDAm2++KZWZmpqiRYsWyM7OrrZdVYqLi3Hnzh2Vq1RZ/vQHQEREREREREQkCBN01XjawQpKpRIyWdV7nh8v19TUrHRPoVA8U4z4+HhMnjwZgYGB2L9/PzIzM/HRRx+hpKREqvPGG2+gefPm2Lp1Kx48eIDvv/8eH374YY39VmX48OG4ePEijhw5gs2bN8PGxgadO3eutr5CocCYMWOQmZkpXadOnUJOTg6aN28u1Xs8gVlBR0dH5RkplcoqYzz5jJ9sV5WIiAgYGRmpXD/dPVtjGyIiIiIiIiIikZigq4aDgwN0dHRw6NChKu87OzsjMzNT5VCEtLQ0qKmpwdHR8ZliuLq6QqFQIDk5ucr7KSkp6NixI8aNG4e2bdvC3t4eubm5leoNGTIEW7ZswQ8//AA1NTX07t272phaWlooL6+8oszU1BR9+/ZFbGwsYmNj8dFHH9U4dnd3d5w9exb29vaVridPan0aZ2dnlJWV4dixY1LZn3/+ifPnz8PJyalWfYWGhuL27dsqV0+DVrXqg4iIiIiIiIieTiHwqm+YoKuGtrY2Zs6ciRkzZuDrr79Gbm4ujh49Kp2wOnToUGhra2PEiBH47bffkJiYiAkTJmDYsGHS++eexsbGBiNGjEBgYCB27dqFvLw8JCUlSYdA2Nvb48SJE9i3bx/Onz+PsLAwHD9+vFI/Q4cOxa+//oqFCxeif//+0vvfqouZl5eHzMxM3Lhx
A8XFxdK9UaNGYePGjcjOzpbeu1edmTNn4siRIxg/fjwyMzOl98hNmDDhmeb+OAcHB/j5+SEoKAipqak4deoUPvzwQzRp0gR+fn616ksul8PQ0FDl0pSpP70hEREREREREZEgTNDVICwsDFOnTsWcOXPg5OQEf39/6f1xurq62LdvH27evIn27dujf//+6NatG1atWlWrGGvWrEH//v0xbtw4tGzZEkFBQdKqvLFjx6Jfv37w9/fHm2++iT///BPjxo2r1IeDgwPat2+P06dPV3l66+M++OAD9OjRA126dIGZmRm+/fZb6V737t1haWkJX19fNG7cuMZ+WrdujeTkZOTk5KBz585o27YtwsLCYGlpWav5V4iNjUW7du3w7rvvokOHDlAqldi7d2+lLcJERERERERE9HJQQibsqm9kyupeAEavnfv376Nx48bYsGED+vXrJ3o4dSamae3fyVdXLmqIW3hrV8b8+4s2qN9fwmKr2zURFrtg9WVhsa/cNhAWu4nRXWGxRc77osC/OOmse1NYbJHPXKSOX7YTFlvDs7+w2GVp24XFTh+dISx2h1BjYbG3LC16eqXnhL+vvV7ebyXu9xbdni2FxRb5+1rK/QbCYov83aHl+b3CYr9IP5oPFha79+/fPr3SK0RD9ABIPIVCgWvXriEqKgpGRkZ47733RA+JiIiIiIiIiF5yivq3kE0YJugIBQUFsLW1RdOmTREXFwcNDX5bEBERERERERG9KMzEEGxsbMCdzkREREREREREYjBBR0REREREREREtaaoh4c1iMK3khIREREREREREQnEFXRU74k8uQev6YlFr6/X8+88mvQQOO+E1/Mk1TZu14TFvnjWSlhsfq+9eMq8XGGxyyDuJFWRJ8gC4k5xLb94RVjszrriTlK9WGIsLPbrejq1yBPBRbr/03+ExW7SQ+DXe7fAk5IFfp+LO7P3xeLLsurO6/n/JomIiIiIiIiIiF4STNAREREREREREREJxC2uRERERERERERUa+I2MNc/XEH3isvPz4dMJkNmZqbooRARERERERER0d/ABN0rzsrKCoWFhXBxcXnmNuHh4XBzc3t+gyIiIiIiIiKiek8hkwm76hsm6F5x6urqsLCwgIbGy79bubS09JnK/m5fRERERERERESvopciQadQKBAZGQl7e3vI5XI0a9YMCxculO6fOXMGXbt2hY6ODkxNTTF69Gjcu3dPuh8QEIC+ffti6dKlsLS0hKmpKcaPH6+SxCkuLsaMGTNgZWUFuVwOBwcHfPXVVwCA8vJyjBw5Era2ttDR0UGLFi0QHR0ttd23bx+0tbXx119/qYw7JCQE3t7e0uf09HR4eXlBR0cHVlZWCAkJQVFRUbXzrljJtm7dOlhZWUFXVxcDBgxQiaNQKDBv3jw0bdoUcrkcbm5uSEhIkO4/ucU1KSkJMpkMhw4dgoeHB3R1ddGxY0ecO3cOABAXF4e5c+fi1KlTkMlkkMlkiIuLk8bTrFkzyOVyNG7cGCEhITV+3X744Qe0a9cO2trasLOzw9y5c1FWVibdl8lkWLt2Lfz8/KCnp4cFCxZIc96wYQPs7Owgl8uhVCpRUFAAPz8/6Ovrw9DQEAMHDsTvv/9e6Vk92Y6IiIiIiIiI6FX3UiToQkNDERkZibCwMGRlZeGbb76Bubk5AOD+/fvo0aMHTExMcPz4cXz33Xc4ePAggoODVfpITExEbm4uEhMTsXHjRsTFxUmJJwAYPnw4tm7dipUrVyI7Oxtr166Fvr4+gEdJsKZNmyI+Ph5ZWVmYM2cOPvnkE8THxwMAunfvDmNjY+zYsUPqr7y8HPHx8Rg6dCiAR0lEX19f9OvXD6dPn8a2bduQmppaaZxPunDhAuLj4/HDDz8gISEBmZmZGD9+vHQ/OjoaUVFRWLp0KU6fPg1fX1+89957yMnJqbHfWbNmISoqCidOnICGhgYCAwMBAP7+/pg6dSpatWqFwsJCFBYWwt/fH9u3b8fy5cuxbt065OTkYNeuXXB1da22/3379uHDDz9ESEgIsrKysG7dOsTFxakkVgHg008/hZ+fH86cOSONoWLOO3bskBKLffv2xc2bN5GcnIwDBw4gNzcX/v7+VT6rx9sRERERERERkRhKgVd9I3xf5N27dxEdHY1Vq1ZhxIgRAIDmzZujU6dOAIAtW7bgwYMH+Prrr6GnpwcAWLVqFfr06YPIyEgpkWdiYoJVq1ZBXV0dLVu2RO/evXHo0CEEBQXh/PnziI+Px4EDB9C9e3cAgJ2dnTQGTU1NzJ07V/psa2uL9PR0xMfHY+DAgVBXV4e/vz+++eYbjBw5EgBw6NAh3Lp1CwMGDAAALFmyBEOGDMGkSZMAAA4ODli5ciW8vb2xZs0aaGtrVzn/hw8fYuPGjWjatCkA4PPPP0fv3r0RFRUFCwsLLF26FDNnzsSgQYMAAJGRkUhMTMSKFSvwxRdfVPtcFy5cKK3u+/jjj9G7d288fPgQOjo60NfXh4aGBiwsLKT6BQUFsLCwQPfu3aGpqYlmzZrhjTfeqLH/jz/+WPqa2dnZYf78+ZgxYwY+/fRTqd6QIUOkxFyFkpISbNq0CWZmZgCAAwcO4PTp08jLy4OVlRUAYNOmTWjVqhWOHz+O9u3bV9mOiIiIiIiIiKg+EL6CLjs7G8XFxejWrVu199u0aSMl5wDA09MTCoVC2rYJAK1atYK6urr02dLSEtevXwcAZGZmQl1dXWU76pPWrl0LDw8PmJmZQV9fHzExMSgoKJDuDx06FElJSbh69SqAR4nDXr16wcTEBACQkZGBuLg46OvrS5evry8UCgXy8vKqjdusWTMpOQcAHTp0kOZ2584dXL16FZ6eniptPD09kZ2dXW2fANC6dWuVZwFAeh5VGTBgAB48eAA7OzsEBQXh+++/V9mu+qSMjAzMmzdPZb5BQUEoLCzE/fv3pXoeHh6V2lpbW6sk2bKzs2FlZSUl5wDA2dkZxsbGKvN8sl1ViouLcefOHZWrRFFeYxsiIiIiIiIiqj2FwKu+EZ6g09HRqfG+UqmErJrTOR4v19TUrHRPoVA8U4z4+HhMnjwZgYGB2L9/PzIzM/HRRx+hpKREqvPGG2+gefPm2Lp1Kx48eIDvv/8eH374oXRfoVBgzJgxyMzMlK5Tp04hJycHzZs3rzF+VXN6fG5Pzr+mZ1Lh8edRUbfieVTFysoK586dwxdffAEdHR2MGzcOXl5e1R7GoFAoMHfuXJX5njlzBjk5OSqrBR9PrFZXVt18niyvqq8nRUREwMjISOX68tbFp7YjIiIiIiIiIhJF+BZXBwcH6Ojo4NChQxg1alSl+87Ozti4cSOKioqkBE1aWhrU1NTg6Oj4TDFcXV2hUCiQnJwsbXF9XEpKCjp27Ihx48ZJZbm5uZXqDRkyBFu2bEHTpk2hpqaG3r17S/fc3d1x9uxZ2NvbP9OYKhQUFODq1ato3LgxAODIkSPS3AwNDdG4cWOkpqbCy8tLapOenl7j9tOn0dLSQnl55VVlOjo6eO+99/Dee+9h/PjxaNmyJc6cOQN
3d/dKdd3d3XHu3Llaz7cqzs7OKCgowOXLl6VVdFlZWbh9+zacnJxq1VdoaCimTJmiUpbvPuAfj5GIiIiIiIiIVClqXjtEtSB8BZ22tjZmzpyJGTNm4Ouvv0Zubi6OHj0qnbA6dOhQaGtrY8SIEfjtt9+QmJiICRMmYNiwYdL7557GxsYGI0aMQGBgIHbt2oW8vDwkJSVJh0DY29vjxIkT2LdvH86fP4+wsDAcP368Uj9Dhw7Fr7/+ioULF6J///4qK8VmzpyJI0eOYPz48cjMzEROTg52796NCRMmPHX+I0aMwKlTp5CSkoKQkBAMHDhQej/c9OnTERkZiW3btuHcuXP4+OOPkZmZiYkTJz7T3Kt7Hnl5ecjMzMSNGzdQXFyMuLg4fPXVV/jtt99w8eJFbNq0CTo6OrC2tq6yjzlz5uDrr79GeHg4zp49i+zsbGzbtg2zZ8+u9Xi6d++O1q1bS8/3l19+wfDhw+Ht7V3lFtmayOVyGBoaqlxaaupPb0hEREREREREJIjwBB0AhIWFYerUqZgzZw6cnJzg7+8vvS9NV1cX+/btw82bN9G+fXv0798f3bp1w6pVq2oVY82aNejfvz/GjRuHli1bIigoCEVFRQCAsWPHol+/fvD398ebb76JP//8U2U1XQUHBwe0b98ep0+flk5vrdC6dWskJycjJycHnTt3Rtu2bREWFia9/6069vb26NevH3r16oV33nkHLi4uWL16tXQ/JCQEU6dOxdSpU+Hq6oqEhATs3r0bDg4OtZr/4z744AP06NEDXbp0gZmZGb799lsYGxsjJiYGnp6eaN26NQ4dOoQffvgBpqamVfbh6+uLPXv24MCBA2jfvj3eeustLFu2rNqEXk1kMhl27doFExMTeHl5oXv37rCzs8O2bdv+9hyJiIiIiIiIiF4VMqVSWR9Pp30lhIeHY9euXcjMzBQ9lHrtP469hMVOud9AWOzOujeFxX5dNekh7u881O2aCItdfvGKsNhXEsS9HvbKbQNhsdu4XRMW+/uzVk+v9JwM6veXsNiv6/dah1BjYbFlts/+Dt+6puHZX1jsw61ChcV+w79IWGyR/45tLDEWFnuE1l/CYov82XLxifeHv0jvt7osLLZI8hbivt6LdhsKi939gbgDA7v9/nosONnS+MOnV3pOhl7dLCz28/BSrKAjIiIiIiIiIiJ6XQk/JIKIiIiIiIiIiF493JJZd7iCTqDw8HBubyUiIiIiIiIies0xQUdERERERERERCQQt7gSEREREREREVGtKWSiR1B/8BRXqvc8m3QVFttaw0hY7Etlt4XFFilA1lj0EF47FzXEnbzH0+9ePJEnRIt85gd11IXFFvl9LpLIr7dIXmcjhMXO7RgsLPbIoofCYtOLJ/J3ZGuZjrDYl5QPhMUWaY68WFhskac0R+R/Iyz2i/R1E3GnuA6/Ur9OceUKOiIiIiIiIiIiqjVxf1Vf//AddERERERERERERAJxBR0REREREREREdUa35lWd7iCrp6ysbHBihUrRA+DiIiIiIiIiIieggm611h5eTkUihezY7y0tLRW5X+3PyIiIiIiIiKiJ61evRq2trbQ1tZGu3btkJKSUm3dnTt34u2334aZmRkMDQ3RoUMH7Nu377mOjwm6x2zfvh2urq7Q0dGBqakpunfvjqKiIhw+fBiampq4du2aSv2pU6fCy8sLABAXFwdjY2Ps2bMHLVq0gK6uLvr374+ioiJs3LgRNjY2MDExwYQJE1BeXi71YWNjgwULFmD48OHQ19eHtbU1/v3vf+OPP/6An58f9PX14erqihMnTqjETk9Ph5eXF3R0dGBlZYWQkBAUFRUBAHx8fHDp0iVMnjwZMpkMMpms0hidnZ0hl8uRkpLy1LlV5fbt2xg9ejQaNWoEQ0NDdO3aFadOnZLuh4eHw83NDRs2bICdnR3kcjmUSiVkMhnWrl0LPz8/6OnpYcGCBQCANWvWoHnz5tDS0kKLFi2wadMmlXjVtSMiIiIiIiIiMRQycVdtbNu2DZMmTcKsWbNw8uRJdO7cGT179kRBQUGV9Q8fPoy3334be/fuRUZGBrp06YI+ffrg5MmTdfDUqsYE3f8rLCzE4MGDERgYiOzsbCQlJaFfv35QKpXw8vKCnZ2dStKorKwMmzdvxkcffSSV3b9/HytXrsTWrVuRkJAg9bF3717s3bsXmzZtwpdffont27erxF6+fDk8PT1x8uRJ9O7dG8OGDcPw4cPx4Ycf4tdff4W9vT2GDx8OpfLR7u4zZ87A19cX/fr1w+nTp7Ft2zakpqYiODgYwKNMb9OmTTFv3jwUFhaisLBQZYwRERFYv349zp49Cw8Pj2ea2+OUSiV69+6Na9euSd+s7u7u6NatG27evCnVu3DhAuLj47Fjxw5kZmZK5Z9++in8/Pxw5swZBAYG4vvvv8fEiRMxdepU/PbbbxgzZgw++ugjJCYmqsR9sh0RERERERER0dMsW7YMI0eOxKhRo+Dk5IQVK1bAysoKa9asqbL+ihUrMGPGDLRv3x4ODg5YtGgRHBwc8MMPPzy3MfKQiP9XWFiIsrIy9OvXD9bW1gAAV1dX6f7IkSMRGxuL6dOnAwB+/PFH3L9/HwMHDpTqlJaWSivBAKB///7YtGkTfv/9d+jr68PZ2RldunRBYmIi/P39pXa9evXCmDFjAABz5szBmjVr0L59ewwYMAAAMHPmTHTo0AG///47LCwssGTJEgwZMgSTJk0CADg4OGDlypXw9vbGmjVr0KBBA6irq8PAwAAWFhYq8ywtLcXq1avRpk2bWs3tcYmJiThz5gyuX78OuVwOAFi6dCl27dqF7du3Y/To0QCAkpISbNq0CWZmZirthwwZopJgGzJkCAICAjBu3DgAwJQpU3D06FEsXboUXbp0qbYdEREREREREYnzYl6aVbXi4mIUFxerlMnlcilPUaGkpAQZGRn4+OOPVcrfeecdpKenP1MshUKBu3fvokGDBv9s0DXgCrr/16ZNG3Tr1g2urq4YMGAAYmJicOvWLel+QEAALly4gKNHjwIANmzYgIEDB0JPT0+qo6urKyXnAMDc3Bw2NjbQ19dXKbt+/bpK7NatW6vcB1STgxVlFe0yMjIQFxcHfX196fL19YVCoUBeXl6N89TS0lKJ96xze1xGRgbu3bsHU1NTlTHk5eUhNzdXqmdtbV0pOQcAHh4eKp+zs7Ph6empUubp6Yns7Owa21WluLgYd+7cUbkUSpE/MoiIiIiIiIiorkVERMDIyEjlioiIqFTvxo0bKC8vl3IrFczNzSu97qs6UVFRKCoqqnYhU13gCrr/p66ujgMHDiA9PR379+/H559/jlmzZuHYsWOwtbVFo0aN0KdPH8TGxsLOzg579+5FUlKSSh+ampoqn2UyWZVlTx7M8HidivfFVVVW0U6hUGDMmDEICQmpNI9mzZrVOE8dHR2pvwrPMrfHKRQKWFpaVlnH2NhY+nN1Cb6qyp8cU8X76p7W7kkRERGYO3euSllTfRs0M7R9alsiIiIiIiIiejWEhoZiypQpKm
VPrp573LPkHary7bffIjw8HP/+97/RqFGjvzfYZ8AE3WNkMhk8PT3h6emJOXPmwNraGt9//730BR81ahQGDRqEpk2bonnz5pVWfb0o7u7uOHv2LOzt7auto6WlpXIYxdPUZm7u7u64du0aNDQ0YGNjU5uhV8nJyQmpqakYPny4VJaeng4nJ6da91XVv6C+Ld/7x2MkIiIiIiIiIlUi96tVtZ21Kg0bNoS6unql1XLXr1+vtKruSdu2bcPIkSPx3XffoXv37v9ovE/DLa7/79ixY1i0aBFOnDiBgoIC7Ny5E3/88YdKksjX1xdGRkZYsGBBtQcovAgzZ87EkSNHMH78eGRmZiInJwe7d+/GhAkTpDo2NjY4fPgwrly5ghs3bjy1z9rMrXv37ujQoQP69u2Lffv2IT8/H+np6Zg9e3al02afxfTp0xEXF4e1a9ciJycHy5Ytw86dOzFt2rRa9yWXy2FoaKhyqcn4bU5ERERERET0OtLS0kK7du1w4MABlfIDBw6gY8eO1bb79ttvERAQgG+++Qa9e/d+3sNkgq6CoaEhDh8+jF69esHR0RGzZ89GVFQUevbsKdVRU1NDQEAAysvLVVZ7vWitW7dGcnIycnJy0LlzZ7Rt2xZhYWGwtLSU6sybNw/5+flo3rx5le+Be1Jt5iaTybB37154eXkhMDAQjo6OGDRoEPLz85+afa5K3759ER0djSVLlqBVq1ZYt24dYmNj4ePjU+u+iIiIiIiIiOjFUMrEXbUxZcoUrF+/Hhs2bEB2djYmT56MgoICjB07FsCj3XiP50K+/fZbDB8+HFFRUXjrrbdw7do1XLt2Dbdv367Lx6dCplQqlc+t93ooKCgIv//+O3bv3i16KHWuvs7Ns0lXYbGtNYyExb5U9vx+cLzMAmSNRQ/htXNRQ9zC9hFafwmLfeW2gbDYF594v+mL1Fn3prDYIp/5QR11YbFFfp+LJPLrLZLX2covt35RcjsGC4s9suihsNj04on8HdlapiMs9iXlA2GxRZojL356pedkY4mxsNgR+d8Ii/0irbX6UFjssZc316r+6tWrsXjxYhQWFsLFxQXLly+Hl5cXgEeHZ+bn50vv2vfx8UFycnKlPkaMGIG4uLh/OvQq8R10z+j27ds4fvw4tmzZgn//+9+ih1On6vPciIiIiIiIiOj5EPkOutoaN24cxo0bV+W9J5NuNR2c+bwwQfeM/Pz88Msvv2DMmDF4++23RQ+nTtXnuRERERERERERveyYoHtGIrKnL0p9nhsRERERERER0cuOCToiIiIiIiIiIqq1V2mL68uOp7gSEREREREREREJxBV0RERERERERERUa0rRA6hHmKCjeu91Pcb9df232+5hqbDYbdyuCYt9I09PWGy72wbCYl95IC52h1BjYbEvLi0SFvuKwK/3RU1NYbFf1w0c1kt8hMVulpcrLHb5xSvCYud2DBYWu3n6KmGxrdtNFRabXrwu5eJ+b+mse1NYbJFS7jcQFntjibj/T/TJe3eExSaqLW5xJSIiIiIiIiIiEug1XWNDRERERERERET/hEImegT1R52soFMqlRg9ejQaNGgAmUyGzMzMuui2TgUEBKBv3761ahMXFwdjY2Ppc3h4ONzc3Op0XM+Dj48PJk2aJHoYRERERERERET0DOpkBV1CQgLi4uKQlJQEOzs7NGzYsC66rZKPjw/c3NywYsWK5xajOtOmTcOECRNeeNza2rlzJzSFvqeHiIiIiIiIiOq71/Mtvc9HnSTocnNzYWlpiY4dO1Zbp6SkBFpaWnURThh9fX3o6+uLHsZTNWgg7gWgNanqe0CpVKK8vBwaGrX7Vvy77YiIiIiIiIiIXjb/eItrQEAAJkyYgIKCAshkMtjY2AB4tNItODgYU6ZMQcOGDfH2228DAJYtWwZXV1fo6enBysoK48aNw71791T6TEtLg7e3N3R1dWFiYgJfX1/cunULAQEBSE5ORnR0NGQyGWQyGfLz81FeXo6RI0fC1tYWOjo6aNGiBaKjo2s9l7i4ODRr1gy6urp4//338eeff6rcf3KLa8W22UWLFsHc3BzGxsaYO3cuysrKMH36dDRo0ABNmzbFhg0bVPq5cuUK/P39YWJiAlNTU/j5+SE/P79Sv0uXLoWlpSVMTU0xfvx4lJb+73TK1atXw8HBAdra2jA3N0f//v2le09ucb116xaGDx8OExMT6OrqomfPnsjJyVGZt7GxMfbt2wcnJyfo6+ujR48eKCwsrPF5ZWVloVevXtDX14e5uTmGDRuGGzduqIzjye+BpKQkyGQy7Nu3Dx4eHpDL5UhJSUFxcTFCQkLQqFEjaGtro1OnTjh+/LjUV3XtiIiIiIiIiEgMhcCrvvnHCbro6GjMmzcPTZs2RWFhoUpSZePGjdDQ0EBaWhrWrVv3KKCaGlauXInffvsNGzduxM8//4wZM2ZIbTIzM9GtWze0atUKR44cQWpqKvr06YPy8nJER0ejQ4cOCAoKQmFhIQoLC2FlZQWFQoGmTZsiPj4eWVlZmDNnDj755BPEx8c/8zyOHTuGwMBAjBs3DpmZmejSpQsWLFjw1HY///wzrl69isOHD2PZsmUIDw/Hu+++CxMTExw7dgxjx47F2LFjcfnyZQDA/fv30aVLF+jr6+Pw4cNITU2VEmIlJSVSv4mJicjNzUViYiI2btyIuLg4xMXFAQBOnDiBkJAQzJs3D+fOnUNCQgK8vLyqHWNAQABOnDiB3bt348iRI1AqlejVq5dKwu/+/ftYunQpNm3ahMOHD6OgoADTpk2rts/CwkJ4e3vDzc0NJ06cQEJCAn7//XcMHDhQpV5V3wMAMGPGDERERCA7OxutW7fGjBkzsGPHDmzcuBG//vor7O3t4evri5s3VY9Bf7IdEREREREREdGr7h/vDzQyMoKBgQHU1dVhYWGhcs/e3h6LFy9WKXt8ZZetrS3mz5+Pf/3rX1i9ejUAYPHixfDw8JA+A0CrVq2kP2tpaUFXV1cllrq6OubOnavSb3p6OuLj4ysljKoTHR0NX19ffPzxxwAAR0dHpKenIyEhocZ2DRo0wMqVK6GmpoYWLVpg8eLFuH//Pj755BMAQGhoKD777DOkpaVh0KBB2Lp1K9TU1LB+/XrIZI+OO4mNjYWxsTGSkpLwzjvvAABMTEywatUqqKuro2XLlujduzcOHTqEoKAgFBQUQE9PD++++y4MDAxgbW2Ntm3bVjm+nJwc7N69G2lpadIW5C1btsDKygq7du3CgAEDAAClpaVYu3YtmjdvDgAIDg7GvHnzqp33mjVr4O7ujkWLFkllGzZsgJWVFc6fPw9HR0cAlb8Hrl27BgCYN2+etKqyqKgIa9asQVxcHHr27AkAiImJwYEDB/DVV19h+vTpUvvH2xERERERERER1Qd1coprdTw8PCqVJSYm4u2330aTJk1gYGCA4cOH488//0RRURGA/62gq621a9fCw8MDZmZm0NfXR0xMDAoKCp65fXZ2Njp06KBS9uTnqrRq1Qpqav97jObm5nB1dZU+q6urw9TUFNevX
wcAZGRk4MKFCzAwMJDeadegQQM8fPgQubm5Kv2qq6tLny0tLaU+3n77bVhbW8POzg7Dhg3Dli1bcP/+/WrnpaGhgTfffFMqMzU1RYsWLZCdnS2V6erqSsm5J+NVJSMjA4mJidIc9PX10bJlSwBQmUdV3wNPlufm5qK0tBSenp5SmaamJt544w2VMdbUX4Xi4mLcuXNH5SpXltfYhoiIiIiIiIhqTynwqm+ea4JOT09P5fOlS5fQq1cvuLi4YMeOHcjIyMAXX3wBANJ2Sx0dnVrHiY+Px+TJkxEYGIj9+/cjMzMTH330kcqW0adRKv/el/fJ01JlMlmVZQrFox3SCoUC7dq1Q2Zmpsp1/vx5DBkypMZ+K/owMDDAr7/+im+//RaWlpaYM2cO2rRpg7/++uuZ56VUKqUVfNXFq+mZKBQK9OnTp9I8cnJyVLbbPvk9UFV5RZzHx1PVGGvqr0JERASMjIxUrqzb52tsQ0REREREREQk0nNN0D3pxIkTKCsrQ1RUFN566y04Ojri6tWrKnVat26NQ4cOVduHlpYWystVV0SlpKSgY8eOGDduHNq2bQt7e3uVVVzPwtnZGUePHlUpe/JzXXB3d0dOTg4aNWoEe3t7lcvIyOiZ+9HQ0ED37t2xePFinD59Gvn5+fj5558r1XN2dkZZWRmOHTsmlf355584f/48nJyc/tE8zp49Cxsbm0rzeFoS7Un29vbQ0tJCamqqVFZaWooTJ07UeoyhoaG4ffu2yuVs5FirPoiIiIiIiIjo6RQycVd980ITdM2bN0dZWRk+//xzXLx4EZs2bcLatWtV6oSGhuL48eMYN24cTp8+jf/85z9Ys2aNdDqojY0Njh07hvz8fNy4cQMKhQL29vY4ceIE9u3bh/PnzyMsLEzlsIpnERISgoSEBCxevBjnz5/HqlWrnvr+ub9j6NChaNiwIfz8/JCSkoK8vDwkJydj4sSJ+O9///tMfezZswcrV65EZmYmLl26hK+//hoKhQItWrSoVNfBwQF+fn4ICgpCamoqTp06hQ8//BBNmjSBn5/f357H+PHjcfPmTQwePBi//PILLl68iP379yMwMLBSAvVp9PT08K9//QvTp09HQkICsrKyEBQUhPv372PkyJG16ksul8PQ0FDlUpepP70hEREREREREZEgLzRB5+bmhmXLliEyMhIuLi7YsmULIiIiVOo4Ojpi//79OHXqFN544w106NAB//73v6Gh8eg8i2nTpkFdXR3Ozs4wMzNDQUEBxo4di379+sHf3x9vvvkm/vzzT4wbN65WY3vrrbewfv16fP7553Bzc8P+/fsxe/bsOpt7BV1dXRw+fBjNmjVDv3794OTkhMDAQDx48ACGhobP1IexsTF27tyJrl27wsnJCWvXrsW3336rcpjG42JjY9GuXTu8++676NChA5RKJfbu3VtpW2ttNG7cGGlpaSgvL4evry9cXFwwceJEGBkZqbyT71l99tln+OCDDzBs2DC4u7vjwoUL2LdvH0xMTP72GImIiIiIiIiIXgUy5d99+RrRK2KI9fvCYlvLav9OxbpySflAWGyRRj7UEha7jds1YbFv5NVua3ldunLbQFhskTqEGguLvWVpkbDYdv//zlgRLv6Dv1j6x7E1FMJij9D6S1hs6yU+wmIr82r3upK6VH7xirDYVxLEfa81T18lLPaIdlOFxaYXr0u5uN9bOuveFBZbpJT7DYTFFvnf0E/euyMstsHKPcJiv0ifWX8oLPbHlzYLi/08vNAVdERERERERERERKRKQ/QAiIiIiIiIiIjo1cMtmXWHK+iIiIiIiIiIiIgE4go6IiIiIiIiIiKqNQXX0NUZrqAjIiIiIiIiIiISiKe4Ur1XeuOisNi5HYOFxRZ5EptIcW5zhMV+v9VlYbF1e7YUFltm21xY7M3jTwuL/a/ricJi31n8rrDYby/JERb7wHQHYbFFfp+3+GiTsNhhum2ExRZJ5CmPI4seCottrWEkLPbGjChhsfn72ov3uv6+Jm9hICy2RjcvYbFF/r4m8gTZiPxvhMV+kRZaDxUWe9alLcJiPw/c4kpERERERERERLUmLgVa/3CLKxERERERERERkUBM0NFz5ePjg0mTJkmfbWxssGLFCmHjISIiIiIiIqK6oRR41Tfc4voakclk+P7779G3b19hYzh+/Dj09PSExSciIiIiIiIietlwBV09UVJS8tz6Li0trbO+zMzMoKurW2f9ERERERERERG96pigewF++OEHGBsbQ6F49PrEzMxMyGQyTJ8+XaozZswYDB48WPq8Y8cOtGrVCnK5HDY2NoiKUj3ZysbGBgsWLEBAQACMjIwQFBSEkpISBAcHw9LSEtra2rCxsUFERIRUHwDef/99yGQy6fOT8vPzIZPJEB8fDx8fH2hra2Pz5s34888/MXjwYDRt2hS6urpwdXXFt99+q9K2qKgIw4cPh76+PiwtLSuNuWIcFVtcK2JlZmZK9//66y/IZDIkJSUBAG7duoWhQ4fCzMwMOjo6cHBwQGxs7FOfORERERERERE9XwqBV33DBN0L4OXlhbt37+LkyZMAgOTkZDRs2BDJyclSnaSkJHh7ewMAMjIyMHDgQAwaNAhnzpxBeHg4wsLCEBcXp9LvkiVL4OLigoyMDISFhWHlypXYvXs34uPjce7cOWzevFlKxB0/fhwAEBsbi8LCQulzdWbOnImQkBBkZ2fD19cXDx8+RLt27bBnzx789ttvGD16NIYNG4Zjx45JbaZPn47ExER8//332L9/P5KSkpCRkfGPnl1YWBiysrLw008/ITs7G2vWrEHDhg3/UZ9ERERERERERC8TvoPuBTAyMoKbmxuSkpLQrl07JCUlYfLkyZg7dy7u3r2LoqIinD9/Hj4+PgCAZcuWoVu3bggLCwMAODo6IisrC0uWLEFAQIDUb9euXTFt2jTpc0FBARwcHNCpUyfIZDJYW1tL98zMzAAAxsbGsLCweOqYJ02ahH79+qmUPR5rwoQJSEhIwHfffYc333wT9+7dw1dffYWvv/4ab7/9NgBg48aNaNq0ae0e1hMKCgrQtm1beHh4AEC1K/+IiIiIiIiI6MVSyESPoP7gCroXxMfHB0lJSVAqlUhJSYGfnx9cXFyQmpqKxMREmJubo2XLlgCA7OxseHp6qrT39PRETk4OysvLpbKKpFWFgIAAZGZmokWLFggJCcH+/fv/9nif7Lu8vBwLFy5E69atYWpqCn19fezfvx8FBQUAgNzcXJSUlKBDhw5SmwYNGqBFixZ/ewwA8K9//Qtbt26Fm5sbZsyYgfT09BrrFxcX486dOypXcXHxPxoDEREREREREdHzxATdC+Lj44OUlBScOnUKampqcHZ2hre3N5KTk1W2twKAUqmETKaahlYqKx8i/ORpqO7u7sjLy8P8+fPx4MEDDBw4EP379/9b432y76ioKCxfvhwzZszAzz//jMzMTPj6+kqHU1Q1vqdRU1Or1PbJAyl69uyJS5cuYdKkSbh69Sq6deumspLvSRERETAyMlK5IqPX1npsRERE
RERERFQzBZTCrvqGCboXpOI9dCtWrIC3tzdkMhm8vb2RlJRUKUHn7OyM1NRUlfbp6elwdHSEurp6jXEMDQ3h7++PmJgYbNu2DTt27MDNmzcBAJqamior8GqjYtXfhx9+iDZt2sDOzg45OTnSfXt7e2hqauLo0aNS2a1bt3D+/Plq+6zYdltYWCiVPX5gxOP1AgICsHnzZqxYsQJffvlltX2Ghobi9u3bKtfMiWNrM1UiIiIiIiIioheK76B7QSreQ7d582ZER0cDeJS0GzBgAEpLS6X3zwHA1KlT0b59e8yfPx/+/v44cuQIVq1ahdWrV9cYY/ny5bC0tISbmxvU1NTw3XffwcLCAsbGxgAevb/t0KFD8PT0hFwuh4mJyTOP397eHjt27EB6ejpMTEywbNkyXLt2DU5OTgAAfX19jBw5EtOnT4epqSnMzc0xa9YsaZVcVXR0dPDWW2/hs88+g42NDW7cuIHZs2er1JkzZw7atWuHVq1aobi4GHv27JFiVkUul0Mul6uUlZbceOZ5EhERERERERG9aFxB9wJ16dIF5eXlUjLOxMQEzs7OMDMzU0k6ubu7Iz4+Hlu3boWLiwvmzJmDefPmqRwQURV9fX1ERkbCw8MD7du3R35+Pvbu3SslyaKionDgwAFYWVmhbdu2tRp7WFgY3N3d4evrCx8fH1hYWKBv374qdZYsWQIvLy+899576N69Ozp16oR27drV2O+GDRtQWloKDw8PTJw4EQsWLFC5r6WlhdDQULRu3RpeXl5QV1fH1q1bazV2IiIiIiIiIqp7SoFXfcMVdC/Q0qVLsXTpUpWyqrZ0AsAHH3yADz74oNq+8vPzK5UFBQUhKCio2jZ9+vRBnz59ahyjjY1Nle+Ta9CgAXbt2lVjW319fWzatAmbNm2SyqZPn17juJ2cnHDkyBGVssfjz549u9KqOiIiIiIiIiKi+oQJOiIiIiIiIiIiqjWF6AHUI9ziSkREREREREREJBATdERERERERERERAJxiysREREREREREdWaol4e1yAGV9AREREREREREREJxBV0RERERERERERUa1w/V3eYoKN673CrUGGxPfeNExZb5LxFStQuERf8rJWw0HaZfwmLDWQIi3xRR11Y7DfNWgiLvWVpkbDYX+lpC4stct7AaWGRrXTMhMW+qPF6ns12scRYYPRrAmOLk9sxWFjs5umrhMUWOe8rtw2ExYamprDQN/L0hMVGnsCfqQlJ4mKjgbDIdmXcNEivDn63EhERERERERERCcQVdEREREREREREVGuv51r754Mr6IiIiIiIiIiIiARigq6ekclk2LVrV4118vPzIZPJkJmZWaexn1e/RERERERERPTyUUAp7KpvmKB7hZSUCHz5/Qv2Os2ViIiIiIiIiF5vTNDVkR9++AHGxsZQKB7twM7MzIRMJsP06dOlOmPGjMHgwYOlzzt27ECrVq0gl8thY2ODqKgolT5tbGywYMECBAQEwMjICEFBQSgpKUFwcDAsLS2hra0NGxsbRERESPUB4P3334dMJpM+P8nW1hYA0LZtW8hkMvj4+Ej3YmNj4eTkBG1tbbRs2RKrV6+W7gUGBqJ169YoLi4GAJSWlqJdu3YYOnRojf36+Phg0qRJKmPo27cvAgICapwrAKSnp8PLyws6OjqwsrJCSEgIiopEnuJHRERERERERACgFHjVN0zQ1REvLy/cvXsXJ0+eBAAkJyejYcOGSE5OluokJSXB29sbAJCRkYGBAwdi0KBBOHPmDMLDwxEWFoa4uDiVfpcsWQIXFxdkZGQgLCwMK1euxO7duxEfH49z585h8+bNUiLu+PHjAB4l2QoLC6XPT/rll18AAAcPHkRhYSF27twJAIiJicGsWbOwcOFCZGdnY9GiRQgLC8PGjRsBACtXrkRRURE+/vhjAEBYWBhu3LghJfGq6/dZPTnXM2fOwNfXF/369cPp06exbds2pKamIjhY3HH0RERERERERER1jae41hEjIyO4ubkhKSkJ7dq1Q1JSEiZPnoy5c+fi7t27KCoqwvnz56VVZcuWLUO3bt0QFhYGAHB0dERWVhaWLFmisrKsa9eumDZtmvS5oKAADg4O6NSpE2QyGaytraV7ZmZmAABjY2NYWFhUO9aKeqampir15s+fj6ioKPTr1w/AoxVxWVlZWLduHUaMGAF9fX1s3rwZ3t7eMDAwQFRUFA4dOgQjI6Ma+31WT851+PDhGDJkiLT6zsHBAStXroS3tzfWrFkDbW3tWscgIiIiIiIiInrZcAVdHfLx8UFSUhKUSiVSUlLg5+cHFxcXpKamIjExEebm5mjZsiUAIDs7G56enirtPT09kZOTg/LycqnMw8NDpU5AQAAyMzPRokULhISEYP/+/XUy9j/++AOXL1/GyJEjoa+vL10LFixAbm6uVK9Dhw6YNm0a5s+fj6lTp8LLy6tO4gOV55qRkYG4uDiV8fj6+kKhUCAvL6/KPoqLi3Hnzh2Vq0RZXmVdIiIiIiIiIvr7FAKv+oYr6OqQj48PvvrqK5w6dQpqampwdnaGt7c3kpOTcevWLWl7KwAolUrIZDKV9kpl5V3Uenp6Kp/d3d2Rl5eHn376CQcPHsTAgQPRvXt3bN++/R+NveLdeTExMXjzzTdV7qmrq6vUS0tLg7q6OnJycp6pbzU1tUpzKy0trVTvybkqFAqMGTMGISEhleo2a9asylgRERGYO3euStkwXWeM0Hd5prESEREREREREb1oXEFXhyreQ7dixQp4e3tDJpPB29sbSUlJKu+fAwBnZ2ekpqaqtE9PT4ejo6NKQqwqhoaG8Pf3R0xMDLZt24YdO3bg5s2bAABNTU2VFXhV0dLSAgCVeubm5mjSpAkuXrwIe3t7lavi8Afg0XvisrOzkZycjH379iE2NrbGfoFHW18LCwulz+Xl5fjtt99qHCPwKBl59uzZSuOxt7eXYj0pNDQUt2/fVrkG6zk9NRYRERERERER1Y5S4D/1DVfQ1aGK99Bt3rwZ0dHRAB4l7QYMGIDS0lKV01KnTp2K9u3bY/78+fD398eRI0ewatUqlVNTq7J8+XJYWlrCzc0Nampq+O6772BhYQFjY2MAj05DPXToEDw9PSGXy2FiYlKpj0aNGkFHRwcJCQlo2rQptLW1YWRkhPDwcISEhMDQ0BA9e/ZEcXExTpw4gVu3bmHKlCnIzMzEnDlzsH37dnh6eiI6OhoTJ06Et7c37Ozsqu23a9eumDJlCn788Uc0b94cy5cvx19//fXU5zlz5ky89dZbGD9+PIKCgqCnp4fs7GwcOHAAn3/+eZVt5HI55HK5SpmWrOaEJxERERERERGRSFxBV8e6dOmC8vJyKRlnYmICZ2dnmJmZwcnpfyu53N3dER8fj61bt8LFxQVz5szBvHnzVA6IqIq+vj4iIyPh4eGB9u3bIz8/H3v37oWa2qMvZVRUFA4cOAArKyu0bdu2yj40NDSwcuVKrFu3Do0bN4afnx8AYNSoUVi/fj3i4uLg6uoKb29vxMXFwdbWFg8fPsTQoUMREBCAPn3
6AABGjhyJ7t27Y9iwYSgvL6+238DAQIwYMQLDhw+Ht7c3bG1t0aVLl6c+y9atWyM5ORk5OTno3Lkz2rZti7CwMFhaWj61LRERERERERHRq0KmrOrFZ0T1yCFzf2GxPfcNExY7zXeTsNgifaVdIix2l3K9p1d6TuyqeK/j6+CgjrgVsodLrwmLHSBrLCx2Z92bwmKn3G8gLLZIccqrwmJ7adb+VHb6Z0T+bLHWMBIWe468WFjs5umrhMXO7RgsLPaV2wbCYl/U1BQWW+R/x15Xr+t/v4P+u1n0EF6IYBtx/397Vf42YbGfB66gIyIiIiIiIiIiEojvoCMiIiIiIiIiolpT1MPDGkThCjoiIiIiIiIiIiKBuIKOiIiIiIiIiIhqjevn6g5X0BEREREREREREQnEFXRU73mdjRAWW+TJXF5nxZ1IJtJFtznCYg/q95ew2Op2TYTFVvPpIyz2xXe3Cou95cEfwmIPneMgLPbbS8SdKHpguriTkmW2zYXFnv/RKWGxu5eZCYv9up7yePj1PJT7tT1JVeS8xf1UE/v7WkPbImGx5S3EnZyr0c1LWOwrozOExT6ooy4sNlFtMUFHRERERERERES1xkMi6g63uBIREREREREREQnEBB1JZDIZdu3aJXoYRERERERERPQKUAi86hsm6F4SL0NyrLCwED179nyuMfLz8yGTyZCZmflc4xARERERERERvSqYoHsBSkpKRA+hRhXjs7CwgFwuFzyaZ1da+pq+RZmIiIiIiIiI6pXXPkH3ww8/wNjYGArFowWSmZmZkMlkmD59ulRnzJgxGDx4sPR5x44daNWqFeRyOWxsbBAVFaXSp42NDRYsWICAgAAYGRkhKCgIJSUlCA4OhqWlJbS1tWFjY4OIiAipPgC8//77kMlk0ucnVaw+27p1Kzp27AhtbW20atUKSUlJKvWysrLQq1cv6Ovrw9zcHMOGDcONGzek+z4+PggODsaUKVPQsGFDvP322wBUV/FVxIqPj0fnzp2ho6OD9u3b4/z58zh+/Dg8PDygr6+PHj164I8/VE8yjI2NhZOTE7S1tdGyZUusXr1aumdrawsAaNu2LWQyGXx8fJ6p3ePj8fHxgba2NjZv3lzlcyIiIiIiIiKi508p8J/65rVP0Hl5eeHu3bs4efIkACA5ORkNGzZEcnKyVCcpKQne3t4AgIyMDAwcOBCDBg3CmTNnEB4ejrCwMMTFxan0u2TJEri4uCAjIwNhYWFYuXIldu/ejfj4eJw7dw6bN2+WEnHHjx8H8ChBVVhYKH2uzvTp0zF16lScPHkSHTt2xHvvvYc///wTwKNtqt7e3nBzc8OJEyeQkJCA33//HQMHDlTpY+PGjdDQ0EBaWhrWrVtXbaxPP/0Us2fPxq+//goNDQ0MHjwYM2bMQHR0NFJSUpCbm4s5c/53THpMTAxmzZqFhQsXIjs7G4sWLUJYWBg2btwIAPjll18AAAcPHkRhYSF27tz5TO0qzJw5EyEhIcjOzoavr2+Nz4mIiIiIiIiI6FWgIXoAohkZGcHNzQ1JSUlo164dkpKSMHnyZMydOxd3795FUVERzp8/L630WrZsGbp164awsDAAgKOjI7KysrBkyRIEBARI/Xbt2hXTpk2TPhcUFMDBwQGdOnWCTCaDtbW1dM/MzAwAYGxsDAsLi6eOOTg4GB988AEAYM2aNUhISMBXX32FGTNmYM2aNXB3d8eiRYuk+hs2bICVlRXOnz8PR0dHAIC9vT0WL1781FjTpk2TEmETJ07E4MGDcejQIXh6egIARo4cqZKcnD9/PqKiotCvXz8Aj1bMZWVlYd26dRgxYoQ0V1NTU5W5Pq1dhUmTJkl1iIiIiIiIiEic+nhYgyiv/Qo64NGWz6SkJCiVSqSkpMDPzw8uLi5ITU1FYmIizM3N0bJlSwBAdna2lJyq4OnpiZycHJSXl0tlHh4eKnUCAgKQmZmJFi1aICQkBPv37//b4+3QoYP0Zw0NDXh4eCA7OxvAoxV+iYmJ0NfXl66Ksefm5lY7vuq0bt1a+rO5uTkAwNXVVaXs+vXrAIA//vgDly9fxsiRI1XiL1iwQCX2k2rT7mnjLi4uxp07d1Su4uLiZ5orEREREREREZEIr/0KOuBRgu6rr77CqVOnoKamBmdnZ3h7eyM5ORm3bt2StrcCgFKphEwmU2mvVFbe+6ynp6fy2d3dHXl5efjpp59w8OBBDBw4EN27d8f27dvrZA4VY1IoFOjTpw8iIyMr1bG0tKx2fNXR1NSsFOPJsor391X8b0xMDN58802VftTV1auNUZt2Txt3REQE5s6dq1I2e3oI5syYWGM7IiIiIiIiIqqd+vguOFGYoMP/3kO3YsUKeHt7QyaTwdvbGxEREbh16xYmTvxfcsfZ2Rmpqakq7dPT0+Ho6FhjEgoADA0N4e/vD39/f/Tv3x89evTAzZs30aBBA2hqaqqswKvJ0aNH4eXlBQAoKytDRkYGgoODATxKBO7YsQM2NjbQ0HixX15zc3M0adIEFy9exNChQ6uso6WlBQAqc32Wds8qNDQUU6ZMUSlTu3vlH/VJRERERERERPQ8MUGH/72HbvPmzYiOjgbwKGk3YMAAlJaWqpw0OnXqVLRv3x7z58+Hv78/jhw5glWrVqmcOFqV5cuXw9LSEm5ublBTU8N3330HCwsLGBsbA3h0kmvFu93kcjlMTEyq7euLL76Ag4MDnJycsHz5cty6dQuBgYEAgPHjxyMmJgaDBw/G9OnT0bBhQ1y4cAFbt25FTEzMU5OI/1R4eDhCQkJgaGiInj17ori4GCdOnMCtW7cwZcoUNGrUCDo6OkhISEDTpk2hra0NIyOjp7Z7VnK5HHK5XKWstORGNbWJiIiIiIiIiMTjO+j+X5cuXVBeXi4l40xMTODs7AwzMzM4OTlJ9dzd3REfH4+tW7fCxcUFc+bMwbx581QOiKiKvr4+IiMj4eHhgfbt2yM/Px979+6FmtqjL0FUVBQOHDgAKysrtG3btsa+PvvsM0RGRqJNmzZISUnBv//9bzRs2BAA0LhxY6SlpaG8vBy+vr5wcXHBxIkTYWRkJMV6nkaNGoX169cjLi4Orq6u8Pb2RlxcHGxtbQE8emfeypUrsW7dOjRu3Bh+fn7P1I6IiIiIiIiIXi4KgVd9I1NW9QI1einl5+fD1tYWJ0+ehJubm+jhvDJKb1wUFju3Y7Cw2M3TVwmLLVKc2xxhsQf1+0tYbHW7JsJiq/n0ERZ707tbhcWef/+UsNj/mdNRWOy3l+QIi31guoOw2DLb5sJit/hok7DYsZrOwmJffOydty9aZ92bwmKPLHooLLa1hpGw2BszooTF5u9rL57I39feb3VZWGx5CwNhsTW6eQmLnT46Q1jsgzrPdwdZTSLyvxEW+0UaYfOBsNgb83cIi/08cIsrERERERERERHVmoJrvuoMt7gSEREREREREREJxBV0rxAbGxtwRzIRERERERERUf3CFXRERERERERERFRrSoFXba1evRq2trbQ1tZGu3btkJKSUmP95ORktGvXDtra2r
Czs8PatWv/RtRnxwQdERERERERERHVW9u2bcOkSZMwa9YsnDx5Ep07d0bPnj1RUFBQZf28vDz06tULnTt3xsmTJ/HJJ58gJCQEO3Y8v4MpmKAjIiIiIiIiIqJaU0Ap7KqNZcuWYeTIkRg1ahScnJywYsUKWFlZYc2aNVXWX7t2LZo1a4YVK1bAyckJo0aNQmBgIJYuXVoXj61KfAcd1Xvl/80SFvvKbXFHqdsInLd6U2dhsUUqPndXWGxdO2Ghobwk7nvtdSWzbS4weo7A2PSiXdTUFD0EIUT+9xsaD8XFfk2J/HqL/Gn+urqRpycsdkOI+11R3S5XWGyx/y1RCIxNz1txcTGKi4tVyuRyOeRyuUpZSUkJMjIy8PHHH6uUv/POO0hPT6+y7yNHjuCdd95RKfP19cVXX32F0tJSaD6H72uuoCMiIiIiIiIiolpTCvwnIiICRkZGKldERESlMd64cQPl5eUwNzdXKTc3N8e1a9eqnNe1a9eqrF9WVoYbN27U3QN8DFfQERERERERERHRKyU0NBRTpkxRKXty9dzjZDKZymelUlmp7Gn1qyqvK0zQERERERERERHRK6Wq7axVadiwIdTV1Sutlrt+/XqlVXIVLCwsqqyvoaEBU1PTvz/oGnCLK1VLJpNh165doodBRERERERERC8hhcDrWWlpaaFdu3Y4cOCASvmBAwfQsWPHKtt06NChUv39+/fDw8Pjubx/DmCC7rVVUlIiLHZpaamw2ERERERERET0epkyZQrWr1+PDRs2IDs7G5MnT0ZBQQHGjh0L4NF22eHDh0v1x44di0uXLmHKlCnIzs7Ghg0b8NVXX2HatGnPbYxM0L2EfvjhBxgbG0OheJQTzszMhEwmw/Tp06U6Y8aMweDBg6XPO3bsQKtWrSCXy2FjY4OoqCiVPm1sbLBgwQIEBATAyMgIQUFBKCkpQXBwMCwtLaGtrQ0bGxvphYo2NjYAgPfffx8ymUz6XJWZM2fC0dERurq6sLOzQ1hYmEoSLjw8HG5ubtiwYQPs7Owgl8uhVCpx+/ZtjB49Go0aNYKhoSG6du2KU6dOSe1yc3Ph5+cHc3Nz6Ovro3379jh48ODffq5EREREREREVHcUUAq7asPf3x8rVqzAvHnz4ObmhsOHD2Pv3r2wtrYGABQWFqKgoECqb2tri7179yIpKQlubm6YP38+Vq5ciQ8++KBOn9/j+A66l5CXlxfu3r2LkydPol27dkhOTkbDhg2RnJws1UlKSsLkyZMBABkZGRg4cCDCw8Ph7++P9PR0jBs3DqampggICJDaLFmyBGFhYZg9ezYAYOXKldi9ezfi4+PRrFkzXL58GZcvXwYAHD9+HI0aNUJsbCx69OgBdXX1asdrYGCAuLg4NG7cGGfOnEFQUBAMDAwwY8YMqc6FCxcQHx+PHTt2SH317t0bDRo0wN69e2FkZIR169ahW7duOH/+PBo0aIB79+6hV69eWLBgAbS1tbFx40b06dMH586dQ7NmzerseRMRERERERFR/TZu3DiMGzeuyntxcXGVyry9vfHrr78+51H9DxN0LyEjIyO4ubkhKSkJ7dq1k5Jxc+fOxd27d1FUVITz58/Dx8cHALBs2TJ069YNYWFhAABHR0dkZWVhyZIlKgm6rl27qizHLCgogIODAzp16gSZTCZljgHAzMwMAGBsbAwLC4sax1uR8AMerbybOnUqtm3bppKgKykpwaZNm6R+f/75Z5w5cwbXr1+XXuq4dOlS7Nq1C9u3b8fo0aPRpk0btGnTRupjwYIF+P7777F7924EBwfX5pESEREREREREb20uMX1JeXj44OkpCQolUqkpKTAz88PLi4uSE1NRWJiIszNzdGyZUsAQHZ2Njw9PVXae3p6IicnB+Xl5VKZh4eHSp2AgABkZmaiRYsWCAkJwf79+//WWLdv345OnTrBwsIC+vr6CAsLU1kaCgDW1tZScg54tOrv3r17MDU1hb6+vnTl5eUhNzcXAFBUVIQZM2bA2dkZxsbG0NfXx3/+859KfT+uuLgYd+7cUbmKS/jOOyIiIiIiIqK6phT4T33DBN1LysfHBykpKTh16hTU1NTg7OwMb29vJCcnIykpCd7e3lJdpVIJmUym0l6prPzNqqenp/LZ3d0deXl5mD9/Ph48eICBAweif//+tRrn0aNHMWjQIPTs2RN79uzByZMnMWvWrEqHUDwZW6FQwNLSEpmZmSrXuXPnpHftTZ8+HTt27MDChQuRkpKCzMxMuLq61njARUREBIyMjFSuJRu+q9WciIiIiIiIiIheJG5xfUlVvIduxYoV8Pb2hkwmg7e3NyIiInDr36HrtgABAABJREFU1i1MnDhRquvs7IzU1FSV9unp6XB0dKzx3XEAYGhoCH9/f/j7+6N///7o0aMHbt68iQYNGkBTU1NlBV5V0tLSYG1tjVmzZkllly5deur83N3dce3aNWhoaFR7AEVKSgoCAgLw/vvvAwDu3buH/Pz8GvsNDQ3FlClTVMqU/zn01PEQERERERERUe0oRA+gHmGC7iVV8R66zZs3Izo6GsCjpN2AAQNQWloqvX8OAKZOnYr27dtj/vz58Pf3x5EjR7Bq1SqsXr26xhjLly+HpaUl3NzcoKamhu+++w4WFhYwNjYG8Oh9cocOHYKnpyfkcjlMTEwq9WFvb4+CggJs3boV7du3x48//ojvv//+qfPr3r07OnTogL59+yIyMhItWrTA1atXsXfvXvTt2xceHh6wt7fHzp070adPH8hkMoSFhUkn21ZHLpdL77Sr8FBL86njISIiIiIiIiIShVtcX2JdunRBeXm5lIwzMTGBs7MzzMzM4OTkJNVzd3dHfHw8tm7dChcXF8yZMwfz5s1TOSCiKvr6+oiMjISHhwfat2+P/Px87N27F2pqj74toqKicODAAVhZWaFt27ZV9uHn54fJkycjODgYbm5uSE9Plw6rqIlMJsPevXvh5eWFwMBAODo6YtCgQcjPz4e5uTmARwlEExMTdOzYEX369IGvry/c3d2f4ckRERERERER0fOmVCqFXfWNTFkfZ0X0mIeZe4TFTvPdJCy2575hwmKrN3UWFjvObY6w2O+3uiwstm7PlsJiy2ybC4u9efxpYbHn3z8lLPa5WHH/fncflyAs9oHpDsJii/w+b/GRuP+WhOm2eXqlesiuVNwBU3M0/hAW21rDSFjsjRlRwmIfbhUqLLbX2QhhsUUS+ftaZ92bwmI3tC0SFlvk74pbloqb90UNcRswI/K/ERb7RXq/WR9hsb8v+EFY7OeBK+iIiIiIiIiIiIgE4jvoiIiIiIiIiIio1hTgpsy6whV0REREREREREREAnEFHRERERERERER1Zq4t/zVP1xBR0REREREREREJBBPcaV6L9RmiOghCGFXJi7/LvK0pDnRbsJilx06LCz2lQRxz1zkiWQ38vSExRY576hzTYTFPlx6TVhsL00LYbFF6v6gXFjsi5qawmK/rhLVxf1s6VIu7mcqvV4CMucJi/1wXoiw2MXn7gqL/bqStzAQFttg5R5hs
V+kPs3eFRb7h4L69Yy5xZWIiIiIiIiIiGpNyUMi6gy3uBIREREREREREQnEFXRERET/x96dx9WU/38Af532fVGpkJIWlVKJQabFMoghM2TJkjCMoYTQdyajsieyDBpLIfs61kjcJoREtlJJZMnYtZC2z++Pvp2fqzDm2+kY3s953Me45557Xp9z7u326XM/CyGEEEIIIeSjVVIPujpDPejIe3Ech71794pdDEIIIYQQQgghhJDPFjXQ1aF/c2PWzJkzYW9vL3YxCCGEEEIIIYQQ8i/BGBPt9rmhBrq/qbS0VOwiEEIIIYQQQgghhJDP0GfRQLd//35oaWmhsrISAJCWlgaO4xAYGMjvM2bMGAwaNIi/v2vXLtjY2EBRUREmJiaIiIiQOqaJiQlmzZoFHx8faGpqYvTo0SgtLcX48eNhaGgIJSUlmJiYYO7cufz+ANC3b19wHMfff9v7jgFU9cKLiopCr169oKKiAisrKyQnJ+PGjRtwc3ODqqoq2rdvj5ycHKnjrly5Es2bN4eCggIsLS2xceNGqcfz8vLQp08fqKmpQUNDA15eXvjrr78AADExMQgJCcGlS5fAcRw4jkNMTAz/3MePH6Nv375QUVGBubk59u3bxz8mkUjAcRwSEhLg5OQEFRUVdOjQAZmZmTVeo9atW0NJSQmmpqYICQlBeXk5//jMmTPRtGlTKCoqolGjRvDz+//lz1esWAFzc3MoKSlBX18f/fr1q/XaEkIIIYQQQgghhPwbfRYNdC4uLigsLMTFixcBAImJidDV1UViYiK/j0QigaurKwAgNTUVXl5eGDhwIK5cuYKZM2ciODhYqlEKAMLDw9GyZUukpqYiODgYS5cuxb59+7B9+3ZkZmYiNjaWb4hLSUkBAERHRyM/P5+//7b3HaNaWFgYhg0bhrS0NLRo0QKDBw/GmDFjEBQUhPPnzwMAxo8fz++/Z88e+Pv7Y/Lkybh69SrGjBmDESNG4MSJEwCqupx6enri6dOnSExMRHx8PHJycjBgwAAAwIABAzB58mTY2NggPz8f+fn5/GMAEBISAi8vL1y+fBkeHh7w9vbG06dPpcr8888/IyIiAufPn4ecnBx8fX35x44cOYIhQ4bAz88P6enpiIqKQkxMDGbPng0A2LlzJxYvXoyoqChkZ2dj7969sLW1BQCcP38efn5+CA0NRWZmJuLi4uDi4lLrtSWEEEIIIYQQQkj9qRTx9rn5LFZx1dTUhL29PSQSCVq3bg2JRIKAgACEhISgsLAQxcXFyMrKgpubGwBg0aJF6Ny5M4KDgwEAFhYWSE9PR3h4OHx8fPjjdurUCVOmTOHv5+XlwdzcHB07dgTHcTA2NuYf09PTAwBoaWnBwMDgnWV93zGqjRgxAl5eXgCAadOmoX379ggODka3bt0AAP7+/hgxYgS//8KFC+Hj44Nx48YBACZNmoQzZ85g4cKFcHd3x7Fjx3D58mXk5ubCyMgIALBx40bY2NggJSUFbdq0gZqaGuTk5Gotu4+PD9/7cM6cOVi2bBnOnTuH7t278/vMnj2bbwCdPn06evbsiZKSEigpKWH27NmYPn06hg8fDgAwNTVFWFgYpk6dil9//RV5eXkwMDBAly5dIC8vj6ZNm6Jt27b89VJVVUWvXr2grq4OY2NjODg4vPP6vn79Gq9fv5baVs4qIMfJvvM5hBBCCCGEEEIIIWL6LHrQAYCbmxskEgkYY0hKSkKfPn3QsmVLnDx5EidOnIC+vj5atGgBAMjIyICzs7PU852dnZGdnY2Kigp+m5OTk9Q+Pj4+SEtLg6WlJfz8/HD06NGPLuffOYadnR3/b319fQDge5RVbyspKUFBQcF7zycjI4N/3MjIiG+cAwBra2toaWnx+7zPm+VRVVWFuro6Hj58+M59DA0NAYDfJzU1FaGhoVBTU+Nvo0ePRn5+Pl6+fIn+/fvj1atXMDU1xejRo7Fnzx5++GvXrl1hbGwMU1NTDB06FJs2bcLLly/fWda5c+dCU1NT6pb8Iv2D50gIIYQQQgghhJCPw0T873PzWTXQJSUl4dKlS5CRkYG1tTVcXV2RmJgoNbwVqBryyXGc1PNrWwFEVVVV6r6joyNyc3MRFhaGV69ewcvL66PnQ/s7x5CXl+f/XV3O2rZVz7n35rY3z6d6W23n+77tb3szuzrrzewPla+yshIhISFIS0vjb1euXEF2djaUlJRgZGSEzMxM/Pbbb1BWVsa4cePg4uKCsrIyqKur48KFC9iyZQsMDQ0xY8YMtGrVCs+fP6+1rEFBQXjx4oXUrb2m9QfPkRBCCCGEEEIIIUQsn00DXfU8dJGRkXB1dQXHcXB1dYVEIqnRQGdtbY2TJ09KPf/06dOwsLCArOz7h0JqaGhgwIABWL16NbZt24Zdu3bx87HJy8tL9cD7J8f4J6ysrGo9HysrKwBV55uXl4c7d+7wj6enp+PFixf8PgoKCn+r7P+Eo6MjMjMzYWZmVuMmI1P1FlRWVkbv3r2xdOlSSCQSJCcn48qVKwAAOTk5dOnSBQsWLMDly5dx69YtHD9+vNYsRUVFaGhoSN1oeCshhBBCCCGEEEI+ZZ/FHHTA/89DFxsbiyVLlgCoarTr378/ysrK+PnnAGDy5Mlo06YNwsLCMGDAACQnJ2P58uVYsWLFezMWL14MQ0ND2NvbQ0ZGBjt27ICBgQG0tLQAVK3kmpCQAGdnZygqKkJbW/ujj/FPBAYGwsvLC46OjujcuTP279+P3bt349ixYwCALl26wM7ODt7e3oiMjER5eTnGjRsHV1dXfhiviYkJcnNzkZaWhiZNmkBdXR2Kior/uExvmjFjBnr16gUjIyP0798fMjIyuHz5Mq5cuYJZs2YhJiYGFRUV+Oqrr6CiooKNGzdCWVkZxsbGOHDgAG7evAkXFxdoa2vj0KFDqKyshKWlZZ2UjRBCCCGEEEIIIf9M5Wc41FQsn00POgBwd3dHRUUF3xinra0Na2tr6Onp8T3FgKoeXdu3b8fWrVvRsmVLzJgxA6GhoVILRNRGTU0N8+fPh5OTE9q0aYNbt27h0KFDfC+wiIgIxMfHw8jI6J0LGXzoGP+Ep6cnlixZgvDwcNjY2CAqKgrR0dH8deA4Dnv37oW2tjZcXFzQpUsXmJqaYtu2bfwxvv/+e3Tv3h3u7u7Q09PDli1b/nF53tatWzccOHAA8fHxaNOmDdq1a4dFixbxC2RoaWlh9erVcHZ2hp2dHRISErB//37o6OhAS0sLu3fvRqdOnWBlZYVVq1Zhy5YtsLGxqbPyEUIIIYQQQgghhIiJY7VNvkbIZyTIZLDYRRCFabl47e835cRb9HrGEnvRsssT/hQt+16ceNdct1mxaNmPc1U/vJNAxDzviMzGomX/WfZAtGwX+Xevkv456/JKmCko/o6bb81DS4R3Qla8zxb3CvE+U8mXxSctVLTsklA/0bJfZxaKlv2lUrRUFy1bfekB0bLrU+cm34iWnXD34xfu/JR9Vj3oCCGEEEIIIYQQQgj5t6EGOkIIIYQQQgghhBBCRPTZLBJBCCGEEEII
IYQQQuoPLRJRd6gHHSGEEEIIIYQQQgghIqIedIQQQgghhBBCCCHkozHqQVdnqIGOfPaGKzwXLfveC/FWDWplf0+0bDFX1ixecVC0bLX5gaJlG3dOFy2b5eaIlq17+Lpo2WISc1XPP0WsOYh53mKaIfdItOz4KeaiZb/8Qn++b4q4SvPXKk9Fy6YVweufmOct5kqqSjOWipYtf1fE+tpt8bLLE/4ULVuus4to2YR8LGqgI4QQQgghhBBCCCEfrZJRD7q6QnPQEUIIIYQQQgghhBAiImqgI4QQQgghhBBCCCFERNRA9zdxHIe9e/eKXQzRxMTEQEtLS+xiEEIIIYQQQggh5BPBRLx9bqiBDkBpaanYRSCEEEIIIYQQQgghX6hPvoFu//790NLSQmVlJQAgLS0NHMchMPD/V0scM2YMBg0axN/ftWsXbGxsoKioCBMTE0REREgd08TEBLNmzYKPjw80NTUxevRolJaWYvz48TA0NISSkhJMTEwwd+5cfn8A6Nu3LziO4+/X5u7duxg4cCAaNGgAVVVVODk54ezZs/zjK1euRPPmzaGgoABLS0ts3LhR6vkcxyEqKgq9evWCiooKrKyskJycjBs3bsDNzQ2qqqpo3749cnL+f9XEmTNnwt7eHlFRUTAyMoKKigr69++P58+f8/ukpKSga9eu0NXVhaamJlxdXXHhwgWp7OfPn+OHH36Avr4+lJSU0LJlSxw4cAASiQQjRozAixcvwHEcOI7DzJkz+WszZ84c+Pr6Ql1dHU2bNsXvv/8uddx79+5hwIAB0NbWho6ODvr06YNbt27xj0skErRt2xaqqqrQ0tKCs7Mzbt++DQC4dOkS3N3doa6uDg0NDbRu3Rrnz59/5/UnhBBCCCGEEEJI/agEE+32ufnkG+hcXFxQWFiIixcvAgASExOhq6uLxMREfh+JRAJXV1cAQGpqKry8vDBw4EBcuXIFM2fORHBwMGJiYqSOGx4ejpYtWyI1NRXBwcFYunQp9u3bh+3btyMzMxOxsbF8Q1xKSgoAIDo6Gvn5+fz9txUVFcHV1RX379/Hvn37cOnSJUydOpVvXNyzZw/8/f0xefJkXL16FWPGjMGIESNw4sQJqeOEhYVh2LBhSEtLQ4sWLTB48GCMGTMGQUFBfOPU+PHjpZ5z48YNbN++Hfv370dcXBzS0tLw008/8Y8XFhZi+PDhSEpKwpkzZ2Bubg4PDw8UFhYCACorK9GjRw+cPn0asbGxSE9Px7x58yArK4sOHTogMjISGhoayM/PR35+PqZMmcIfOyIiAk5OTrh48SLGjRuHH3/8EdevXwcAvHz5Eu7u7lBTU8Off/6JkydPQk1NDd27d0dpaSnKy8vh6ekJV1dXXL58GcnJyfjhhx/AcRwAwNvbG02aNEFKSgpSU1Mxffp0yMvLv+8tQwghhBBCCCGEEPKvIid2AT5EU1MT9vb2kEgkaN26NSQSCQICAhASEoLCwkIUFxcjKysLbm5uAIBFixahc+fOCA4OBgBYWFggPT0d4eHh8PHx4Y/bqVMnqUamvLw8mJubo2PHjuA4DsbGxvxjenp6AAAtLS0YGBi8s6ybN2/Go0ePkJKSggYNGgAAzMzM+McXLlwIHx8fjBs3DgAwadIknDlzBgsXLoS7uzu/34gRI+Dl5QUAmDZtGtq3b4/g4GB069YNAODv748RI0ZIZZeUlGD9+vVo0qQJAGDZsmXo2bMnIiIiYGBggE6dOkntHxUVBW1tbSQmJqJXr144duwYzp07h4yMDFhYWAAATE1NpV4HjuNqPX8PDw/+nKZNm4bFixdDIpGgRYsW2Lp1K2RkZLBmzRq+0S06OhpaWlqQSCRwcnLCixcv0KtXLzRv3hwAYGVlJfW6BAYGokWLFgAAc3Pzd15/QgghhBBCCCGEkH+jT74HHQC4ublBIpGAMYakpCT06dMHLVu2xMmTJ3HixAno6+vzDTgZGRlwdnaWer6zszOys7NRUVHBb3NycpLax8fHB2lpabC0tISfnx+OHj360eVMS0uDg4MD3zj3tneVLSMjQ2qbnZ0d/299fX0AgK2trdS2kpISFBQU8NuaNm3KN84BQPv27VFZWYnMzEwAwMOHDzF27FhYWFhAU1MTmpqaKCoqQl5eHl/2Jk2a8I1zH+PN8lY34j18+BBAVY/GGzduQF1dHWpqalBTU0ODBg1QUlKCnJwcNGjQAD4+PujWrRu+/fZbLFmyBPn5+fzxJk2ahFGjRqFLly6YN2+e1NDe2rx+/RoFBQVSt9LKivc+hxBCCCGEEEIIIR+PhrjWnX9NA11SUhIuXboEGRkZWFtbw9XVFYmJiVLDWwGAMcb31Hpz29tUVVWl7js6OiI3NxdhYWF49eoVvLy80K9fv48qp7Ky8gf3qa1sb297cwhn9WO1baseOvu+nOr/+/j4IDU1FZGRkTh9+jTS0tKgo6PDL5Dxd8r+Lm8POeU4ji9bZWUlWrdujbS0NKlbVlYWBg8eDKCqR11ycjI6dOiAbdu2wcLCAmfOnAFQNb/etWvX0LNnTxw/fhzW1tbYs2fPO8syd+5cvgGy+vb7s5v/+NwIIYQQQgghhBBChPavaKCrnocuMjISrq6u4DgOrq6ukEgkNRrorK2tcfLkSannnz59GhYWFpCVlX1vjoaGBgYMGIDVq1dj27Zt2LVrF54+fQqgqhHqzR54tbGzs0NaWhr/nLdZWVnVWrY3h3T+U3l5ebh//z5/Pzk5GTIyMnyPuKSkJPj5+cHDw4NfQOPx48dSZb979y6ysrJqPb6CgsIHz782jo6OyM7ORsOGDWFmZiZ109TU5PdzcHBAUFAQTp8+jZYtW2Lz5s38YxYWFggICMDRo0fx3XffITo6+p15QUFBePHihdTtB23Td+5PCCGEEEIIIYSQf4YxJtrtc/OvaKCrnocuNjaWn2vOxcUFFy5ckJp/DgAmT56MhIQEhIWFISsrC+vXr8fy5cul5purzeLFi7F161Zcv34dWVlZ2LFjBwwMDKClpQWgarXShIQEPHjwAM+ePav1GIMGDYKBgQE8PT1x6tQp3Lx5E7t27UJycjIAIDAwEDExMVi1ahWys7OxaNEi7N69+4Nl+zuUlJQwfPhwXLp0iW+M8/Ly4ueMMzMzw8aNG5GRkYGzZ8/C29tbqtecq6srXFxc8P333yM+Ph65ubk4fPgw4uLi+PMvKipCQkICHj9+jJcvX/6tcnl7e0NXVxd9+vRBUlIScnNzkZiYCH9/f9y9exe5ubkICgpCcnIybt++jaNHjyIrKwtWVlZ49eoVxo8fD4lEgtu3b+PUqVNISUl5b4OmoqIiNDQ0pG4KMu9vmCWEEEIIIYQQQggR07+igQ4A3N3dUVFRwTfGaWtrw9raGnp6elINNo6Ojti+fTu2bt2Kli1bYsaMGQgNDZVaIKI2ampqmD9/PpycnNCmTRvcunULhw4dgoxM1SWKiIhAfHw8jIyM4ODgUOsxFBQUcPToUTRs2BAeHh6wtbXlV0IFAE9PTyxZsgTh4eGwsbFBVFQUoqOjpRoY/ykzMzN
FRUBD09PareWB8i6RaZwYMHo7y8nJUEnZycHP744w+RBF1tbS1j7k6LzpiIlpGRwa1bt1h7T3+sdX/+/PlQU1PDtm3bqGl7eXkhLi6OlQE0ksbZ2Vno+9TUVFZ8wfz9/WFoaCjSor9nzx6Ul5djx44d1LS1tbVRWFgokqArKCgQuclrbzQ0NMRWHnfr1o36Zwmbfqb37t2Do6Mjampq8Pr1a4wePRoqKirYunUrXr16hX379lHTVlZWRkREBBYtWgQdHR0MHz4cw4cPh729Pfr06UNNFwB2796NvXv3wtPTk1lzcnKCubk51q1bRyVB9+/4PHa2lsPGxkbqPqps0XaAyGZVMo/HE3to+ueff1LfL1VXV4vVeP36NR4+fEhVm4PjU4BL0HFw/M3jx4+xfPlyVtpE7OzsEBMTgw0bNgBo/WAUCAQIDw//R5+29qCsrAwJCQmdbkLVsmXLYGlpiXnz5qGlpQV2dna4evUqFBUVce7cOdjb21PVZ6tFZunSpQgKCkJdXZ1YPxWaprijR4/GqlWrcPr0aeYGo7GxEV9//TVGjx5NTReQfCJ6165d//bP0vQc9PT0xI8//ojNmzdT0/gYgYGBkJGRQU1NDUxNTZl1Nzc3BAYGUk3QsTmARtJ8mByaOnUqK3GcOHFCKJHQxtChQ7F582aqCboZM2bA398fKioqTDVsRkYGAgICMGPGDGq6ALB69WosX74cMTExzECpuro6BAcHU23jBlo/N1atWoXLly9L3M80ICAAAwYMEEmCTp06FfPnz6emCwBRUVEAWn/PbVNFd+7cCV9fX2hra6O2tpaadm1tLYYOHSqyPnToUGq6U6ZMAdC6N/yw5fD9iqqOypYtW6Cvrw83NzcAgIuLC06cOIEePXrg119/lYhdw7Zt21BSUgIejwdTU1MEBwdj2LBhVHXFDSSRVGJy2LBh2LRpE44dOwZpaWkArZ+rmzZtEqkWbi/YSkSL+9z6GDTtjjg42uBaXDk4/sbb2xu2traYN2+exLVv374Ne3t79O/fH6mpqZg8eTKKi4vx9OlTZGVlUT0dHzFiBEJCQuDo6EhN41Pks88+w6lTpzBgwACcOnUKvr6+SEtLQ0xMDNLS0pCVlUVNm+0WmQ9pa6OgXUn28OFD2NnZoaGhgTHyzs/PR/fu3XHx4kXo6upS05Z0a++/O7WStufg0qVLERMTA0NDQwwYMABKSkpCz9NMVLHZus/mAJrOiry8PG7duiW2JcrCwgKvXr2ipv3mzRvMnj0bv/zyC1ONKxAI4OnpiX379kFWVpaadr9+/VBeXo7Xr1/j888/B9Dq9SgnJyfSfnnz5s121WbTz1RTUxNZWVkwMTERen9XV1fDzMwMzc3N1LTbaGpqwuXLl5kk3c2bN2FmZoa8vDxqmhYWFnB3dxepzg0LC0NcXByKioqoabNZUcUmfD4fsbGxGDp0KC5evAhXV1fExcUhPj4eNTU1uHDhAjXt2NhYzJ07F87Ozsx+7cqVK0hMTMShQ4fg7u5OTZvNxOTt27dhZ2cHVVVVJhGZmZmJ58+fIzU1FRYWFu2uyVZr77/rU9tRuy04Pj24BB0Hx980NzfDxcUFWlpaEj+JBlpPgvfu3Yvc3FwIBALY2NjA19eXOZGnRWJiIlavXo3g4GCJV1Sxiby8PMrLy/HZZ59h4cKFUFRUxI4dO1BVVQVra2s8f/6cmraBgQHWr18v1CIDAIcPH8a6devEnpq2F/fu3fvH5/X09KhpA603VD///DMKCgqgoKAAKysrzJw5U+R119501kQ0m4kqFRUV3Lx5E0ZGRkI38Dk5OXB0dERDQwM17c7KiBEjcPLkSZEKi+fPn2PKlClU/78tLCzg4+MDPz8/ofW2lsDbt29T026jtLSUubZYWlpSv54BwPr16//tn127di3FSCSLuro6Ll++DDMzM6H39+XLlzFt2jT88ccf1LRXrFiBjIwMFBQUwMLCAnZ2dhg+fDiTUKDJiRMn4ObmhlGjRsHW1hY8Hg+XL19GSkoK4uPjJV7B2pFbPdtQUFBAaWkpdHV1ERAQgFevXiEqKgqlpaUYPHgwnj17Rk3b1NQUCxcuFDk43b59Ow4cOICSkhJq2mwmJgHg0aNH2LNnj9B+zc/PD+rq6lR1O2simoOjDS5Bx8HxN9HR0fDx8YGCggI0NDSEWuBon0SzCZsVVWyip6eHAwcOYOTIkTAwMMAPP/yAiRMnori4GF9++SXVDd/HKk3KyspgaWlJtdKks9JZE9FsMmHCBNjY2GDDhg1QUVFBYWEh9PT0MGPGDAgEAiQkJLAdYofjYybyjx8/Rq9evfD27Vtq2j/99BP8/PwQHByMESNGAABSUlIQERGBHTt2YMGCBdS0OSSPm5sbunXrhv379zPvby0tLTg5OeHzzz/HwYMHqWlLSUlBS0sLgYGBcHJyEmqhlwS5ubmIjIxESUkJCCEwMzNDUFAQUxlOC7ZbPdmiZ8+eSEhIwNChQ2FiYoKwsDC4uLjg7t27GDhwINUDVTk5ORQXF7NSGcxmYvJTozMkojk42uA86Dg4/mb16tX49ttvsXLlyn+73Lk9efXqFQoLC8VOmKTpeUCzWutTZu7cuXB1dWX8yNo80K5fv07dZNrQ0BDx8fEiLTJxcXESm0h3+/Zt1NTU4M2bN0LrtP01SktLkZ6eLvZ1vmbNGmq606ZNA9Dayt6GJBPRDx48wJkzZ8T+zjuSH9r7hIeHw97eHjdu3MCbN28QEhIi1Lrf3jg7O+PQoUPo2rWryOCED6E5LIENCgsLma9v376Nuro65vuWlhYkJSWhV69eVGPw9vbG69evsXHjRsZPVV9fX8RQnwYtLS04dOgQUlJSxF5bOmpLMyEECQkJSEtLE/vvpvk6j4yMhIODA8zMzPDq1Su4u7ujrKwMmpqaOHbsGDVdAMjLy0NGRgbS09MREREBaWlpZkiEvb099YRd//79ERsbS1VDHFFRUYzuxYsXcenSJSQlJSE+Ph7BwcHUK6rYwtnZGe7u7jAyMkJDQwPGjRsHoNUig7Ztha6uLlJSUkR0UlJSqNpyAICamhru378PXV1dJCUlISwsDEDr+14Sh+eNjY3Izs4We22heU1nOxHd1NSEjIwMsfs12t1UHBwAl6Dj4GB48+YN3NzcWEnOJSUlwdPTE/X19SLP0U4eSKIF6FNk3bp1sLCwwP379+Hi4sJMy5WWlsbKlSupaq9fvx5ubm74/fffxbbI0KSyshJTp05FUVGRkM9HW8UozdfagQMHsHjxYmhqakJHR0ekSpVmgo7NRHRKSgomT54MAwMD3L17FxYWFqiurgYhBDY2NqzFRRszMzMUFhZi7969kJaWRlNTE5ydnam17nfr1o15TYmbqtmR6du3L3g8Hng8HlO99j4KCgrYvXs39TgWL16MxYsX48mTJ1BQUICysjJ1TaB1YMGhQ4cwYcIEWFhYdJpp5AEBAdi/fz8cHBzQvXt3if67e/bsifz8fBw7dgw3b96EQCDAvHnz4OHhAQUFBara1tbWsLa2Zm6WCwoKsGPHDvj7
+0MgEFD9HPPw8GASgZI6UGujtraWSQqdO3cOrq6uGDNmDPT19TF48GCJxiJJIiMjoa+vj/v372Pr1q3MdaW2thZLliyhqh0UFAR/f3/k5+dj6NChzH7t0KFD2LlzJ1VtNhOTZ8+ehYeHB5qamqCioiKyX6OZoGMzEZ2Xl4fx48ejubkZTU1NUFdXR319PRQVFaGtrc0l6DgkAtfiysHxN4GBgdDS0hKpapIEhoaGGDt2LNasWcPKFFmAvYqqzgpbLTKTJk2CtLQ0Dhw4AD6fj+zsbDQ0NCAoKAjbtm2jOpVMT08PS5YswYoVK6hpfIoMGjQIjo6O+PbbbxmvJm1tbXh4eMDR0RGLFy9mO0Qq1NTUQFdXV2zSoKamhjHUb28IIaipqYGWlhYUFRWpaHxq3Lt3D4QQ5j2tpaXFPCcrKwttbW1mEl9HRFNTEzExMRg/fjzboUgUdXV1xMbGdrp/N9B6I902HKLNvL5v375wcHBAeHg4Nd1FixYhIyMDpaWl0NHRwfDhw5nqPdrV92y2enZmEhMTERERwfjNtU1xdXJyoqr79u1b7Nq1CzU1NZgzZw6zP9yxYweUlZWpTks2NjbG+PHj8d1330n8c5TN1l57e3sYGxtj7969UFVVRUFBAWRkZDBr1iwEBAT8y+p8Do72gEvQcXD8jb+/P2JiYmBtbQ0rKysRjyqabWhdu3ZFXl4e1WmtH4PNiioOyaOpqYnU1FRYWVmhW7duyM7OhomJCVJTUxEUFER1+l3Xrl2Rn58PPp9PTeNfwUYiWkVFBfn5+ejduzfU1NRw+fJlmJubo6CgAE5OTqiurqamzSbS0tKora0V8UNraGiAtrY2tWuLQCCAvLw8iouLJV7h0tlJSEhgDMw/fI+19wTT9+nZsyfS09NhbGxMTeNTxMDAAL/99hv1xNDHePjwIbKyssS2wNGsNFFTU8OLFy9gbW3NVLPZ2dmha9eu1DQ/pK6ujkkQtiXstLW1UVtbS03Tz88P586dg5GREfLy8lBdXQ1lZWXExcVhy5YtVN9jHJLl7du3WLhwIUJDQ1nZMykpKaGoqIgVbTYT0aqqqrh+/TpMTEygqqqKq1evwtTUFNevX4eXlxfu3LlDTZuDow2uxZWD42+KioqY06lbt24JPUe7bWT69OlIT09nJUEXEBAAAwMDXLp0SWxFFUf78+uvv0JaWhpjx44VWk9OToZAIGDaGGjQ0tLCtIdoamri0aNHMDExgZ6eHu7evUtNF2j1Eblw4QJ8fHyo6oiDzUS0kpISXr9+DaB141lRUQFzc3MAENvW3lFo8/f7kBcvXkBeXp6arpSUFNMS1BkTdBUVFdixYwdKSkrA4/FgamqKgIAA6p8vu3btwjfffAMvLy+cPn0ac+fORUVFBXJycuDr60tVOygoCDt37sSePXtYa2998+YNqqqq0Lt3b3TpIpnt9bp167B+/Xr89NNP1NtKP+TgwYPw8fGBrKys2MFaNBN0R44ckXhC7kNUVFSgpqYGNTU1qKqqokuXLtDR0aGqGRkZCQMDA9TU1Ei81bOzkpOTA4FAINJCfP36dUhLS2PAgAFUdGVkZJCYmIjQ0FAqf/+/YuzYsbhx4wYrCTo2W3tlZGSYa1n37t1RU1MDU1NTdOvWDTU1NVS1OTja4BJ0HBx/k5aWxpr2nj174OLigszMTLETJmludK9evYrU1FRoaWlBSkoKUlJS+PLLL7Fp0yb4+/tTrajqrKxcuRKbN28WWSeEYOXKlVQTdBYWFigsLASfz8fgwYOxdetWyMrKYv/+/dQ3YoaGhggNDcW1a9ck/jpnMxE9ZMgQZGVlwczMDBMmTEBQUBCKiopw8uRJDBkyhKo2GyxfvhxA6016aGioUHtMS0sLrl+/jr59+1KNYevWrQgODsbevXthYWFBVetTIjk5GZMnT0bfvn1ha2sLQgiuXLkCc3NznD17lhmGQ4MffvgB+/fvx8yZM3H48GGEhISAz+djzZo1ePr0KTVdALh8+TLS0tLw22+/wdzcXOTaQnNYQnNzM5YuXYrDhw8DaB2Ew+fz4e/vj549e1L1NHVxccGxY8egra0NfX19kX83zYqqNWvWYM2aNVi1apXEvXsnTpwoUb33WbFiBTIyMlBQUAALCwvY2dlh1apVsLOzozpl8p8qqpYtW0ZNt7Pj6+uLkJAQkQTdw4cPsWXLFly/fp2a9tSpU3Hq1CnmM1WSTJgwAcHBwbh9+7bY/RrNrgM2E9H9+vXDjRs3YGxsDAcHB6xZswb19fU4cuQILC0tqWpzcLTBtbhycHwCREdHw8fHBwoKCmJPoisrK6lpq6mpITc3F3w+H71790Z0dDQcHBxQUVEBS0tLNDc3U9PurCgoKKCkpAT6+vpC69XV1TA3N0dTUxM17eTkZMasv7KyEhMnTsSdO3egoaGBuLg4sQbz7YWBgcFHn6P9OmeztbeyshIvXryAlZUVmpub8dVXX+Hy5cswNDREZGRkhxvU4uDgAADIyMjAF198AVlZWeY5WVlZ6Ovr46uvvqJa3aampobm5ma8e/cOsrKyItVFtBNGbNGvXz+MHTtW5ABg5cqVuHDhAtWEjaKiIkpKSqCnpwdtbW1cvHgR1tbWKCsrw5AhQ9DQ0EBNe+7cuf/4/MGDB6lpBwQEICsrCzt27ICjoyNzAHLmzBmsXbuW6rXF1dUVaWlpmD59utghEWvXrqWmraGhgezsbFYq/9lESkoKWlpaCAwMhJOTE/WJse+jqqqKmzdvsmoT0dlQVlZm3tPvU1VVBSsrK/z111/UtDdu3Iht27Zh5MiR6N+/P5SUlISep3mo+U9Jd5rD69hu7b1x4wb++usvODg44MmTJ/Dy8mL2az/99BP1w0UODoBL0HFwfBLo6OjA398fK1eulPhJ9LBhwxAUFIQpU6bA3d0dz549w+rVq7F//37k5uaKtPt2JAQCAcrLy8X659jZ2VHT1dHRwdGjR0WSYZcuXYK7uzseP35MTVscT58+hZqaWoeefMgloiXP3LlzsXPnTlba0NqqmT6Gl5eXhCKRLPLy8igqKhJJfpaWlsLKygqvXr2ips3n85GQkAAbGxsMHDgQ8+fPx6JFi3DhwgXMmDGjwyZF9fT0EBcXhyFDhjBDYPh8PsrLy2FjY0PVL0lJSQnJycn48ssvqWl8jJCQEKirq1Ofev6pUVBQgIyMDGY4hbS0NDMkwt7enmrCbu7cubC0tGSloopN+Hw+cnJyoKGhIbTe2NgIGxsbqod7GhoaOHfuHL744guh9StXrmDChAlUBxaweajJJlwimqOzw7W4cnB8Arx58wZubm4ST84BwOrVq5mKrbCwMEycOBHDhg1jKqo6KteuXYO7uzsz/fB9aJ4OAq2tAcuWLUNiYiJTfVBeXo6goCCJTc0tLy9HRUUF7OzsoK6uLvI7oM2HHnC0YbO1l82bCzahWbX0r+ioCbh/hZaWFvLz80USdPn5+SLDOtqbESNG4OzZs7CxscG8efMQGBiIhIQE3LhxQ2KT7548eYK7d++Cx+PB2NhYaJotTU1xv9umpib
q1zddXV3WfNg2bdqEiRMnIikpSWwLHM3BWmxibW0Na2trpnqpoKAAO3bsgL+/PwQCAdW9g6GhITZs2IArV65IvKKKTaqrq8X+Xl+/fo2HDx9S1R49ejRWrVqF06dPo1u3bgBaP7u//vprqpYBQGuVXmeEzdbeESNG4OTJkyLt6s+fP8eUKVOQmpoq8Zg4Oh9cgo6D4xPAy8sLcXFx+PrrryWu/f6gAj6fj9u3b3eKiiofHx8MGDAA58+fR48ePST6bw0PD4ejoyP69OmDzz77DADw4MEDDBs2jLofWkNDA9MWxePxUFZWBj6fj/nz50NVVRURERFU9WNiYhAeHo6ysjIAgLGxMYKDgzF79myqumwmotm8uejMVFRU4ODBg6ioqMDOnTuhra2NpKQk6OrqMkM6OhoLFizAwoULUVlZiaFDh4LH4+Hy5cvYsmULgoKCqGrv37+fqUT28fGBuro6Ll++jEmTJlEfDNPU1ISlS5ciJiaGiUFaWhqenp7YvXu3kA9iezNw4ECcP38eS5cuBfB/hw4HDhwQqbppbyIiIhASEoJ9+/aJWCbQ5rvvvkNycjJMTEwAQMSaoyOTl5fHTHDNzMzE8+fP0bdvX6a9nxbR0dFQVVVFbm4ucnNzhZ6jPZiDDc6cOcN8nZyczCTIgFY/05SUFOqv+4iICNjZ2UFPT48ZJJefn4/u3bvjyJEjVLXfR9KHmkCrTcW2bduEBg4FBwdj2LBhVHXZTESnp6eLTCAHgFevXiEzM5OaLgfH+3AtrhwcnwD+/v6IiYmBtbU1rKysWDmJfr+iSkFB4aMTGDsKSkpKKCgooD4R6mMQQnDx4kUUFBRAQUEBVlZWVNtq2/D09MTjx48RHR0NU1NTph3rwoULCAwMRHFxMTXt7du3IzQ0FH5+foyBfVZWFr7//nuEhYUhMDCQmrY4aCei224upkyZgsOHD4u9ubh48SL16bmdkYyMDIwbNw62trb4/fffUVJSAj6fj61btyI7OxsJCQlsh0gFQgh27NiBiIgIPHr0CEDr5ODg4GD4+/t32Gv6okWLcOnSJezZswe2trYAWgdH+Pv7Y/To0di7dy817StXrsDR0REeHh44dOgQFi1ahOLiYly9ehUZGRno378/Ne33vRYVFRVF9g4024rV1NQQGRmJOXPmUNP4FFFTU8OLFy9gbW3NtLWyPVG2o9LWVfL+5PU2ZGRkoK+vj4iICOpDQ5qamvDzzz8L7ddmzpwp8n6jAVuHmrGxsZg7dy6cnZ2FBg4lJibi0KFDcHd3p6bNRmtvYWEhAKBv375ITU2Furo681xLSwuSkpIQFRWF6urqdtfm4PgQLkHHwfEJ8E+nrjwej2pJ9ccqqubNmyeRiiq2GDFiBEJCQuDo6Mh2KBJFR0cHycnJsLa2FvJLqqqqgqWlJV68eEFN28DAAOvXr4enp6fQ+uHDh7Fu3TqJtHNIMhH9qdxcdEa++OILuLi4YPny5UKv85ycHEyZMqVTVC62mZerqKhITPPZs2f48ccfhSou5s6dK3SzQwNNTU0kJCTA3t5eaD0tLQ2urq548uQJVf2ioiJs27YNubm5EAgEsLGxwYoVK6hP/Tt06NA/Xr9otnrr6OggMzOT6rCXT5Fz5859Egk5Niqq2MLAwAA5OTnQ1NRkOxSJwuahpqmpKRYuXCiisX37dhw4cAAlJSXUtNlASkqKeS+JS40oKChg9+7d8Pb2lnRoHJ0QLkHHwdHJYbOiik0SExOxevVqBAcHi/XPsbKyYikyuqioqODmzZswMjISSVw4OjpSnbQoLy+PW7duiVQtlpWVwdLSkqqBPZuJ6M56c8EmysrKKCoqgoGBgdDrvLq6Gn369KH6WvsUeN+LzcTERCKvvYyMDDg5OaFr164YMGAAACA3NxeNjY04c+YMhg8fTk1bUVERubm5Igb9xcXFGDRoENXJ2J2VTZs2oba2Frt27WI7lE4FWxVVnxqNjY0iPmEdDTYPNeXk5FBcXCyyXysvL4eFhYXEPkMllYhu86Pm8/nIzs4W8i+VlZWFtrY2pKWlqcbAwdGG5B3pOTg4PikuXLiALVu2MF5obRgZGeHevXssRUWfadOmoaSkBN7e3hg4cCD69u2Lfv36MX92VOzs7BATE8N8z+PxIBAIEB4eTt0/x9DQEPHx8SLrcXFx1KswAgMDISMjg5qaGiE/Kjc3NyQlJVHVrqqq4pJzEkZVVRW1tbUi63l5eejVqxcLEUmGpqYmeHt7o0ePHrCzs8OwYcPQo0cPzJs3j/qkYl9fX7i6uqKqqgonT57EyZMnUVlZiRkzZsDX15eq9hdffIG1a9cK3TS+fPkS69evp+4D5+DggB9//BF//vknVR1x2NvbIyYmBi9fvpS4dnZ2Ng4fPgw+n49JkybB2dlZ6MHR/mzfvh2LFy/G+PHjER8fj7i4ODg6OsLHxweRkZFsh0eNLVu2CHnFuri4QF1dHb169UJBQQGLkdGltrYWQ4cOFVkfOnSo2M+39kRXVxcpKSki6ykpKdDV1aWqDbQmoi0tLaGgoMC0FdP0/NPT04O+vj4EAgEGDBgAPT095tGjRw8uOcchUbghERwcnZympiaxBtr19fWQk5NjISLJ0FmnY4WHh8Pe3h43btzAmzdvEBISguLiYjx9+hRZWVlUtdevXw83Nzf8/vvvsLW1ZQzsU1JSxCbu2pMLFy4gOTmZlUS0v78/DA0NRYyN9+zZg/LycuzYsYOqfmfE3d0dK1aswC+//MIkobOysvDVV1+JVCN0JJYvX46MjAycPXtWxIstKCiIqhdbRUUFTpw4IXQjIy0tjeXLlwsdCtBg586dcHR0xGeffQZra2vweDzk5+dDXl4eycnJVLUtLS2xevVq+Pn5Yfz48Zg9ezbGjx8PWVlZqroA0L9/f4SEhGDp0qVwdXXFvHnzMGTIEOq6QGsSnEvESZbdu3dj7969QtcwJycnmJubY926dRL3cZUUUVFRiI2NBQBcvHgRly5dQlJSEuLj4xEcHIwLFy6wHCEd2g41PxwgJ4lDzaCgIPj7+yM/P19o4NChQ4ewc+dOqtofa+318fFBfX091df5pk2b0L17d5FW1p9++glPnjzBihUrqGlzcDAQDg6OTs348ePJ6tWrCSGEKCsrk8rKStLS0kJcXFzItGnTWI6Ogwa1tbVkzZo1ZMKECWTcuHHkm2++IY8ePZKI9o0bN4iHhwexsbEh/fr1Ix4eHuTmzZvUdZWVlUlpaSnzdUVFBSGEkOzsbKKurk5Vu2fPnuTGjRsi67m5uaRXr15UtTsrb968Ie7u7kRKSorweDwiIyNDpKSkyKxZs8i7d+/YDo8aGhoaJC0tTWQ9NTWVaGpqUtUeOnQoSUxMFFlPTEwkQ4YMoapNCCHNzc1k//79ZPny5SQwMJAcOHCANDc3U9clhJCWlhaSnJxMvLy8SNeuXYmamhpZsGABSU9Pp6797t07curUKeLk5ERkZGSIqakpCQ8PJ3V1ddS1OSSLnJwcKSsrE1kvLS0lcnJyLEQkGeTl5UlNTQ0hhBB/f3+ycO
FCQgghd+/eJaqqqmyGRpWEhAQiLS1Nxo4dS7799luyYcMGMnbsWNKlSxdy8uRJ6vonT54ktra2RF1dnairqxNbW1ty6tQp6rr6+vrk8OHDIuuHDh0i+vr6VLX19PRIVlaWyPq1a9eoa3NwtMF50HFwdHJu374Ne3t79O/fH6mpqZg8ebJQRVXv3r3ZDpEqt2/fRk1NjchY9cmTJ1PVFQgEKC8vx+PHjyEQCISek8Q0187GhAkTYGNjgw0bNkBFRQWFhYXQ09PDjBkzIBAIqE71/Jj3nqS9XDojFRUVyMvLg0AgQL9+/Tq8oT2bXmxxcXFMNVdbFde1a9fw/fffY/PmzUIxdVSPTwB49eoVzp49i40bN6KoqAgtLS0S037y5AmioqKwceNGtLS0YPz48fD398eIESMkFgMHPSwsLODu7i5SURUWFoa4uDgUFRWxFBldevbsiYSEBAwdOhQmJiYICwuDi4sL7t69i4EDB+L58+fUtNs8ejU0NITWGxsbYWNjQ2Wi6Pvk5uYiMjISJSUlIITAzMwMQUFBHdqKhU2/Ynl5eZSUlIhMkq2srISZmRm3X+OQCFyLKwdHJ8fMzAyFhYXYu3cvpKWl0dTUBGdnZ/j6+qJHjx5sh0eNyspKTJ06FUVFRUJTNtuMaGneVF27dg3u7u6MKe378Hg86jd0r169QmFhodjkIM3E5K+//gppaWmMHTtWaD05ORkCgQDjxo2jps1ma6+hoSGSkpLg5+cntP7bb7+Bz+dT1e7s9O7dm/kdd4Zph21ebDExMZCXlwcgOS+2mTNnAgBCQkLEPtd2naVxjftU2pLq6upw/PhxxMbGorCwEAMHDpSILtDqCXfw4EEcO3YM2tramDNnDmprazFp0iQsXrwY27Zt+581bGxskJKSAjU1NfTr1+8f31M3b978n/U4hGHTJoJNnJ2d4e7uDiMjIzQ0NDB7hfz8fJEkTntTXV0t9nr1+vVriUwD79+/P9PeK0lycnIgEAgwePBgofXr169DWlqaGQREAzZbe3V1dZGVlSWSoMvKykLPnj2panNwtMEl6Dg4OKCjo4P169ezHYZECQgIgIGBAS5dusRMbWpoaEBQUFC73Mj8Ez4+PhgwYADOnz+PHj16SDRxkJSUBE9PT9TX14s8Rzs5uHLlSmzevFlknRCClStXUk3QsZmIXr58Ofz8/PDkyROmkiUlJQURERGc/xxFfvzxR0RGRjLTDo2MjLBs2TLMnz+f5cjowaYXG5u+nlFRUTh69KjIurm5OWbMmEE1Qff8+XOcOHECR48eRXp6Ovh8Ptzd3XH8+HHqyYPHjx/jyJEjOHjwIMrKyjBp0iQcP34cY8eOZT5XXF1dMWXKlHb5XHNycmK8aadMmfI//30c/xnTpk3D9evXERkZiVOnTjEVVdnZ2R26oioyMhIGBgaoqanB1q1boaysDKB1iMKSJUuoaJ45c4b5Ojk5Gd26dWO+b2lpQUpKCvT19alot+Hh4QF7e3vY29tLvPrb19cXISEhIgm6hw8fYsuWLbh+/To1bTYT0fPnz8eyZcvw9u1bof1aSEgIgoKCqGpzcLTBtbhycHCwVlHFJpqamkhNTYWVlRW6deuG7OxsmJiYIDU1FUFBQcjLy6OmraSkhIKCAuo3b+IwNDTE2LFjsWbNGnTv3l2i2goKCigpKRHZ1FZXV8Pc3Jxq+x3b7N27Fxs3bsSjR48AAPr6+li3bl2HHljAJqGhoYiMjMTSpUuZyrGrV69iz549CAgIQFhYGMsR0uPly5eIjY3FnTt3mBt4Dw8PKCgosB0aNdhsS1JQUICamhpcXV3h4eEh0ao5WVlZ9O7dG97e3pgzZw60tLREfub58+dwcnJCWlqaxOLi4Ggv3r59i4ULFyI0NFSiFedSUlIAINRh0YaMjAz09fURERGBiRMnUoth0aJFyMjIQGlpKXR0dDB8+HAMHz4c9vb26NOnDzVdAFBWVkZhYaHI77yqqgpWVlb466+/qOqz1drbdmC8a9cuxvpGXl4eK1aswJo1a6hqc3C0wSXoODg6OWxWVLGJmpoacnNzwefz0bt3b0RHR8PBwQEVFRWwtLREc3MzNe0RI0YgJCQEjo6O1DQ+RteuXZGXl8eKt6COjg6OHj0q4od06dIluLu74/Hjx1T1P4VE9JMnT6CgoMBUAHDQQVNTE7t372baLts4duwYli5dKvZ6x/G/8/DhQ2RlZYl9j304xbg9MTIywtq1azFr1iyh9SNHjmDt2rVUfaIuXLiAUaNGMTf0kiQzMxPDhg2TuC4HO7BZUcUmqqqquHnzJiuWEAYGBsjJyYGmpqbEtduoq6tDeno60tPTmYSdtrY2amtrqWlqaGjg3LlzItYIV65cwYQJE/Ds2TNq2p8CL168QElJCRQUFGBkZMRUDnNwSAKuxZWDo5Pj5+cHFxcXViqq2MTCwoI5HRw8eDC2bt0KWVlZ7N+/n/omcOnSpQgKCkJdXR0sLS0hIyMj9DxNA/Xp06cjPT2dlQTd5MmTsWzZMiQmJjL65eXlCAoKop4g+1QS0eIqXDjan5aWFrEeOf3798e7d+9YiEhysJUkO3jwIHx8fCArKwsNDQ2h1n0ej0dVm822pDFjxlD9+/+JAQMGoLm5GYqKigCAe/fuITExEWZmZlTiUlNT+7ctGZ4+fdru+p0dZWVlREREYNGiRRKvqGKTqVOn4tSpU1i+fLnEtcW17jc2NkJVVVViMaioqEBNTQ1qampQVVVFly5doKOjQ1Vz9OjRWLVqFU6fPs209zY2NuLrr7/G6NGjqWp/ColoZWVliVZDc3C8D1dBx8HRyWGzoopNkpOTGR+yyspKTJw4EXfu3IGGhgbi4uKoTr0TV2lB00D9fZqbm+Hi4gItLS2xyUGaN9F//vknHB0dcePGDXz22WcAgAcPHmDYsGE4efIk1Q0vm629AJCQkID4+HixE4M5M/X2Z+nSpZCRkcH27duF1r/66iu8fPkS33//PUuR0eVfJcloVpLp6urCx8cHq1atkng1GdttSWy9v8eMGQNnZ2f4+PigsbERffr0gYyMDOrr67F9+3YsXry4XfUOHz7MfN3Q0ICwsDCMHTtWqI08OTkZoaGhCAwMbFdtjv+DjYoqNtm4cSO2bduGkSNHon///lBSUhJ6nua+ZcuWLdDX14ebmxsAwMXFBSdOnECPHj3w66+/wtrampr2ihUrkJGRgYKCAlhYWMDOzg7Dhw+HnZ0d9QThw4cPYWdnh4aGBqatND8/H927d8fFixehq6tLTZvN1l6gdUDGL7/8IvZ6fvLkSer6HBxcgo6Do5Pj7e0NW1tbzJs3j+1QWOfp06f/UYXAf8u9e/f+8Xk9PT1q2tHR0fDx8YGCgoLEb+CB1hvpixcvoqCgAAoKCrCysoKdnR1VTYDdRPSuXbvwzTffwMvLCwcOHMDcuXNRUVGBnJwc+Pr6YuPGjRKPqaOzdOlSxMTEQFdXF0OGDAHQOj35/v378PT0FEpMf5jE+/8ZNpNkGhoayM7OZvWwh422JDbf35qamsjIyIC5uTmio
6Oxe/du5OXl4cSJE1izZg1KSkqoaU+bNg0ODg4i06n37NmDS5cu4dSpU9S0OztNTU24fPkyk6S7efMmzMzMqHrnssmH3pLvQ3vfwufzERsbi6FDh+LixYtwdXVFXFwck5C/cOECNW0pKSloaWkhMDAQTk5OMDU1paYljqamJvz8889C+7WZM2eKHOzSgo1E9PHjx+Hp6YkxY8bg4sWLGDNmDMrKylBXV4epU6fi4MGD1LQ5ONrgEnQcHJ0cNiuqPgXKy8tRUVEBOzs7KCgoMFVsHRUdHR34+/tj5cqVrHgmsQWbieg+ffpg7dq1mDlzJlRUVFBQUAA+n481a9bg6dOn2LNnj8Rj6ug4ODj8Wz/H4/GQmppKORrJwWaSLCQkBOrq6li5cqXEtdmEzfe3oqIi7ty5g88//xyurq4wNzfH2rVrcf/+fZiYmFD1UlVWVkZ+fr7IsKOysjL069cPL168oKbdWWGzoqqzoqCggNLSUujq6iIgIACvXr1CVFQUSktLMXjwYKpebAUFBcjIyEB6ejoyMzMhLS3NVJLZ29tLPGEnadhIRFtZWWHRokXw9fVlrucGBgZYtGgRevTogfXr11PT5uBog0vQcXB0ctiuqGKLhoYGuLq6Ii0tDTweD2VlZeDz+Zg3bx5UVVURERFBPYbbt2+LLaGn6cemrq6OnJycTtfSzGYiWlFRESUlJdDT04O2tjYuXrwIa2trlJWVYciQIWhoaKCmzdG5YDNJ1tLSgokTJ+Lly5di32MdqVLxfdh8f1tZWWH+/PmYOnUqLCwskJSUhC+++AK5ubmYMGEC6urqqGnr6enBz88PwcHBQuvh4eHYs2fPv6wU5/jPYbui6u7Y2YYAACTGSURBVFOg7bZVUgepPXv2REJCAoYOHQoTExOEhYXBxcUFd+/excCBA/H8+XOJxAG0Jux27NiB2NhYCASCDjvEjc1EtJKSEoqLi6Gvrw9NTU2kpaXB0tISJSUlGDFiRIdtI+f4tOCGRHBwdHJWr16Nb7/9ttNVVAUGBkJGRgY1NTVCm1w3NzcEBgZSTdBVVlZi6tSpKCoqYrzngP/bcNLcdHl5eSEuLg5ff/01NY1PkaNHjyI5ORkKCgpIT0+XqIG9jo4OGhoaoKenBz09PVy7dg3W1taoqqoCd0bG0Z5s2rQJEydORFJSksSTZN999x2Sk5NhYmICACLvsY4Km+/vNWvWwN3dHYGBgRg5ciTjBXfhwgXGN4oW69evx7x585Cens7oXrt2DUlJSYiOjqaq3VnJy8tjKqoiIiI6VUVVTEwMwsPDUVZWBgAwNjZGcHAwZs+eTVXX2dkZ7u7uMDIyQkNDA8aNGwcAYqtHaZCXl8dUkGVmZuL58+fo27fvv10l/v8j4eHh0NLSwtq1ayWeiFZXV8dff/0FAOjVqxdu3boFS0tLNDY2Uq1I5uB4Hy5Bx8HRyXnz5g3c3Nw6VXIOaL2BSU5OZoYVtGFkZET95D8gIAAGBga4dOkS+Hw+srOz0dDQgKCgIGzbto2qdktLC7Zu3Yrk5GRYWVl1mioXNhPRI0aMwNmzZ2FjY4N58+YhMDAQCQkJuHHjBpydnSUaC0fHhs0k2fbt2/HTTz9hzpw5VHU+Ndh8f0+fPh1ffvklamtrhczqR44cialTp1LVnjNnDkxNTbFr1y6cPHkShBCYmZkhKysLgwcPpqrdWbG2toa1tTVzoNRWUeXv79+hK6q2b9+O0NBQ+Pn5wdbWFoQQZGVlwcfHB/X19VQHkkRGRsLAwAA1NTXYunUrlJWVAQC1tbVYsmQJNV2gdWryixcvYG1tDXt7eyxYsAB2dnbo2rUrVV22YTMRPWzYMFy8eBGWlpZwdXVFQEAAUlNTcfHiRYwcOZKaLgfH+3AtrhwcnZzAwEBoaWl1uooqFRUV3Lx5E0ZGRkK+QTk5OXB0dKTalqSpqYnU1FRYWVmhW7duyM7OhomJCVJTUxEUFETVX+OfTl07mh/X+7DZ2isQCCAQCNClS+uZWHx8PC5fvgxDQ0Nm4iYHR3ugpqaGyMhIVpJkOjo6yMzMhJGRkcS12YR7f3NIkn+qqAoPD2c7PCoYGBhg/fr18PT0FFo/fPgw1q1bh6qqKiq6b9++xcKFCxEaGgo+n09F4584d+5cp0jI/Ssk2dr79OlTvHr1Cj179oRAIMC2bduY63loaCjU1NSoaXNwtMEl6Dg4Ojn+/v6IiYmBtbV1p6qomjBhAmxsbLBhwwaoqKigsLAQenp6mDFjBgQCARISEqhpq6mpITc3F3w+H71790Z0dDQcHBxQUVEBS0vLDl1GLxAIUF5ejsePH0MgEAg9R3OaK1uJ6Hfv3mHjxo3w9vaGrq6uRLU5Oh9sJsk2bdqE2tpa7Nq1S+LaHBydgQ8rquzt7TtFAkdeXh63bt0SO5DE0tISr169oqatqqqKmzdvspKgY5O2A2sNDQ2h9cbGRtjY2FD3p2YjEf3u3Tv8/PPPGDt2LHR0dKhocHD8O3AtrhwcnZyioiLGq+bWrVtCz3Vk36Dw8HDY29vjxo0bePPmDUJCQlBcXIynT58iKyuLqraFhQUKCwvB5/MxePBgbN26FbKysti/f3+H3gReu3YN7u7uuHfvnog3E4/Ho3oqylZrb5cuXRAeHg4vLy8qfz8Hx/sEBARg9+7drCTJsrOzkZqainPnzsHc3FzkPXby5EmJxyQJDh48CGVlZbi4uAit//LLL2hubube+xztxpEjRzpFQu5DDA0NER8fL3LAFhcXR/0wYurUqTh16hSWL19OVedTo7q6Wuye7PXr13j48CFVbbZae7t06YLFixejpKSEqg4Hx7+CS9BxcHRy0tLS2A6BFczMzFBYWIi9e/dCWloaTU1NcHZ2hq+vL3r06EFVe/Xq1WhqagIAhIWFYeLEiRg2bBg0NDQQFxdHVZtNfHx8MGDAAJw/fx49evSQaAKYzUT0qFGjkJ6e3um8uTgkD5tJMlVV1U7pqbh582bs27dPZF1bWxsLFy7kEnQc7cbEiRPZDoEV1q9fDzc3N/z++++wtbUFj8fD5cuXkZKSgvj4eKrahoaG2LBhA65cuYL+/ftDSUlJ6HmaA6bY4MyZM8zXycnJ6NatG/N9S0sLUlJSoK+vTzUGNhPRgwcPRl5eHvT09CSuzcHRBtfiysHBwfEJ8PTpU6ipqXXoqkUlJSUUFBRIZPLZp0RUVBTWrVsHDw8PsRv8yZMnsxQZR0dj7ty5//j8wYMHJRRJ50FeXh537twRuWmtrq6GqakpXr58yU5gHBwdiNzcXERGRqKkpIQZSBIUFER9WrGBgcFHn+PxeNRbPSVN2xAtHo8n0ukgIyMDfX19REREdNhk8S+//IKVK1ciMDBQ7H7NysqKpcg4OhNcgo6Dg6PT8urVKxQWFor1Q5NE0qS8vBwVFRWws7ODgoICCCEdOkE3YsQIhISEwNHRke1QJMo/TY2l3drLwSFpnjx5grt374LH48HY2BhaWlpsh0SVzz//HHv27BH5zDh9+jR8fX3x4MEDliLj4ODg+O8wMDBATk4ONDU12Q5F
oojbr7UlK7n9Goek4FpcOTg4OiVJSUnw9PREfX29yHO0P4QbGhrg6uqKtLQ08Hg8lJWVgc/nY/78+VBVVUVERAQ1bTZZunQpgoKCUFdXB0tLS5H2u456Mvlh8peDgxYvX74EIQSKiooAgHv37iExMRFmZmYYM2YMVe2mpiYsXboUMTExzGteWloanp6e2L17NxNTR2PGjBnw9/eHiooKM+gmIyMDAQEBmDFjBsvRtS//SQtzR/Uc5JA8Hh4ezFAMNqdEt9W0dOSD1DbETcZtbGyEqqqq5IORILQmAnNw/Cd8/Fifg4ODowPj5+cHFxcX1NbWQiAQCD1on5AFBgZCRkYGNTU1Qjetbm5uSEpKoqrNJtOmTUNJSQm8vb0xcOBA9O3bF/369WP+7Eioq6szyV9vb2/89ddfLEfE0RlwcnJCTEwMgNabqUGDBiEiIgJOTk7Yu3cvVe3ly5cjIyMDZ8+eRWNjIxobG3H69GlkZGQgKCiIqjabhIWFYfDgwRg5ciQUFBSgoKCAMWPGYMSIEfjuu+/YDq9d6dat27/94OBoL5SVlREREQETExP07NkTM2fOxL59+3Dnzh2J6MfExMDS0pJ5f1tZWeHIkSMS0WaLLVu2CHkiu7i4QF1dHb169UJBQQGLkbU/NjY2ePbsGQDg8OHD0NLSgp6entgHB4ck4FpcOTg4OiVdu3ZFXl4eevfuLXFtHR0dJCcnw9raGioqKigoKACfz0dVVRUsLS3x4sULicckCe7du/ePz3ekzY+ysjIzqVdaWhp1dXUdvtWPg300NTWRkZEBc3NzREdHY/fu3cjLy8OJEyewZs0aqtPpNDU1kZCQAHt7e6H1tLQ0uLq64smTJ9S0PwVKS0tRUFAABQUFWFpadqjrGQfHp0BdXR3S09ORnp6OjIwMlJaWQltbG7W1tdQ0t2/fjtDQUPj5+cHW1haEEGRlZeH7779HWFgYAgMDqWmzCZ/PR2xsLIYOHYqLFy/C1dUVcXFxiI+PR01NDS5cuMB2iO2GgoICysrK8Nlnn0FaWhq1tbXQ1tZmOyyOTgzX4srBwdEpmT59OtLT01lJ0DU1NYlt96qvr4ecnJzE45EUnemG9YsvvsCUKVPQv39/EELg7+8PBQUFsT/7008/STg6jo5Kc3MzVFRUAAAXLlyAs7MzpKSkMGTIkH+ZIG8P7e7du4usa2tro7m5mar2p4CxsTGMjY3ZDoODo8OioqICNTU1qKmpQVVVFV26dIGOjg5Vzd27d2Pv3r3w9PRk1pycnGBubo5169Z12ARdbW0tdHV1AQDnzp2Dq6srxowZA319fQwePJjl6NqXvn37Yu7cufjyyy9BCMG2bdugrKws9mfXrFkj4eg4OiNcgo6Dg6NTsmfPHri4uCAzM1OsH5q/vz81bTs7O8TExGDDhg0AWv1MBAIBwsPD4eDgQE33U+H27duoqanBmzdvhNY70jTT2NhYREZGoqKiAjweD3/++SdevXrFdlgcHRxDQ0OcOnUKU6dORXJyMnPz+PjxY3Tt2pWq9hdffIG1a9ciJiYG8vLyAFo98davX48vvviCqjabtLS04NChQ0hJSRE7cCg1NZWlyOiTkJDAVNR8eD2/efMmS1FxdDRWrFiBjIwMFBQUwMLCAnZ2dli1ahXs7Oyoe6LV1tZi6NChIutDhw6lWrnHNmpqarh//z50dXWRlJSEsLAwAK0+fB1tUMKhQ4ewdu1anDt3DjweD7/99hu6dBFNkfB4PC5BxyERuBZXDg6OTkl0dDR8fHygoKAADQ0NIdNfHo+HyspKatq3b9+Gvb09+vfvj9TUVEyePBnFxcV4+vQpsrKyWKnqkwSVlZWYOnUqioqKmKlYwP8ZLne0TV8bBgYGuHHjBjQ0NNgOhaODk5CQAHd3d7S0tGDkyJFMG9KmTZvw+++/47fffqOmfevWLTg6OuLVq1ewtrYGj8dDfn4+5OXlkZycDHNzc2rabOLn54dDhw5hwoQJ6NGjh4iBfGRkJEuR0WXXrl345ptv4OXlhQMHDmDu3LmoqKhATk4OfH19sXHjRrZD5OggSElJQUtLC4GBgXBycoKpqanEtC0sLODu7o6vv/5aaD0sLAxxcXEoKiqSWCySxM/PD+fOnYORkRHy8vJQXV0NZWVlxMXFYcuWLR02AS8lJYW6ujquxZWDVbgEHQcHR6dER0cH/v7+WLlypdix6rSpq6vD3r17kZubC4FAABsbG/j6+qJHjx4Sj0VSTJo0CdLS0jhw4AD4fD6ys7PR0NCAoKAgbNu2DcOGDWM7RA6O/++pq6tDbW0trK2tmWtbdnY2unbtij59+lDVfvnyJWJjY3Hnzh0QQmBmZgYPD4+Ptnd3BDQ1NRETE4Px48ezHYpE6dOnD9auXYuZM2cKeamuWbMGT58+xZ49e9gOkaODUFBQgIyMDKSnpyMzMxPS0tIYPnw4M9mVZsLuxIkTcHNzw6hRo2Brawsej4fLly8jJSUF8fHxmDp1KjVtNnn79i127dqFmpoazJkzhxnktWPHDigrK2P+/PksR8jB0XHhEnQcHBydEnV1deTk5HTYarVPEU1NTaSmpsLKygrdunVDdnY2TExMkJqaiqCgIOTl5bEdIgcHB8d/RM+ePZGent7p/OcUFRVRUlICPT09aGtr4+LFi7C2tkZZWRmGDBmChoYGtkPk6KAUFBRgx44diI2NhUAgoF59n5ubi8jISJSUlDAHD0FBQR1u+nwbb9++xcKFCxEaGgo+n892OBwcnQ7Og46Dg6NT4uXlhbi4OJG2BUnx6tUrFBYWivUs6khebO/T0tLCGO9qamri0aNHMDExgZ6eHu7evctydBwcHP8LmzZtQvfu3eHt7S20/tNPP+HJkydYsWIFS5HRJSgoCDt37sSePXtE2ls7Mjo6OmhoaICenh709PRw7do1WFtbo6qqCtzZP0d7k5eXx0xwzczMxPPnz9G3b1+J+Pb2798fsbGx1HU+FWRkZJCYmIjQ0FC2Q+Hg6JRwCToODo5OSUtLC7Zu3Yrk5GRYWVmJDInYvn07Ne2kpCR4enqivr5e5Dkej9dhvdgsLCxQWFgIPp+PwYMHY+vWrZCVlcX+/fu5U1oOjv/PiYqKwtGjR0XWzc3NMWPGjA6boLt8+TLS0tLw22+/wdzcXOSz5OTJkyxFRpcRI0bg7NmzsLGxwbx58xAYGIiEhATcuHEDzs7ObIfH0YFQU1PDixcvYG1tDXt7eyxYsAB2dnbUB98AgIeHB9NKa2RkRF3vU2Hq1Kk4deoUli9fznYoHBydDq7FlYODo1PyT6euPB6P6uQ9Q0NDjB07FmvWrEH37t2p6XxqJCcno6mpCc7OzqisrMTEiRNx584daGhoIC4uDiNGjGA7RA4Ojv8SeXl5lJSUwMDAQGi9srISZmZmHXaK8dy5c//x+YMHD0ooEskiEAggEAiYaYfx8fG4fPkyDA0N4ePjA1lZWZYj5OgonDt3TmIJuQ9ZtGgRMjIyUFpaCh0dHQwfPpzxv6Pt6ckmGzduxLZt2zBy5Ej0798fSkpKQs/7+/u
zFBkHR8eHS9BxcHBwSJiuXbsiLy+P878D8PTpU6ipqXX41jCBQIDy8nKxLc12dnYsRcXB0X4YGRlh7dq1mDVrltD6kSNHsHbtWqqTsTkkT01NDXR1dUWu3YQQ3L9/H59//jlLkXFwtD91dXVMi21bwk5bWxu1tbVsh0aFDw9a3ofH43XY6zmfz0dOTg40NDSE1hsbG2FjY9Nh/90cnxZciysHBweHhJk+fTrS09M7bYKuvLwcFRUVsLOzg7q6eof3K7p27Rrc3d1x7949kX9rR25p5uhczJ8/H8uWLcPbt2+ZatiUlBSEhIQgKCiI5ejo8+TJE9y9exc8Hg/GxsbQ0tJiOySqGBgYoLa2Ftra2kLrT58+hYGBAXdd4+hQqKioQE1NDWpqalBVVUWXLl2go6PDdljUqKqqYjsEVqiurhZ77Xr9+jUePnzIQkQcnREuQcfBwcEhYfbs2QMXFxdkZmbC0tJSxLOoo7YONDQ0wNXVFWlpaeDxeCgrKwOfz8f8+fOhqqqKiIgItkOkgo+PDwYMGIDz58+jR48eHb5akKNzEhISgqdPn2LJkiV48+YNgNa21xUrVmDVqlUsR0ePpqYmLF26FDExMUx1rLS0NDw9PbF7924oKiqyHCEdCCFir2UvXryAvLw8CxFxcLQ/K1asQEZGBgoKCmBhYQE7OzusWrUKdnZ2UFVVZTs8idB2sNiR9y5nzpxhvk5OTka3bt2Y71taWpCSkgJ9fX0WIuPojHAtrhwcHBwSJjo6Gj4+PlBQUICGhobQpqcjtw54enri8ePHiI6OhqmpKQoKCsDn83HhwgUEBgaiuLiY7RCpoKSkhIKCAhgaGrIdCgcHdV68eIGSkhIoKCjAyMgIcnJybIdElUWLFuHSpUvYs2cPbG1tAbQOjvD398fo0aOxd+9eliNsX9pM43fu3IkFCxYIJSBbWlpw/fp1SEtLIysri60QOTjaDSkpKWhpaSEwMBBOTk4wNTVlOySJERMTg/DwcJSVlQEAjI2NERwcjNmzZ7McWfsjJSUFoHUP/mFqREZGBvr6+oiIiMDEiRPZCI+jk8FV0HFwcHBImNWrV+Pbb7/FypUrmU1BZ+DChQtITk7GZ599JrRuZGSEe/fusRQVfQYPHozy8nIuQcfRKVBWVsbAgQPZDkNinDhxAgkJCbC3t2fWxo8fDwUFBbi6una4BF1eXh6A1qqaoqIioWEQsrKysLa2xldffcVWeBwc7UpeXh4yMjKQnp6OiIgISEtLM0Mi7O3tO2zCbvv27QgNDYWfnx9sbW1BCEFWVhZ8fHxQX1+PwMBAtkNsV9qqnw0MDJCTkwNNTU2WI+LozHAJOg4ODg4J8+bNG7i5uXWq5BzQ2gomrt2rvr6+Q1fZLF26FEFBQairqxPb0mxlZcVSZBwcHP8rzc3NYqdxa2tro7m5mYWI6JKWlgagdXrtzp07WZmsycEhKaytrWFtbc1YjxQUFGDHjh3w9/eHQCDosF6Lu3fvxt69e+Hp6cmsOTk5wdzcHOvWretwCbo2xHnvNTY2dpp2Zo5PA67FlYODg0PCBAYGQktLC19//TXboUiUCRMmwMbGBhs2bICKigoKCwuhp6eHGTNmQCAQICEhge0QqSAuEdvWRsENieDg+P+bkSNHQkNDAzExMYz32suXL+Hl5YWnT5/i0qVLLEdIhz///BMtLS1QV1cXWn/69Cm6dOnCJe44Ogx5eXnMBNfMzEw8f/4cffv2hYODA8LDw9kOjwry8vK4deuWSOV/WVkZLC0t8erVK5Yio8uWLVugr68PNzc3AICLiwtOnDiBHj164Ndff4W1tTXLEXJ0BrgKOg4ODg4J09LSgq1btyI5ORlWVlYiFVXbt29nKTK6hIeHw97eHjdu3MCbN28QEhKC4uJiPH36tEP7FXXWaWgcHJ2BnTt3wtHREZ999hmsra3B4/GQn58PeXl5JCcnsx0eNWbMmIFJkyZhyZIlQuvx8fE4c+YMfv31V5Yi4+BoP9TU1PDixQtYW1vD3t4eCxYsgJ2dXYdPQBsaGiI+Pl7kIDkuLg5GRkYsRUWfqKgoxMbGAgAuXryIS5cuISkpCfHx8QgODsaFCxdYjpCjM8BV0HFwcHBIGAcHh48+x+PxkJqaKsFoJEtdXR327t2L3NxcCAQC2NjYwNfXFz169GA7NA4ODo7/ipcvXyI2NhZ37twBIQRmZmbw8PCAgoIC26FRQ11dHVlZWSIeXHfu3IGtrS0aGhpYioyDo/04d+5cp0jIfciJEyfg5uaGUaNGwdbWFjweD5cvX0ZKSgri4+MxdepUtkOkgoKCAkpLS6Grq4uAgAC8evUKUVFRKC0txeDBg/Hs2TO2Q+ToBHAJOg4ODg4ODglw+/Zt1NTU4M2bN0LrkydPZikiDg4Ojv8OJSUlXLt2DZaWlkLrRUVFGDx4cIf03+Pg6Ezk5uYiMjISJSUlzMFDUFAQ+vXrx3Zo1OjZsycSEhIwdOhQmJiYICwsDC4uLrh79y4GDhyI58+fsx0iRyeAa3Hl4ODg4JAYr169QmFhIR4/fsxMzWqjoyaqKisrMXXqVBQVFTHec0BrtSQAzoOOg+P/YzZt2oTu3bvD29tbaP2nn37CkydPsGLFCpYio8vAgQOxf/9+7N69W2h937596N+/P0tRcXBwtBf9+/dn2j07C87OznB3d4eRkREaGhowbtw4AEB+fr6IHx8HBy24BB0HBwcHh0RISkqCp6cn6uvrRZ7ryMMSAgICYGBggEuXLoHP5yM7OxsNDQ0ICgrCtm3b2A6Pg4PjfyAqKgpHjx4VWTc3N8eMGTM6bIJu48aNGDVqFAoKCjBy5EgAQEpKCnJycjifJg6O/8/x8PCAvb097O3tO7Tn3IdERkbCwMAANTU12Lp1K5SVlQEAtbW1In6bHBy04FpcOTg4ODgkgqGhIcaOHYs1a9age/fubIcjMTQ1NZGamgorKyt069YN2dnZMDExQWpqKoKCgpCXl8d2iBwcHP8l8vLyKCkpgYGBgdB6ZWUlzMzMOuy0Q6C1qiQ8PBz5+flQUFCAlZUVVq1a1alu6Dk4OiKLFi1CRkYGSktLoaOjg+HDh2P48OGwt7dHnz592A6PCm/fvsXChQsRGhoKPp/PdjgcnRgptgPg4ODg4OgcPH78GMuXL+9UyTmgtYW17RRWU1MTjx49AgDo6enh7t27bIbGwcHxP6Krqyt2CnVWVhZ69uzJQkSSo2/fvvj5559RXFyMGzdu4Kf/1979hVR9/3Ecf52jZ+usjp2TnLARxY7aqDzaP+oiOOdYEUFjYmU1AiNaIFSnTFALNlYxCFf2j9iMGEyDoaw/UIQG1Tm4rqzkSJGsZivoL9lFmEXm8XcxfvJz1Y/VPOcz/T4f4M3n48ULL/T4+n4/n/ePP1LOAcNATU2N2tvbde/ePVVXV2v06NHav3+/pk6dOmyHejkcDp04ccJ0DIAjrgCA5Fi2bJkikYgyMzNNR0mqnJwctbW1yefzac6cOaqqqtIHH3ygw4cP85QWGOK+/PJLbd68WT09PZo3b56kP496lp
eXq6yszHC65Hj+/Ll6enoGrFlt6iUwHLlcLnk8Hnk8HrndbqWmpiojI8N0rIQpLCzUyZMntWXLFtNRYGEccQUAJEV3d7eKiork9Xrl9/vlcDgG7IfDYUPJEqupqUnPnj3TkiVL1NHRoc8++0zt7e1KT09XfX19/z/1AIaevr4+VVZW6sCBA/0TmkeMGKGKigp9/fXXhtMlTnd3t8rLy9XQ0KDOzs7X9ofrnaKAFVRUVCgajSoWiyknJ0eBQEDBYFCBQEBut9t0vIT59ttvtXv3bs2fP18zZ87UyJEjB+wP18+p+HehoAMAJMWRI0dUUlIip9Op9PT0/imm0p9DIjo6OgymS64nT57I4/EM+BkAGLq6urp0/fp1OZ1OZWdn68MPPzQdKaHWr1+vCxcuaMeOHSouLtahQ4d09+5d1dTUaNeuXVq1apXpiADek91ul9frVWlpqQoKCjR58mTTkZLir3eJ/i+rfU6FORR0AICkyMjIUDgcVmVlpex2612BevPmTf3+++8KBAJyOp3q6+ujoAMwJE2YMEG1tbUKhUJKS0vTlStXlJWVpbq6Ov388886c+aM6YgA3lMsFlM0GlUkElFzc7NSUlL6h0SEQiHLFHaACRR0AICkGDNmjFpaWix3B11nZ6eWL1+uCxcuyGaz6caNG/L5fFq7dq3cbrf27NljOiIAvJNRo0bp2rVrmjhxosaPH6/jx49r9uzZunXrlvx+v7q6ukxHBDBIYrGY9u3bp6NHjyoej1viCPt/KxIepCLZrPcKAwDAiNWrV6u+vt50jKQrLS2Vw+HQnTt39NFHH/Wvr1ixQo2NjQaTAcD78fl8+uOPPyRJU6ZMUUNDgyTp1KlTw/qOKsAqWltbtXfvXhUUFCg/P191dXXKy8sb9gMUamtr5ff75XQ65XQ6lZubq7q6OtOxYCFMcQUAJEVvb6+qqqrU1NSk3Nzc14ZEVFdXG0qWWGfPnlVTU5PGjx8/YD07O1u3b982lAoA3t+aNWsUi8UUDAa1detWLV68WAcPHtSrV6+G7e9ywCo8Ho+6urqUl5enUCikdevWKRAIDPvpzNXV1frqq6+0YcMGzZ07V319fbp48aJKSkr0+PFjlZaWmo4IC+CIKwAgKfLz89+6Z7PZdP78+SSmSR6Xy6UrV64oOztbLpdLsVhMPp9PLS0tWrRo0RsnIALAUHLnzh1dunRJmZmZysvLMx0HwD9w+vRpSxRyf/XJJ59o+/btKi4uHrD+008/6ZtvvtGtW7cMJYOVUNABAJBAixcv1owZM7Rz5065XC61tbVp4sSJWrlypeLxuH755RfTEQHgb+vp6dHChQtVU1OjSZMmmY4DAINixIgRunr1qrKysgas37hxQ36/Xy9evDCUDFbCEVcAABLou+++UygU0qVLl/Ty5UuVl5fr2rVrevLkiS5evGg6HgC8E4fDoatXr3J5OoBhJSsrSw0NDdq2bduA9fr6emVnZxtKBavhDToAABLswYMH+v7773X58mXF43HNmDFD69ev17hx40xHA4B3VlZWJofDoV27dpmOAgCD4tixY1qxYoUWLFiguXPnymaz6ddff9W5c+fU0NCgwsJC0xFhARR0AAAAAP62jRs3qra2VllZWZo1a5ZGjhw5YJ9BEQCGosuXL2vv3r26fv26+vr6NGXKFJWVlWn69Ommo8EiKOgAAEiwFy9eqK2tTY8ePVI8Hh+w9/nnnxtKBQDvx6pDfwAASCQKOgAAEqixsVHFxcV6/Pjxa3s2m029vb0GUgHAu2lra1NOTo7sdrvpKAAw6FatWqVQKKRQKMSdczCGv7AAACTQhg0bVFRUpPv37ysejw/4opwDMFRMnz69/0GDz+dTZ2en4UQAMHhGjRqlPXv26NNPP9XHH3+sL774Qj/88IPa29tNR4OF8AYdAAAJlJaWptbWVmVmZpqOAgDvLT09XWfOnNGcOXNkt9v18OFDeb1e07EAYFA9ePBAkUhEkUhE0WhUv/32m8aOHav79++bjgYLSDUdAACA4WzZsmWKRCIUdACGtKVLlyoYDGrcuHGy2WyaNWuWUlJS3vi9HR0dSU4HAIPD5XLJ4/HI4/HI7XYrNTVVGRkZpmPBIniDDgCABOru7lZRUZG8Xq/8fr8cDseA/XA4bCgZALybxsZG3bx5U+FwWDt27JDL5Xrj923atCnJyQDgn6moqFA0GlUsFlNOTo4CgYCCwaACgYDcbrfpeLAICjoAABLoyJEjKikpkdPpVHp6umw2W/+ezWbjTRMAQ86aNWt04MCBtxZ0ADDU2O12eb1elZaWqqCgQJMnTzYdCRZEQQcAQAJlZGQoHA6rsrKS6YcAAAD/QrFYTNFoVJFIRM3NzUpJSVEwGOyf7Ephh2SgoAMAIIHGjBmjlpYW7qADAAAYImKxmPbt26ejR48qHo+rt7fXdCRYAEMiAABIoNWrV6u+vl7btm0zHQUAAABv0dra2j/Btbm5WU+fPtW0adOUn59vOhosgoIOAIAE6u3tVVVVlZqampSbm/vakIjq6mpDyQAAACBJHo9HXV1dysvLUygU0rp16xQIBJSWlmY6GiyEI64AACTQ/3vqarPZdP78+SSmAQAAwF+dPn2aQg7GUdABAAAAAAAABjFODgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMIiCDgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMIiCDgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMOg/y0S3Y0TUd1sAAAAASUVORK5CYII=", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "cancer = load_breast_cancer()\n", - "import pandas as pd\n", - "# Making a data frame\n", - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", - "\n", - "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", - "malignant = cancer.data[cancer.target == 0]\n", - "benign = cancer.data[cancer.target == 1]\n", - "ax = axes.ravel()\n", - "\n", - "for i in range(30):\n", - " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", - " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].set_title(cancer.feature_names[i])\n", - " ax[i].set_yticks(())\n", - "ax[0].set_xlabel(\"Feature magnitude\")\n", - "ax[0].set_ylabel(\"Frequency\")\n", - "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", - "fig.tight_layout()\n", - "plt.show()\n", - "\n", - "import seaborn as sns\n", - "correlation_matrix = cancerpd.corr().round(1)\n", - "# use the heatmap function from seaborn to plot the correlation matrix\n", - "# annot = True to print the values inside the square\n", - "plt.figure(figsize=(15,8))\n", - "sns.heatmap(data=correlation_matrix, annot=True)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "e9552a3c", - "metadata": {}, - "source": [ - "## Discussing the correlation data\n", - "\n", - "In the above example we note two things. In the first plot we display\n", - "the overlap of benign and malignant tumors as functions of the various\n", - "features in the Wisconsing breast cancer data set. We see that for\n", - "some of the features we can distinguish clearly the benign and\n", - "malignant cases while for other features we cannot. This can point to\n", - "us which features may be of greater interest when we wish to classify\n", - "a benign or not benign tumour.\n", - "\n", - "In the second figure we have computed the so-called correlation\n", - "matrix, which in our case with thirty features becomes a $30\\times 30$\n", - "matrix.\n", - "\n", - "We constructed this matrix using **pandas** via the statements" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "623ddee7", - "metadata": {}, - "outputs": [], - "source": [ - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" - ] - }, - { - "cell_type": "markdown", - "id": "7a61e306", - "metadata": {}, - "source": [ - "and then" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "859552c6", - "metadata": {}, - "outputs": [], - "source": [ - "correlation_matrix = cancerpd.corr().round(1)" - ] - }, - { - "cell_type": "markdown", - "id": "43d915d7", - "metadata": {}, - "source": [ - "Diagonalizing this matrix we can in turn say something about which\n", - "features are of relevance and which are not. This leads us to\n", - "the classical Principal Component Analysis (PCA) theorem with\n", - "applications. This will be discussed later this semester ([week 43](https://compphysics.github.io/MachineLearning/doc/pub/week43/html/week43-bs.html))." 
- ] - }, - { - "cell_type": "markdown", - "id": "5c8e892e", - "metadata": {}, - "source": [ - "## Other measures in classification studies: Cancer Data again" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "08b680f2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "[1. 0.86666667 1. 0.85714286 1. 0.85714286\n", - " 1. 0.92857143 0.92857143 1. ]\n", - "Test set accuracy with Logistic Regression: 0.94\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfYAAAHFCAYAAAAABdu/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABCWElEQVR4nO3deVxU9f7H8few4wIGCC4houa+pJCJZrmkZWZat7TsuqJlVmSuqaVpGtmiZoW5pFipVyu1zRauS1kuqVGWeuumKC7gnigmCJzfH17m1wjawMxIM+f19HEeD/nO95zzOTDwmc/3fM85FsMwDAEAAI/gVdYBAAAA5yGxAwDgQUjsAAB4EBI7AAAehMQOAIAHIbEDAOBBSOwAAHgQEjsAAB6ExA4AgAchsXuI5ORkWSwWBQQEaP/+/UVeb9eunRo3blwGkTlH//79VbNmTZu2mjVrqn///lc1jn379slisSg5Odmu/nv37tVjjz2munXrKjAwUOXKlVOjRo309NNP69ChQy6PtWvXrgoJCZHFYtGwYcOcvo+y+BlI0vr162WxWK74s+jQoYMsFkuR9429lixZopkzZ5ZonZK+PwBX8CnrAOBcOTk5evrpp/XOO++UdSgut3LlSgUFBZV1GJf1ySef6P7771dYWJgee+wxNW/eXBaLRT/99JMWLFigTz/9VKmpqS7b/5NPPqktW7ZowYIFqlKliqpWrer0fZT1z6BixYp66623iny4SEtL0/r16x2KbcmSJfr5559L9IGoatWq2rRpk2rXrl3q/QKOIrF7mNtvv11LlizRyJEj1axZM5ft548//lBgYKDLtm+P5s2bl+n+ryQtLU3333+/6tatq3Xr1ik4ONj6WocOHZSQkKCVK1e6NIaff/5ZLVu2VI8ePVy2j7L+GfTq1Uvz58/Xf//7X1133XXW9gULFqh69epq0qSJdu3a5fI48vPzlZeXJ39/f7Vq1crl+wOuhKF4DzN69GiFhoZqzJgxf9n3/PnzGjt2rKKjo+Xn56fq1avr0Ucf1e+//27Tr2bNmrrzzju1YsUKNW/eXAEBAZo0aZJ1OHTJkiUaM2aMqlatqgoVKqhbt246cuSIzpw5o4ceekhhYWEKCwvTgAEDdPbsWZttv/HGG7r55psVHh6u8uXLq0mTJnrxxRd14cKFv4z/0mHgdu3aWYdnL13+PDSamZmphx9+WNdee638/PwUHR2tSZMmKS8vz2b7hw8fVs+ePVWxYkUFBwerV69eyszM/Mu4JGn69OnKzs5WUlKSTVIvZLFYdM8999i0LViwQM2aNVNAQIBCQkJ09913a/fu3TZ9+vfvrwoVKui3337THXfcoQoVKigyMlIjRoxQTk6OpP8fpv7tt9/02WefWb8H+/bts56y2bdvn812C9dZv369tS01NVV33nmnwsPD5e/vr2rVqqlr1646ePCgtU9xQ/Hp6en65z//aV2vQYMGeuWVV1RQUGDtUzhk/fLLL2v69OmKjo5WhQoVFBcXp82bN9v1PZakTp06KTIyUgsWLLC2FRQUaNGiRerXr5+8vIr+ibPnPdeuXTt9+umn2r9/v8376M+xv/jii5oyZYqio6Pl7++vdevWFRmKP3/+vJo3b646dero9OnT1u1nZmaqSpUqateunfLz8+0+XsAeVOwepmLFinr66af1xBNPaO3aterQoUOx/QzDUI8ePbRmzRqNHTtWbdu21Y4dOzRx4kRt2rRJmzZtkr+/v7X/999/r927d+vpp59WdHS0ypcvr+zsbEnSuHHj1L59eyUnJ2vfvn0aOXKkHnjgAfn4+KhZs2ZaunSpUlNTNW7cOFWsWFGzZs2ybnfPnj3q3bu39cPFjz/+qKlTp+o///mPzR9reyQlJSkrK8um7ZlnntG6detUr149SRf/oLZs2VJeXl6aMGGCateurU2bNmnKlCnat2+fFi5cKOniiMStt96qw4cPKzExUXXr1tWnn36qXr162RXLl19+qYiICLurt8TERI0bN04PPPCAEhMTdeLECT377LOKi4vT1q1bbarRCxcu6K677lJ8fLxGjBihr7/+Ws8995yCg4M1YcIEtWjRQps2bdLdd9+t2rVr6+WXX5akEg3FZ2dnq1OnToqOjtYbb7yhiIgIZWZmat26dTpz5sxl1zt27Jhat26t3NxcPffcc6pZs6Y++eQTjRw5Unv27FFSUpJN/zfeeEP169e3nst+5plndMcddygtLa3YD0SX8vLyUv/+/fXWW29pypQp8vb21pdffqmDBw9qwIABeuKJJ4qsY897LikpSQ899JD27Nlz2ZGVWbNmqW7dunr55ZcVFBRk8zMqFBAQoOXLlysmJkYDBw7UBx98oIKCAj344IMyDENLly6Vt7f3Xx4nUCIGPMLChQsNScbWrVuNnJwco1atWkZsbKxRUFBgGIZh3HLLLUajRo2s/T///HNDkvHiiy/abGfZsmWGJGPu3LnWtqioKMPb29v45ZdfbPquW7fOkGR069bNpn3YsGGGJCMhIcGmvUePHkZISMhljyE/P9+4cOGC8fbbbxve3t7GyZMnra/169fPiIqKsukfFRVl9OvX77Lbe+mll4ocy8MPP2xUqFDB2L9/v03fl19+2ZBk7Ny50zAMw5g9e7Yhyfjwww9t+g0ePNiQZCxcuPCy+zUMwwgICDBatWp1xT6FTp06ZQQGBhp33HGHTXt6errh7+9v9O7d29rWr18/Q5KxfPlym7533HGHUa9ePZu2qKgoo2vXrjZthe+TtLQ0m/bCn+W6desMwzCMbdu2GZKMVatWXTH2S38GTz31lCHJ2LJli02/Rx55xLBYLNb3UFpamiHJaNKkiZGXl2ft99133xmSjKVLl15xv4Xxvvfee8bevXsNi8VifPLJJ4ZhGMZ9991ntGvXzjAMw+jatWuR982fXek9d7l1C2OvXbu2kZubW+xrl74/Cn+vZs6caUyYMMHw8vIyvvzyyyseI1BaDMV7ID8/P02ZMkXbtm3T8uXLi+2zdu1aSSoyjHrfffepfPnyWrNmjU1706ZNVbdu3WK3deedd9p83aBBA0lS165di7SfPHnSZjg+NTVVd911l0JDQ+Xt7S1fX
1/17dtX+fn5+vXXX//6YC9j6dKlGj16tJ5++mkNHjzY2v7JJ5+offv2qlatmvLy8qxLly5dJElfffWVJGndunWqWLGi7rrrLpvt9u7du9QxXc6mTZv0xx9/FPlZREZGqkOHDkV+FhaLRd26dbNpa9q0abFXQ5RWnTp1dM0112jMmDF688037T5PvXbtWjVs2FAtW7a0ae/fv78Mw7C+7wp17drVpmJt2rSpJJXoWKKjo9WuXTstWLBAJ06c0IcffqiBAwdetr+z3nN33XWXfH197erbs2dPPfLIIxo1apSmTJmicePGqVOnTnbvCygJEruHuv/++9WiRQuNHz++2PPVJ06ckI+PjypXrmzTbrFYVKVKFZ04ccKm/UrDuCEhITZf+/n5XbH9/Pnzki6ei23btq0OHTqkV199VRs2bNDWrVv1xhtvSLo4HF4a69atU//+/dW3b18999xzNq8dOXJEH3/8sXx9fW2WRo0aSZKOHz8u6eL3JyIiosi2q1SpYlcMNWrUUFpaml19C7/XxX2Pq1WrVuRnUa5cOQUEBNi0+fv7W7+vzhAcHKyvvvpK119/vcaNG6dGjRqpWrVqmjhx4hXnP5w4ceKyx1H4+p+FhobafF14+qekP/v4+Hh9/PHHmj59ugIDA3XvvfcW28+Z77mSXmUwcOBAXbhwQT4+PkpISCjRukBJcI7dQ1ksFk2bNk2dOnXS3Llzi7weGhqqvLw8HTt2zCa5G4ahzMxM3XDDDUW252yrVq1Sdna2VqxYoaioKGv7Dz/8UOpt7tixQz169NAtt9yiefPmFXk9LCxMTZs21dSpU4tdvzABhYaG6rvvvivyur2T52677Ta99tpr2rx581+eZy9MbhkZGUVeO3z4sMLCwuzapz0KPxAUTrQrVPiB5s+aNGmif/3rXzIMQzt27FBycrImT56swMBAPfXUU8VuPzQ09LLHIcmpx/Jn99xzjx599FG98MILGjx48GWv2HDme64kvxPZ2dnq06eP6tatqyNHjmjQoEH68MMPS7xPwB5U7B7s1ltvVadOnTR58uQis9E7duwoSXr33Xdt2j/44ANlZ2dbX3elwj+Mf56kZxhGsQnZHunp6erSpYtq1aqlDz74oNhh0jvvvFM///yzateurdjY2CJLYWJv3769zpw5o48++shm/SVLltgVy5NPPqny5ctr6NChNrOhCxmGYZ2UFRcXp8DAwCI/i4MHD2rt2rVO/VkU3qxlx44dNu2XHuefWSwWNWvWTDNmzFClSpX0/fffX7Zvx44dtWvXriJ93n77bVksFrVv3770wV9BYGCgJkyYoG7duumRRx65bL+SvOf8/f1LPWp0qSFDhig9PV0rVqzQW2+9pY8++kgzZsxwyraBS1Gxe7hp06YpJiZGR48etQ43SxcvE7rttts0ZswYZWVlqU2bNtZZ8c2bN1efPn1cHlunTp3k5+enBx54QKNHj9b58+c1e/ZsnTp1qlTb69Kli37//Xe9/vrr2rlzp81rtWvXVuXKlTV58mSlpKSodevWSkhIUL169XT+/Hnt27dPq1ev1ptvvqlrr71Wffv21YwZM9S3b19NnTpV1113nVavXq0vvvjCrliio6P1r3/9S7169dL1119vvUGNJO3atUsLFiyQYRi6++67ValSJT3zzDMaN26c+vbtqwceeEAnTpzQpEmTFBAQoIkTJ5bq+1GcG264QfXq1dPIkSOVl5ena665RitXrtQ333xj0++TTz5RUlKSevTooVq1askwDK1YsUK///77Fc8NP/nkk3r77bfVtWtXTZ48WVFRUfr000+VlJSkRx555LLzNJxh+PDhGj58+BX7lOQ916RJE61YsUKzZ89WTEyMvLy8FBsbW+K45s+fr3fffVcLFy5Uo0aN1KhRIz322GMaM2aM2rRpU2Q+AuCwspu3B2f686z4S/Xu3duQZDMr3jAM448//jDGjBljREVFGb6+vkbVqlWNRx55xDh16pRNv+JmVxuG7cxke2KZOHGiIck4duyYte3jjz82mjVrZgQEBBjVq1c3Ro0aZXz22Wc2M7QNw75Z8ZIuu/x5lvKxY8eMhIQEIzo62vD19TVCQkKMmJgYY/z48cbZs2et/Q4ePGj84x//MCpUqGBUrFjR+Mc//mFs3LjRrlnxhfbs2WMMHTrUqFOnjuHv728EBgYaDRs2NIYPH15kZvr8+fONpk2bGn5+fkZwcLDRvXt36yz9P38fypcvX2Q/hd/bS78/xf3cfv31V6Nz585GUFCQUblyZePxxx83Pv30U5vv+X/+8x/jgQceMGrXrm0EBgYawcHBRsuWLY3k5OQi+7j0yoT9+/cbvXv3NkJDQw1fX1+jXr16xksvvWTk5+db+xTOHn/ppZeKxCfJmDhxYpH2P7vce+9Sxc1st/c9d/LkSePee+81KlWqZFgsFuv390qxXzorfseOHUZgYGCR79H58+eNmJgYo2bNmkV+3wBHWQzDMK7i5wgAAOBCnGMHAMCDkNgBAPAgJHYAADwIiR0AAA9CYgcAwIOQ2AEA8CBufYOagoICHT58WBUrVnTJLU8BAK5lGIbOnDmjatWqycvLdbXm+fPnlZub6/B2/Pz8ijyr4e/GrRP74cOHFRkZWdZhAAAcdODAAV177bUu2fb58+cVWDFUyjvn8LaqVKmitLS0v3Vyd+vEXrFiRUmSX8N+snj7lXE0gGukr3+5rEMAXOZMVpbqREda/567Qm5urpR3Tv4N+0mO5Ir8XGXuWqTc3FwSu6sUDr9bvP1I7PBYQUFBZR0C4HJX5XSqT4BDucKwuMe0NLdO7AAA2M0iyZEPEG4ylYvEDgAwB4vXxcWR9d2Ae0QJAADsQsUOADAHi8XBoXj3GIsnsQMAzIGheAAA4G6o2AEA5sBQPAAAnsTBoXg3GeR2jygBAIBdqNgBAObAUDwAAB6EWfEAAMDdULEDAMyBoXgAADyISYbiSewAAHMwScXuHh8/AACAXajYAQDmwFA8AAAexGJxMLEzFA8AAK4yKnYAgDl4WS4ujqzvBkjsAABzMMk5dveIEgAA2IWKHQBgDia5jp3EDgAwB4biAQCAu6FiBwCYA0PxAAB4EJMMxZPYAQDmYJKK3T0+fgAAALtQsQMAzIGheAAAPAhD8QAAwN1QsQMATMLBoXg3qYVJ7AAAc2AoHgAAuBsqdgCAOVgsDs6Kd4+KncQOADAHk1zu5h5RAgAAu1CxAwDMwSST50jsAABzMMlQPIkdAGAOJqnY3ePjBwAAsAsVOwDAHBiKBwDAgzAUDwAA3A0VOwDAFCwWiywmqNhJ7AAAUzBLYmcoHgAAD0LFDgAwB8v/FkfWdwMkdgCAKTAUDwAA3A4VOwDAFMxSsZPYAQCmQGIHAMCDmCWxc44dAAAPQsUOADAHLncDAMBzMBQPAAAclpSUpOjoaAUEBCgmJkYbNmy4Yv/FixerWbNmKleunKpWraoBAwboxIkTdu+PxA4AMIWLT221OLCUfJ/Lli3TsGHDNH78eKWmpqpt27bq0qWL0tPTi+3/zTffqG/fvoqP
j9fOnTv13nvvaevWrRo0aJDd+ySxAwBMwSJHkrpFllKcZJ8+fbri4+M1aNAgNWjQQDNnzlRkZKRmz55dbP/NmzerZs2aSkhIUHR0tG666SY9/PDD2rZtm937JLEDAFACWVlZNktOTk6x/XJzc7V9+3Z17tzZpr1z587auHFjseu0bt1aBw8e1OrVq2UYho4cOaL3339fXbt2tTs+EjsAwBQcG4b//4l3kZGRCg4Oti6JiYnF7u/48ePKz89XRESETXtERIQyMzOLXad169ZavHixevXqJT8/P1WpUkWVKlXSa6+9ZvdxktgBAOZgccIi6cCBAzp9+rR1GTt27JV3e8nJecMwLjs7f9euXUpISNCECRO0fft2ff7550pLS9OQIUPsPkwudwMAoASCgoIUFBT0l/3CwsLk7e1dpDo/evRokSq+UGJiotq0aaNRo0ZJkpo2bary5curbdu2mjJliqpWrfqX+6ViBwCYg6PD8CWcFu/n56eYmBilpKTYtKekpKh169bFrnPu3Dl5edmmZm9vb0kXK317ULEDAEzB0RvUlGbd4cOHq0+fPoqNjVVcXJzmzp2r9PR069D62LFjdejQIb399tuSpG7dumnw4MGaPXu2brvtNmVkZGjYsGFq2bKlqlWrZtc+SewAAFMoi8Teq1cvnThxQpMnT1ZGRoYaN26s1atXKyoqSpKUkZFhc017//79debMGb3++usaMWKEKlWqpA4dOmjatGn2x2nYW9v/DWVlZSk4OFj+TQbL4u1X1uEALnFq6+tlHQLgMllZWYoIDdbp06ftOm9d2n0EBwcr9MGF8vIrV+rtFOSe04nFA1waqzNQsQMAzIGHwAAA4DnKYii+LDArHgAAD0LFDgAwBbNU7CR2AIApmCWxMxQPAIAHoWIHAJiCWSp2EjsAwBxMcrkbQ/EAAHgQKnYAgCkwFA8AgAchsQMA4EHMktg5xw4AgAehYgcAmINJZsWT2AEApsBQPAAAcDskdpN76L622v3Jszq1eYa+XTxabZrXvmL/h3verNQPntbJTdP148pn1PvOlkX6BFcI1Iynemrvl1N1avMMpX7wtG67qaGrDgG4ojmzk1T/umhVqhCg1i1j9M03G67Yf8PXX6l1yxhVqhCgBnVrad6cNy/bd/myfynQ16L7/tHDyVHDFQordkcWd1DmiT0pKUnR0dEKCAhQTEyMNmy48i8dnOfezi300qh/aNpbX6jVAy9oY+oerXp9qCKrXFNs/8H33aTJj3fT1Dmr1eLeqZry5mrNfKqn7ri5sbWPr4+3Pn3zMUVVC9GDo95Ss7sn69Hnlujw0dNX67AAq/eWL9OoEcM05qnx2rw1Va1vaqsed3ZRenp6sf33paWpR7c71Pqmttq8NVWjx4zTiCcTtHLFB0X67t+/X2PHjFSbm9q6+jDgJBY5mNjd5CR7mSb2ZcuWadiwYRo/frxSU1PVtm1bdely+V86OFfCPzsoedUmJa/cpF/SjmjUyx/oYOYpDb6v+D9Uvbu21FsffKv3v/xe+w6d0HtfbNeiVZs0on8na59+PeJ0TVA59Rw+V5t+3Kv0jFPa+MNe/fTroat1WIDVrJnT1X9AvAbED1L9Bg308vSZujYyUvPmzC62/7y5byqyRg29PH2m6jdooAHxg9Sv/0DNnP6yTb/8/HwN6PugnpkwSdHRta7GoQB2K9PEPn36dMXHx2vQoEFq0KCBZs6cqcjISM2eXfwvHZzH18dbzRtEas2m3TbtazbvVqtm0cWu4+fro/O5F2za/si5oNjGUfLxufhW6npLE23ZkaaZT/XSvn8/r23vjdOogZ3l5eUen3ThOXJzc5X6/XZ17NTZpr3jrZ21edPGYtfZsnmTOt5q2//Wzrfp++3bdOHC/7/3n58yWWGVK6v/wHjnBw6XYSjexXJzc7V9+3Z17mz7S9S5c2dt3Fj8Lx2cJ+yaCvLx8dbRk2ds2o+cOKOI0KBi1/n3pt3q36O1mjeIlCS1aFhDfbu3kp+vj8IqVZAkRVcP1d23Npe3t0V3Pz5b0+Z/oSf6dNSYQbe59oCASxw/flz5+fkKD4+waY+IiNCRI5nFrnPkSKYiImz7h4dHKC8vT8ePH5ckbfz2WyUvfEtJb85zTeBwHYsTFjdQZpe7Ff7SXfpLFBERoczM4n/pcnJylJOTY/06KyvLpTGagWHYfm2xWGRc2vg/ifM+V0RokL5aNFIWi3T05Bm9+9EWjRjQSfn5BZIkLy8vHTt5Ro8+t1QFBYZSdx9Q1crBGta3oxLnfu7qwwGKuLTKMgzjipVXcf0L28+cOaOB/f+ppDfnKSwszPnBAk5Q5texl+SXLjExUZMmTboaYXm846fOKi8vXxGhFW3aw0MqFKniC53PuaAhkxbrsalLFRESpIzjpxX/jzbKOvuHjv+eLUnKPH5aF/LyVVDw/x8O/pOWqaqVg+Xr460LefmuOyjgT8LCwuTt7V2kOj969GiRKr5QRESVIoXFsWNH5ePjo9DQUO3auVP79+3TP3p0s75eUHDxQ22FAB/t2PmLatW+8pUlKDtcx+5ihb90l/4SHT16tEgVX2js2LE6ffq0dTlw4MDVCNUjXcjLV+ruA+rQqr5Ne4dW9bX5x7QrrpuXV6BDR39XQYGh+26L0Wcbdlqrmk0/7FXtyMo2vwDX1QhXxrHTJHVcVX5+fmreIkZr/51i0752TYpaxbUudp0bW8Vp7Rrb/mtSvlSLmFj5+vqqXv362pb6k7Zs+8G6dO12l25p115btv2gayMjXXY8cJxZzrGXWcXu5+enmJgYpaSk6O6777a2p6SkqHv37sWu4+/vL39//6sVoseb9e5avTWlr77fla4tO9IUf08bRVYJ0fz3L15yOPnxu1QtPFiDnnlHklSnRrhiG0dp68/7dE3Fckro00ENa1ezvi5J897boEfuv0WvjL5XSUu/Up0alTUqvrOSln5VJscIc0sYNlzx/fuoRUysbmwVp7fmz9WB9HQNemiIJOmZ8WN1+NAhvZX8tiRp8END9GbS6xo9crgGxg/Wls2blLzwLS16d6kkKSAgQI0aN7bZR6XgSpJUpB1/PxbLxcWR9d1BmQ7FDx8+XH369FFsbKzi4uI0d+5cpaena8iQIWUZlmm8/+X3Cgkur3EPdVGVsCDt/C1DPR5PUnrGKUlSlbAgRVYJsfb39rboiT4dVDcqQhfy8vX1tl/Vvv8rSs84ae1z8Mjv6jb0Db044h5tXT5Wh4/+rjeWrNcrySlF9g+42n09e+nkiRN6fupkZWZkqFGjxlr18WpFRUVJkjIzMnTgwP9fXlszOlqrPl6t0SOe1JzZb6hqtWp6ZcYs3X3PP8rqEIASsxiXmyl1lSQlJenFF19URkaGGjdurBkzZujmm2+2a92srCwFBwfLv8lgWbz9XBwpUDZObX29rEMAXCYrK0sRocE6ffq0goKKvyLHGfsIDg5Wrcffl5d/+VJvpyAnW3tfu9elsTpDmU+eGzp0qIYOHVrWYQAAPJ2DQ/Hucrlbmd9SFgAAOE+ZV+wAAFwNZrncjcQOADA
Fs8yKZygeAAAPQsUOADAFLy+LQw+kMtzkYVYkdgCAKTAUDwAA3A4VOwDAFJgVDwCABzHLUDyJHQBgCmap2DnHDgCAB6FiBwCYglkqdhI7AMAUzHKOnaF4AAA8CBU7AMAULHJwKN5NnttKYgcAmAJD8QAAwO1QsQMATIFZ8QAAeBCG4gEAgNuhYgcAmAJD8QAAeBCzDMWT2AEApmCWip1z7AAAeBAqdgCAOTg4FO8mN54jsQMAzIGheAAA4Hao2AEApsCseAAAPAhD8QAAwO1QsQMATIGheAAAPAhD8QAAwO1QsQMATMEsFTuJHQBgCpxjBwDAg5ilYuccOwAAHoSKHQBgCgzFAwDgQRiKBwAAboeKHQBgChY5OBTvtEhci4odAGAKXhaLw0tpJCUlKTo6WgEBAYqJidGGDRuu2D8nJ0fjx49XVFSU/P39Vbt2bS1YsMDu/VGxAwDgIsuWLdOwYcOUlJSkNm3aaM6cOerSpYt27dqlGjVqFLtOz549deTIEb311luqU6eOjh49qry8PLv3SWIHAJhCWcyKnz59uuLj4zVo0CBJ0syZM/XFF19o9uzZSkxMLNL/888/11dffaW9e/cqJCREklSzZs0S7ZOheACAKRTOindkKYnc3Fxt375dnTt3tmnv3LmzNm7cWOw6H330kWJjY/Xiiy+qevXqqlu3rkaOHKk//vjD7v1SsQMATMHLcnFxZH1JysrKsmn39/eXv79/kf7Hjx9Xfn6+IiIibNojIiKUmZlZ7D727t2rb775RgEBAVq5cqWOHz+uoUOH6uTJk3afZ6diBwCgBCIjIxUcHGxdihtS/7NLK33DMC5b/RcUFMhisWjx4sVq2bKl7rjjDk2fPl3Jycl2V+1U7AAAc7A4eJOZ/6164MABBQUFWZuLq9YlKSwsTN7e3kWq86NHjxap4gtVrVpV1atXV3BwsLWtQYMGMgxDBw8e1HXXXfeXYVKxAwBMoXDynCOLJAUFBdksl0vsfn5+iomJUUpKik17SkqKWrduXew6bdq00eHDh3X27Flr26+//iovLy9de+21dh0niR0AABcZPny45s+frwULFmj37t168sknlZ6eriFDhkiSxo4dq759+1r79+7dW6GhoRowYIB27dqlr7/+WqNGjdLAgQMVGBho1z4ZigcAmILlf/8cWb+kevXqpRMnTmjy5MnKyMhQ48aNtXr1akVFRUmSMjIylJ6ebu1foUIFpaSk6PHHH1dsbKxCQ0PVs2dPTZkyxe59ktgBAKbgrFnxJTV06FANHTq02NeSk5OLtNWvX7/I8H1JMBQPAIAHoWIHAJiCWR7baldinzVrlt0bTEhIKHUwAAC4SlncUrYs2JXYZ8yYYdfGLBYLiR0AgDJkV2JPS0tzdRwAALiUI49eLVzfHZR68lxubq5++eWXEj1KDgCAsuKsG9T83ZU4sZ87d07x8fEqV66cGjVqZL3+LiEhQS+88ILTAwQAwBmu9tPdykqJE/vYsWP1448/av369QoICLC233rrrVq2bJlTgwMAACVT4svdVq1apWXLlqlVq1Y2n14aNmyoPXv2ODU4AACchVnxl3Hs2DGFh4cXac/OznabYQoAgPkwee4ybrjhBn366afWrwuT+bx58xQXF+e8yAAAQImVuGJPTEzU7bffrl27dikvL0+vvvqqdu7cqU2bNumrr75yRYwAADjMIjnwCBjH1r2aSlyxt27dWt9++63OnTun2rVr68svv1RERIQ2bdqkmJgYV8QIAIDDzDIrvlT3im/SpIkWLVrk7FgAAICDSpXY8/PztXLlSu3evVsWi0UNGjRQ9+7d5ePDM2UAAH9PZfXY1qutxJn4559/Vvfu3ZWZmal69epJkn799VdVrlxZH330kZo0aeL0IAEAcJRZnu5W4nPsgwYNUqNGjXTw4EF9//33+v7773XgwAE1bdpUDz30kCtiBAAAdipxxf7jjz9q27Ztuuaaa6xt11xzjaZOnaobbrjBqcEBAOBMblJ0O6TEFXu9evV05MiRIu1Hjx5VnTp1nBIUAADOxqz4P8nKyrL+//nnn1dCQoKeffZZtWrVSpK0efNmTZ48WdOmTXNNlAAAOIjJc39SqVIlm08qhmGoZ8+e1jbDMCRJ3bp1U35+vgvCBAAA9rArsa9bt87VcQAA4FJmmRVvV2K/5ZZbXB0HAAAuZZZbypb6jjLnzp1Tenq6cnNzbdqbNm3qcFAAAKB0SvXY1gEDBuizzz4r9nXOsQMA/o54bOtlDBs2TKdOndLmzZsVGBiozz//XIsWLdJ1112njz76yBUxAgDgMIvF8cUdlLhiX7t2rT788EPdcMMN8vLyUlRUlDp16qSgoCAlJiaqa9eurogTAADYocQVe3Z2tsLDwyVJISEhOnbsmKSLT3z7/vvvnRsdAABOYpYb1JTqznO//PKLJOn666/XnDlzdOjQIb355puqWrWq0wMEAMAZGIq/jGHDhikjI0OSNHHiRN12221avHix/Pz8lJyc7Oz4AABACZQ4sT/44IPW/zdv3lz79u3Tf/7zH9WoUUNhYWFODQ4AAGcxy6z4Ul/HXqhcuXJq0aKFM2IBAMBlHB1Od5O8bl9iHz58uN0bnD59eqmDAQDAVbil7J+kpqbatTF3OWgAADyVRzwE5r8p0xQUFFTWYQAucU3bp8o6BMBljLycq7YvL5XiUrBL1ncHDp9jBwDAHZhlKN5dPoAAAAA7ULEDAEzBYpG8mBUPAIBn8HIwsTuy7tXEUDwAAB6kVIn9nXfeUZs2bVStWjXt379fkjRz5kx9+OGHTg0OAABn4SEwlzF79mwNHz5cd9xxh37//Xfl5+dLkipVqqSZM2c6Oz4AAJyicCjekcUdlDixv/baa5o3b57Gjx8vb29va3tsbKx++uknpwYHAABKpsST59LS0tS8efMi7f7+/srOznZKUAAAOJtZ7hVf4oo9OjpaP/zwQ5H2zz77TA0bNnRGTAAAOF3h090cWdxBiSv2UaNG6dFHH9X58+dlGIa+++47LV26VImJiZo/f74rYgQAwGHcUvYyBgwYoLy8PI0ePVrnzp1T7969Vb16db366qu6//77XREjAACwU6luUDN48GANHjxYx48fV0FBgcLDw50dFwAATmWWc+wO3XkuLCzMWXEAAOBSXnLsPLmX3COzlzixR0dHX/Ei/b179zoUEAAAKL0SJ/Zhw4bZfH3hwgWlpqbq888/16hRo5wVFwAATsVQ/GU88cQTxba/8cYb2rZtm8MBAQDgCjwEpoS6dOmiDz74wFmbAwAApeC0x7a+//77CgkJcdbmAABwqovPYy992e2xQ/HNmze3mTxnGIYyMzN17NgxJSUlOTU4AACchXPsl9GjRw+br728vFS5cmW1a9dO9evXd1ZcAACgFEqU2PPy8lSzZk3ddtttqlKliqtiAgDA6Zg8VwwfHx898sgjysnJcVU8AAC4hMUJ/9xBiWfF33jjjUpNTXVFLAAAuExhxe7I4g5KfI596NChGjFihA4ePKiYmBiVL1/e5vWmTZs6LTgAAFAydi
[... base64-encoded PNG data elided: two matplotlib figure outputs ("image/png" display_data cells) in this notebook diff ...]
KMGTMG//3vf/kuQg/qquO3Ul0R0tTUhKurKxITE6XKExMT4eHhIXcZd3d3mfo//PADhg8fDrFY3G2xKrvOtAXw7ErQ7NmzERsby/fcu1BH28PAwADXrl1DZmam8Jo/fz4GDRqEzMxMjBo1qqdCVzqd+W6MHj0ad+/eRWVlpVB269YtqKmpwcrKqlvjVWadaYvq6mqoqUkfOtXV1QH872oE6xlddvzuUNfqP4Gmn0Lu3r2bsrKyaMmSJSSRSOjXX38lIqLly5dTUFCQUL/p53dhYWGUlZVFu3fv5p/Pd5GOtkVsbCxpaGjQ1q1bqaioSHiVl5craheUSkfbozn+1VjX6WhbPH78mKysrGjq1Kl048YNSk5OpoEDB9Jbb72lqF1QGh1ti6ioKNLQ0KBt27ZRXl4enTt3joYPH04jR45U1C4ojcePH1NGRgZlZGQQANq4cSNlZGQIjzLoruO30iVCRERbt24lGxsb0tTUJBcXF0pOThbmzZo1izw9PaXqJyUl0UsvvUSamppka2tLkZGRPRyx8upIW3h6ehIAmdesWbN6PnAl1dHvxvM4EepaHW2L7OxsGj9+POno6JCVlRWFh4dTdXV1D0etnDraFlu2bKEhQ4aQjo4OmZubU2BgIN25c6eHo1Y+Z86cafUY0F3HbxERX8tjjDHGmGpSqj5CjDHGGGMdwYkQY4wxxlQWJ0KMMcYYU1mcCDHGGGNMZXEixBhjjDGVxYkQY4wxxlQWJ0KMMcYYU1mcCDHGpERHR8PIyEjRYXSara0tNm3a1Gqd1atXY9iwYT0SD2Osd+NEiDElNHv2bIhEIplXbm6uokNDdHS0VEzm5uaYNm0a8vPzu2T9ly5dwty5c4VpkUiEo0ePStV57733cOrUqS7ZXkua72e/fv3g4+ODGzdudHg9f+bElLHejhMhxpTUhAkTUFRUJPUaMGCAosMC8GxQ16KiIty9exexsbHIzMyEr68vGhoa/vC6TU1Noaur22odPT29Do1O3VnP7+f333+PqqoqTJ48GXV1dd2+bcZY+3AixJiS0tLSgpmZmdRLXV0dGzduxNChQyGRSGBtbY2QkBCpUc2bu3LlCsaOHQt9fX0YGBjA1dUVly9fFuafP38ef//736GjowNra2ssXrwYVVVVrcYmEolgZmYGc3NzjB07FhEREbh+/bpwxSoyMhL29vbQ1NTEoEGDsG/fPqnlV69ejf79+0NLSwsWFhZYvHixMO/5W2O2trYAAD8/P4hEImH6+VtjJ0+ehLa2NsrLy6W2sXjxYnh6enbZfg4fPhxhYWEoKCjAzZs3hTqttUdSUhKCg4NRUVEhXFlavXo1AKCurg5Lly6FpaUlJBIJRo0ahaSkpFbjYYzJ4kSIMRWjpqaGLVu24Pr16/jyyy9x+vRpLF26tMX6gYGBsLKywqVLl5CWlobly5dDLBYDAK5duwZvb2/885//xNWrV3Hw4EGcO3cOCxcu7FBMOjo6AICnT58iPj4eoaGhePfdd3H9+nXMmzcPwcHBOHPmDADg8OHD+Oyzz7Bjxw7k5OTg6NGjGDp0qNz1Xrp0CQAQFRWFoqIiYfp548ePh5GREeLi4oSyhoYGHDp0CIGBgV22n+Xl5YiNjQUA4f0DWm8PDw8PbNq0SbiyVFRUhPfeew8AEBwcjJSUFBw4cABXr16Fv78/JkyYgJycnHbHxBgDlHL0ecZU3axZs0hdXZ0kEonwmjp1qty6hw4dIhMTE2E6KiqKDA0NhWl9fX2Kjo6Wu2xQUBDNnTtXquzs2bOkpqZGT548kbtM8/X/9ttv5ObmRlZWVlRbW0seHh709ttvSy3j7+9PkyZNIiKiDRs2kIODA9XV1cldv42NDX322WfCNACKj4+XqhMREUHOzs7C9OLFi+nll18Wpk+ePEmamppUVlb2h/YTAEkkEtLV1RVG0vb19ZVbv0lb7UFElJubSyKRiH7//Xep8nHjxtEHH3zQ6voZY9I0FJuGMca6y9ixYxEZGSlMSyQSAMCZM2fwySefICsrC48ePUJ9fT1qampQVVUl1HleeHg43nrrLezbtw/jx4+Hv78/7O3tAQBpaWnIzc1FTEyMUJ+I0NjYiPz8fAwePFhubBUVFdDT0wMRobq6Gi4uLjhy5Ag0NTWRnZ0t1dkZAEaPHo3NmzcDAPz9/bFp0ybY2dlhwoQJmDRpEnx8fKCh0fl/Z4GBgXB3d8fdu3dhYWGBmJgYTJo0CX/5y1/+0H7q6+sjPT0d9fX1SE5Oxvr167F9+3apOh1tDwBIT08HEcHBwUGqvLa2tkf6PjGmTDgRYkxJSSQSvPDCC1JlBQUFmDRpEubPn4+PP/4YxsbGOHfuHObMmYOnT5/KXc/q1asREBCA77//HidOnEBERAQOHDgAPz8/NDY2Yt68eVJ9dJr079+/xdiaEgQ1NTX069dP5oAvEomkpolIKLO2tsbNmzeRmJiIH3/8ESEhIVi/fj2Sk5Olbjl1xMiRI2Fvb48DBw7gnXfeQXx8PKKiooT5nd1PNTU1oQ0cHR1RXFyM6dOn46effgLQufZoikddXR1paWlQV1eXmqenp9ehfWdM1XEixJgKuXz5Murr67FhwwaoqT3rInjo0KE2l3NwcICDgwPCwsIwY8YMREVFwc/PDy4uLrhx44ZMwtWW5xOE5gYPHoxz585h5syZQtn58+elrrro6OjA19cXvr6+WLBgARwdHXHt2jW4uLjIrE8sFrfr12gBAQGIiYmBlZUV1NTUMHnyZGFeZ/ezubCwMGzcuBHx8fHw8/NrV3toamrKxP/SSy+hoaEBJSUlGDNmzB+KiTFVx52lGVMh9vb2qK+vx+eff47bt29j3759MrdqnvfkyRMsXLgQSUlJKCgoQEpKCi5duiQkJcuWLUNqaioWLFiAzMxM5OTk4NixY1i0aFGnY3z//fcRHR2N7du3IycnBxs3bsSRI0eETsLR0dHYvXs3rl+/LuyDjo4ObGxs5K7P1tYWp06dQnFxMR4+fNjidgMDA5Geno41a9Zg6tSp0NbWFuZ11X4aGBjgrbfeQkREBIioXe1ha2uLyspKnDp1CqWlpaiuroaDgwMCAwMxc+ZMHDlyBPn5+bh06RLWrVuH48ePdygmxlSeIjsoMca6x6xZs2jKlCly523cuJHMzc1JR0eHvL29ae/evQSAHj58SETSnXNra2vpjTfeIGtra9LU1CQLCwtauHChVAfhixcv0iuvvEJ6enokkUjIycmJ1qxZ02Js8jr/Nrdt2zays7MjsVhMDg4OtHfvXmFefHw8jRo1igwMDEgikZCbmxv9+OOPwvzmnaWPHTtGL7zwAmloaJCNjQ0RyXaWbjJixAgCQKdPn5aZ11X7WVBQQBoaGnTw4EEiars9iIjmz59PJiYmBIAiIiKIiKiuro5WrVpFtra2JBaLyczMjPz8/Ojq1astxsQYkyUiIlJsKsYYY4wxphh8a4wxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGmMriRIgxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGm
MriRIgxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGmMr6/2StLV9hbYG4AAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkIAAAHFCAYAAAAe+pb9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAACZY0lEQVR4nOzdeVxU5ffA8c+wLwqKKIIr7qa5YZpbam65Vi6gmLvmbi5pmZlalpllau57KgrqV81KTXPPJfel1NzFBURBQZBlmLm/P/g5dQEXcODOwHm/Xr5qztw79wwXmMO5z30enaIoCkIIIYQQuZCN1gkIIYQQQmhFCiEhhBBC5FpSCAkhhBAi15JCSAghhBC5lhRCQgghhMi1pBASQgghRK4lhZAQQgghci0phIQQQgiRa0khJIQQQohcSwohIczkzJkz9OrVC19fX5ycnMiTJw81atTgm2++ISoqSuv0nmnixInodLpM7btlyxYmTpyY7nMlS5akZ8+emU/sJRiNRlatWkWLFi0oVKgQ9vb25MuXj9dff51vv/2W+/fvZ+p1e/bsScmSJc2b7Auy5u8xISyVTpbYEOLlLVq0iEGDBlG+fHkGDRrEK6+8gl6v59ixYyxatIiqVauyceNGrdN8qokTJzJp0iQy8+tgyJAhzJkzJ919T548iZubG6VLlzZHmi8sPj6et99+m99//52AgADefvttfHx8iImJ4eDBgyxZsoRy5cqxf//+DL/2lStXiImJoXr16lmQ+dNZ+/eYEBZLEUK8lIMHDyq2trbKW2+9pSQkJKR5PjExUfnpp580yOzFTZgwQcnsr4PBgwdnet+s8v777yuAsnr16nSfj4uLUxYuXJjNWWVedn6PPX78WDEajWZ5LSGsgWX99hLCCrVp00axs7NTQkNDX2h7QJkwYUKaeIkSJZQePXqYHi9btkwBlJ07dyp9+/ZVPDw8lLx58yrdunVTYmNjlbCwMKVTp06Ku7u7UrhwYWXUqFFKUlKSaf/du3crgLJ7927Vca5du6YAyrJly0yx9Aqh4OBgpVmzZkrhwoUVJycnpUKFCspHH32kxMbGmrbp0aOHAqT5d+3atTTvKSIiQrG3t1c+/fTTNO/9/PnzCqDMnDnTFAsLC1Pef/99pUiRIoq9vb1SsmRJZeLEiYper3/m1/fOnTuKnZ2d0rp162dul9rs2bOVBg0aKAULFlRcXFyUypUrK1OnTlV9TZ+85xIlSqhigDJ48GBlxYoVSoUKFRRnZ2elSpUqys8//6zaLiIiQunXr59StGhRxcHBQfH09FTq1q2r7Nix45m5ZfX32G+//ab06tVL8fT0VABlzZo1CqD8/vvvaV5j7ty5CqCcPn3aFDt69KjStm1bJX/+/Iqjo6NSrVo1JSQk5IVyFUJrdtnbfxIiZzEYDOzatQs/Pz+KFSuWJcfo27cv7du3Jzg4mJMnT/LJJ5+QnJzMP//8Q/v27Xn//ff5/fffmTp1Kj4+PowcOdIsx7106RKtWrVi+PDhuLq6cuHCBaZOncqRI0fYtWsXAOPHjycuLo7169dz6NAh077e3t5pXq9gwYK0adOGH3/8kUmTJmFj8+8QxWXLluHg4EDXrl0BCA8Pp1atWtjY2PDZZ59RunRpDh06xOTJk7l+/TrLli17at67d+8mOTmZdu3aZej9XrlyhcDAQHx9fXFwcOD06dN8+eWXXLhwgaVLlz53/19//ZWjR4/y+eefkydPHr755hveffdd/vnnH0qVKgVAt27dOHHiBF9++SXlypXj4cOHnDhxgsjIyKe+bnZ8j/Xu3ZvWrVuzcuVK4uLiaNOmDYUKFWLZsmU0adJEte3y5cupUaMGVapUAVK+3m+99Ra1a9dm/vz5uLu7ExwcTEBAAI8fP9ZsjJgQL0zrSkwIaxYeHq4ASufOnV94HzL41/rQoUNV273zzjsKoEyfPl0Vr1atmlKjRg3T45ftCP2X0WhU9Hq9snfv3jTdgGddGkv9njZv3qwAyvbt202x5ORkxcfHR+nQoYMp1r9/fyVPnjzKjRs3VK/37bffKoDy999/PzXXr7/+WgGUbdu2pXlOr9er/j2NwWBQ9Hq9smLFCsXW1laJiooyPfe0jpCXl5cSExNjioWHhys2NjbKlClTTLE8efIow4cPf+px05Md32Pdu3dPs+3IkSMVZ2dn5eHDh6bYuXPnFED54YcfTLEKFSoo1atXT/P1bNOmjeLt7a0YDIYXzlsILchdY0JYuDZt2qgeV6xYEYDWrVunid+4ccNsx7169SqBgYEULlwYW1tb7O3tadiwIQDnz5/P1Gu2bNmSwoULqzo6v/32G3fu3KF3796m2C+//ELjxo3x8fEhOTnZ9K9ly5YA7N27N8PHPnXqFPb29qp//71z7OTJk7Rr144CBQqY3m/37t0xGAxcvHjxua/fuHFj8ubNa3rs5eVFoUKFVOekVq1aLF++nMmTJ3P48GH0en2G30dW6NChQ5pY7969iY+PJyQkxBRbtmwZjo6OBAYGAnD58mUuXLhg6uT991y1atWKsLAw/vnnn+x5E0JkkhRCQrwET09PXFxcuHbtWpYdw8PDQ/XYwcHhqfGEhASzHDM2NpYGDRrw559/MnnyZPbs2cPRo0fZsGEDkHJXVmbY2dnRrVs3Nm7cyMOHD4GUSy3e3t60aNHCtN3du3f5+eef0xQulSpVAnjmre/FixcHSFMUli9fnqNHj3L06FH69eunei40NJQGDRpw+/ZtZs6cyf79+zl69Chz5sx54fdboECBNDFHR0fVviEhIfTo0YPFixdTp04dPDw86N69O+Hh4U993ez4HkvvUmalSpV47bXXTEWrwWBg1apVvP3226bvvbt37wLw4YcfpjlXgwYNAp59roSwBDJGSIiXYGtrS5MmTdi6dSu3bt2iaNGiz93H0dGRxMTENPFnjRPJDCcnJ4A0x3qRD6Zdu3Zx584d9uzZY+oCAabi5WX06tWLadOmmcaRbN68meHDh2Nra2vaxtPTkypVqvDll1+m+xo+Pj5Pff1GjRphZ2fH5s2bef/9901xZ2dnatasCaR0nP5r06ZNxMXFsWHDBkqUKGGKnzp1KjNv8ak8PT2ZMWMGM2bMIDQ0lM2bN/Pxxx8TERHBtm3b0t0nO77HnjaHVK9evRg0aBDnz5/n6tWrhIWF0atXL9X7ARg7dizt27dP9zXKly//3HyF0JJ0hIR4SWPHjkVRFPr160dSUlKa5/V6PT///LPpccmSJTlz5oxqm127dhEbG2vWvJ5M+pf6WJs3b37uvk8+GB0dHVXxBQsWpNn2yTYv2iWqWLEitWvXZtmyZaxevZrExETVhyukXA7866+/KF26NDVr1kzz71mFkLe3N7179+bXX38lODj4hXJK7/0qisKiRYteaP/MKF68OEOGDKFZs2acOHHimdtq9T3WpUsXnJycW
L58OcuXL6dIkSI0b97c9Hz58uUpW7Ysp0+fTvc81axZU3W5UAhLJB0hIV5SnTp1mDdvHoMGDcLPz4+BAwdSqVIl9Ho9J0+eZOHChVSuXJm2bdsCKXcOjR8/ns8++4yGDRty7tw5Zs+ejbu7u1nzKly4ME2bNmXKlCnkz5+fEiVKsHPnTtPlrWepW7cu+fPnZ8CAAUyYMAF7e3uCgoI4ffp0mm1fffVVAKZOnUrLli2xtbWlSpUqpkt46enduzf9+/fnzp071K1bN03X4PPPP2fHjh3UrVuXYcOGUb58eRISErh+/Tpbtmxh/vz5z+yMzJgxg2vXrtG1a1c2b95smlDx8ePHXLhwgeDgYJycnLC3twegWbNmODg40KVLF8aMGUNCQgLz5s3jwYMHz/1avajo6GgaN25MYGAgFSpUIG/evBw9epRt27Y9tZvyhFbfY/ny5ePdd99l+fLlPHz4kA8//FB1tx+kFMctW7akRYsW9OzZkyJFihAVFcX58+c5ceIE69aty9gXSojspvFgbSFyjFOnTik9evRQihcvrjg4OCiurq5K9erVlc8++0yJiIgwbZeYmKiMGTNGKVasmOLs7Kw0bNhQOXXq1FPv6Dl69KjqOE/u8Lp3754q3qNHD8XV1VUVCwsLUzp27Kh4eHgo7u7uynvvvaccO3bshe4aO3jwoFKnTh3FxcVFKViwoNK3b1/lxIkTafZNTExU+vbtqxQsWFDR6XRPnUfov6KjoxVnZ2cFUBYtWpTu1/PevXvKsGHDFF9fX8Xe3l7x8PBQ/Pz8lHHjxqnmMnoag8GgrFixQmnWrJni6emp2NnZKe7u7kqtWrWU8ePHK7du3VJt//PPPytVq1ZVnJyclCJFiiijR49Wtm7dmubOu2fNI5Taf99/QkKCMmDAAKVKlSqKm5ub4uzsrJQvX16ZMGGCEhcX99z3oyjZ9z32X9u3bzfND3Xx4sV0tzl9+rTi7++vFCpUSLG3t1cKFy6svPnmm8r8+fNf6H0JoSVZYkMIIYQQuZaMERJCCCFEriWFkBBCCCFyLSmEhBBCCJFraVoI7du3j7Zt2+Lj44NOp2PTpk3P3Wfv3r34+fnh5OREqVKlmD9/ftYnKoQQQogcSdNCKC4ujqpVqzJ79uwX2v7atWu0atWKBg0amBafHDZsGP/73/+yOFMhhBBC5EQWc9eYTqdj48aNvPPOO0/d5qOPPmLz5s2qdY4GDBjA6dOnVStfCyGEEEK8CKuaUPHQoUOqWU0BWrRowZIlS9Dr9abJ0f4rMTFRNdW80WgkKiqKAgUKPHVaeSGEEEJYFkVRePToET4+Pmkm9nwZVlUIhYeH4+XlpYp5eXmRnJzM/fv30104cMqUKUyaNCm7UhRCCCFEFrp58+YLrbn3oqyqEIK0iwM+ubL3tO7O2LFjGTlypOlxdHQ0xYsX5+LFi2lW7xbZT6/Xs3v3bho3bpxuR09kHy3Pxemb0fT88Xi2HlOYm8JX9otpZ3tY60REDhGXpODq8O9ne0yiQrHvY82+fp1VFUKFCxcmPDxcFYuIiMDOzo4CBQqku4+jo2OahSMBPDw8nrqPyD56vR4XFxcKFCgghZDGtDwXy3+6go2ji+lxwbyOLOzmh00uvXydnJzMgYMHqFe3HnZ22v2atkmKxfvoV+QJO4xOMTxzW50xGYfYW8C/5yzJ1YfQRjMx2qb9HWwtDAYDp8+cpmqVqtja2mqdTq5x7OQZRo37gtHDBtDmrSYAREfHwPftzD6sxaoKoTp16qhWWAbYvn07NWvWlA9RIazUidAH7Lt4TxXr/0YpqhfPr1FG2tPr9dzKA1WKumv3u82QDKt7wZVdmdvfIQ8O3dZRpnBl8+aVzfR6PRfvPKJMtQbyOZMNjEYjU6dOZfz48RgMBiZM+Z62nbpRtmxZIiMjs+SYmt4+Hxsby6lTpzh16hSQcnv8qVOnCA0NBVIua3Xv3t20/YABA7hx4wYjR47k/PnzLF26lCVLlvDhhx9qkb4Qwgxm/n5J9dgzjyNda5fQKBthsu3jzBdB6KDDErDyIkhkr4iICFq2bMknn3yCwZDSgfTz88PV1TVLj6tpR+jYsWM0btzY9PjJWJ4ePXqwfPlywsLCTEURgK+vL1u2bGHEiBHMmTMHHx8fZs2aRYcOHbI9dyFyIqMCn20+xy9nwolLSs62Y/7XgIalcHaQSxCa+nMhHF2UuX11NtDyGyj/lnlzEjnanj17CAwMJCwsDEgZ9zt+/Hg+++yzLL8kqWkh1KhRI541jdHy5cvTxBo2bMiJEyeyMCshcq89YTp+unFLs+N75nGQblB2uH8J9n4Dj8LSPqcoEHpQHbN1hNbfgvNzbjDR6aBQRfAoZb5cRY5mMBj48ssvmTRpEkajEUi5GzwoKIgmTZpkSw5WNUZICJF1Hicls/O2tssP9n+jtHSDslrUNVjaAh5nYLzFO3Ph1Y5Zl5PIle7evUtgYCC7dv17CbZp06asWrUqzVQ5WUkWXRVCALD6yC1ik7W7S6tR+YJ0ryvdoCwV/xBWB2SsCGr4sRRBIksoisJff/0FgI2NDV988QXbtm3L1iIIpCMkhADikwws/uO6KtagrCeftKqYLcd3c7anSD7nbDlWrmVIhnU94f4/L75P1S7Q6OMsS0nkboULFyYoKIhevXqxatUqGjZsqEkeUggJIQj68waRcUmq2Kjm5ano7aZRRuKFJD6Cg7Mh6urzt425DTcOqGOFXoGavdPf3sMXSjdJGfcjhBncvn0bZ2dn1WTGTZs25dKlSzg5OWmWlxRCQuRgRqOC8TnrKickG5m/V/1B2qh8QaoVy5eFmYmXlvQYfmwLd05mbn/XghAYAvmKmzcvIdKxbds2unXrRt26ddm0aZNqUkQtiyCQQkiIHElRFL757R9WHbrBo8SM3wb/QZOyWZCVMBujETb2z3wRZOsInddIESSynF6v57PPPuPrr78GYPPmzSxYsIABAwZonNm/pBASIgf65UwY8/ZcydS+b5QtkKtndbYKuyfD+c2Z21dnA+/Og2KvmTcnIVK5efMmnTt35uDBf6djaNOmDZ06ddIwq7SkEBIihzEaFWbtvPT8DZ9iSOPSZsxGmN2pNbD/O3XM0R1q9nr+eB5bByjTTIogkeV+/vlnevbsSVRUFAB2dnZMnTqVESNGmH2tsJclhZAQOcyWv8K4FBGb4f1sbXQ08zFQXcYGWa4bB2HzUHVMZwv+P0LpxunvI0Q2SkpKYuzYsUyfPt0UK1GiBCEhIdSuXVvDzJ5OCiEhcpD0ukEVCudlTtcaz903v5MN+3ftyKrUxMuKugrBXcGoV8dbTZMiSFiEmJgYmjVrxpEjR0yxd955h6VLl5I/v+VebpdCSIgcZOtf4Vy8q+4GDW9altIF8zx3X71e/9xthEaeTIQYH6WOvz4IXuujSUpCpJY3b15KlizJkSNHcHBw4Ntvv2XIkCEWdyksNSmEhMghntYNav5KYY0yEmZh0P//RIgX1fGyLaD5ZE1SEiI9Op2ORYsWER0dzZdffomfn5/WKb0QKYSEyCG2
/R3OP3cfqWLDmpTFxsay/xoTz6AosPUjuLpbHS9UCTouARtZl01o58qVK9y8eZNGjRqZYm5ubmzbtk27pDJB1hoTIgdIrxtU3isvb1WSbpA1szm2CI4tVQddC0FgMDjm1SYpIYB169ZRo0YNOnTowM2bN7VO56VIISREDrD9XDgXwqUblJN4RZ/CZsen6qCdE3SRiRCFdhISEhg0aBD+/v7ExMQQFRXFJ598onVaL0UujQlh5YxGhRm/q7tB5bzy0LKydIOsSuQViDgHioIuPoaa1+eiU4zqbd6ZB0VrapOfyPUuXryIv78/p0+fNsUCAwOZO3euhlm9PCmEhLBy28/dTdMNGvqmdIOsytElsOVD+P/CJ91fzI3HQeX22ZqWEE+sXr2a/v37Exubcleqk5MTs2fPpnfv3hZ/V9jzSCEkhBVTlLRjg8oWykOrV701ykhk2MXfVEVQul7tBG+Mzr6chPh/jx8/ZtiwYSxZssQUq1ixImvXrqVy5coaZmY+UggJYcW2n7vLubAYVWxok7LYSjfIOoT/Bet7P7sIKlYb2s1+/vIZQpiZoii89dZb7N+/3xTr0aMHc+bMwdXVVcPMzEsGSwthpdLrBpUu6Epr6QZZh0d3YU1nSEq1HEq+EigFyvDIyQdjlS7QJRjsnbTJUeRqOp2OkSNHAuDi4sLy5ctZvnx5jiqCQDpCQlit389H8PcddTdomHSDrIM+HoIDITrVbcev+kP7hSQnJ7NryxZatWqFjb29NjkKQcoSGd9++y2tWrWiYsWKWqeTJaQjJIQVUhSFGb+rZxouXdCVNlV8NMpIvDCjETYNgtvH1PFitaHdD3IJTGjmr7/+4tNPP0VRFFV81KhRObYIAukICWGVdqbTDRr6pnSDrMLer+HvDepYvhLQebVcAhOaUBSFJUuWMHToUBISEvD19aVPn9yzhp10hISwMoqiMDPV2KBSnq60rSrdIIt3Zi3snaqOObpBYAi4emqTk8jVHj16xHvvvUe/fv1ISEgAYMmSJRiNzxjAn8NIISSEldl1IYKzt6NVsaFNykg3yNKF/gk/DVbHdDbQaRkUyrmXHYTlOnXqFH5+fqxevdoUGzhwILt27cLGJveUB3JpTAgrkKA3EJOgB0jTDfL1dKWtjA2ybA9upAyONiSp4y2/gTJNtclJ5FqKojB//nxGjBhBYmIikLJY6qJFi/D399c4u+wnhZAQFu77HReZt/cKScnpt6qHNC6DnW3u+evN6iREw+oAeHxfHa/VH2r10yYnkWtFR0fTr18/1q1bZ4r5+fkREhJC6dKlNcxMO/LbUwgLdvDKfWbuvPTUIqhkARferibdIItlSE6ZMPHeeXW8TFNo8ZU2OYlc7aOPPlIVQcOGDePAgQO5tggCKYSEsGipF1NNbcibZaUbZMl++wQu/66OFawIHZeCrTTkRfabPHkyRYoUIV++fGzcuJGZM2fi6OiodVqakp9EISzUoSuRHLkWle5zDrY2BLxWjPbVi2RzVuKFHVkERxaoYy6eEBgMTu7a5CRyHUVRVIuienp6smnTJjw9PSlZsqR2iVkQKYSEsFCpJ0z0dndi85D62NnocHawxcneVqPMxHNd/h22fqSO2TqmzBWUv6QmKYnc588//2TEiBFs3LgRLy8vU7xmzZoaZmV5pKcuhAU6fDWSP1N1gwY1LkPBvI7kd3WQIsiSRZyHdb1AMajjb8+B4rW1yUnkKoqi8N1331G/fn0OHTpEt27dctW8QBklHSEhLNDMVGODvN2d8K9ZVKNsxAuLuw+r/SFRPes3DT+CKp20yUnkKpGRkfTs2ZNffvnFFIuLiyM6Opr8+fNrmJnlko6QEBbmz6uRHLoaqYoNalQaRzvpAlk0fULKXEEPQ9XxSu2h0VhtchK5yoEDB6hevbqqCPr444/Zs2ePFEHPIB0hISxM6gkTC7s54f9aMY2yES9EUeDnYXDzT3W8iB+8M1cWUhVZymg08s033/Dpp59iMKRckvX09GTlypW89dZbGmdn+aQQEsKCHLkWxcEr6m7QQOkGWb5938KZEHXMvRh0XgP2ztrkJHKFe/fu0b17d7Zt22aKvfHGG6xevZoiReSu0hchl8aEsCAzd6rvFPNycyRAukGW7a8NsHuyOuaQB7oEQ16v9PcRwkx27txpKoJ0Oh3jx49n586dUgRlgHSEhLAQx65HceByqm5Qw9Jyh5glu3UcNg1Ux3Q2KRMmFq6sTU4iV+ncuTPbt29ny5YtrFq1iqZNZe26jJJCSAgLkXpsUKG8jnSuVVyjbMRzPbwJazpDcoI63vxLKNdCm5xEjvfo0SPy5s2ris2ePZuYmBgKFy6sUVbWTS6NCaERRVG49eAx1+7Hsf3vcPZfUi/KObCRdIMsVuKjlCIoLkId9+sFrw9Mfx8hXtLOnTspV64ca9euVcVdXFykCHoJ0hESQgPh0Ql0XXyYK/fi0n2+YF5Hukg3yDIZDfC/vnD3L3W8VCNoNU3uEBNmZzAY+Pzzz/niiy9QFIW+ffvi5+eXqxdKNScphITQwPif/npqEQQyNsii7fgMLm5TxzzLQacfwdZem5xEjnXnzh0CAwPZu3evKVanTp00l8dE5smlMSGy2V+3o9lx7u5Tny+Y15HA2tINskjHlsGh2eqYswcEhoBzPk1SEjnXb7/9RrVq1UxFkK2tLVOmTGHr1q0UKlRI4+xyDukICZHNZqUaFP1fpTxdmdapqnSDLNHVPbDlQ3XMxh4CVoFHKU1SEjlTcnIy48eP5+uvvzbFihYtypo1a6hfv76GmeVMUggJkY3+vhPN9lTdoFHNyjGgUcq1fntbadJapHsXIaQ7GJPV8bYzoWQ9bXISOdLt27cJCAjgwIEDpljr1q1Zvnw5np6eGmaWc8lvXSGyUepuUD4Xe3rWK4m9rY0UQZbqcdT/L6QarY7XHwHVu2qTk8ixdDodFy+mTKxqZ2fHt99+y+bNm6UIykLym1eIbHLuTgy//a3uBvWt70teJxlga7GSkyDkPXhwTR2v2Bbe/EybnESO5uPjw8qVKylZsiT79+9n1KhR2NjIR3VWkktjQmST1N0gd2d7etQtqU0y4vkUBX4ZDjcOqOPe1eDdBSAfTsIMbty4gbu7O/ny5TPFWrRowYULF3B0dNQusVxECiEhssH5sBi2/R2uikk3yMIkJ8G5TRB9K+Vx1BU4FaTeJq9PyhpiDq7Znp7IeTZt2kSvXr148803Wb9+Pbr/zEElRVD2kUJIiGzwwy51N8jNyY4e9Upqk4xIKzkRgjrCtX1P38beBbqsATfv7MtL5EiJiYmMGTOGWbNmAbBhwwaWLl1Knz59NM4sd5JCSIgsdiE8hi1n1d2gPvVL4SbdIMugKPDz8GcXQeig/SLwqZZNSYmc6urVq/j7+3P8+HFTrGPHjnTs2FHDrHI3ucgtRBb7Yedl1WM3Jzt6SjfIcvzxPZxe/extmk6Eim2yJR2Rc61fv57q1aubiiBHR0fmzp3L2rVrcXd31zi73Es6QkJkoX/CH7HlrzBVrHd9X9ydpRtkEc5thp2
T1DF7F/B9I+X/bR2gQhuo4p/9uYkcIyEhgVGjRjF37lxTrGzZsqxdu5Zq1appl5gApBASIkvN2nUJRfn3cV4nO3rV89UuIfGvOydhw/upgv9/CUy6P8JMHjx4wJtvvsmpU6dMsS5durBgwQJZL8xCyKUxIbLIpbuP2HI2VTeonnSDLEL0bVjdGZLj1fFmk6QIEmaVL18+ypQpA4CTkxMLFy4kKChIiiALIh0hIbLIrF2X1d0gRzt6SzdIe4mxsCYAYtUD2Kn+HtQdpk1OIsfS6XQsXryYx48fM2XKFKpUqaJ1SiIVKYSEyAKX7j7ilzN3VLFe9Uri7iLdIE0ZjSmXw8LPquMl6kPr7+E/87gIkRnnz5/n7t27NGrUyBRzd3fn119/1S4p8UxyaUyILPBDet2g+tIN0tzOifBPqg8kj9IQsBLsHDRJSeQcK1asoGbNmnTq1Inbt29rnY54QVIICWFmlyNi+TlVN6hnvZLkc5EPWk2dWAkHZqpjTvkgcC24eGiSksgZ4uLi6NWrFz169ODx48fcv3+fCRMmaJ2WeEFyaUwIM/sh1Z1ieRzt6CPdIG1d25+ybth/2diB/wrwLKNJSiJn+Ouvv/D39+f8+fOmWN++fZk5c+Yz9hKWRAohIV6SwajwT/gj4vXJRMXp+fl0qm5QXekGaSrySsoK8sZkdbz1dCjVUJuchNVTFIWlS5cydOhQ4uNT7j7MkycPCxYsIDAwUOPsREZIISTES4iMTSRg4WEuR8Sm+7yrg610g7T0OApW+0PCQ3W8zhDw66FJSsL6PXr0iIEDBxIU9O+ivFWrVmXt2rWUK1dOw8xEZkghJMRL+Hb7P08tgiBlbFB+V+kGacKgh7XdIVK9xAnlW0Gzz7XJSVg9RVFo2rQpR44cMcUGDhzI9OnTcXJy0jAzkVmaD5aeO3cuvr6+ODk54efnx/79+5+5fVBQEFWrVsXFxQVvb2969epFZGRkNmUrxL9uRj1m3bFbT30+r6MdfeuXysaMhImiwK8j4Xqq3yder6bMHG1jq01ewurpdDpGjx4NQN68eQkJCWHu3LlSBFkxTQuhkJAQhg8fzrhx4zh58iQNGjSgZcuWhIaGprv9H3/8Qffu3enTpw9///0369at4+jRo/Tt2zebMxcC5u65QrLx31HROl1K8ZPX0Y5KPm4s6OYn3SCtHJoNJ1aoY3m8IDAYHPNok5PIMTp27Mh3333HyZMn8feXdeisnaaXxqZPn06fPn1MhcyMGTP47bffmDdvHlOmTEmz/eHDhylZsiTDhqXM/urr60v//v355ptvsjVvIW49eMz64zdVsZ51SzKhbSWNMhImF7bA9vHqmJ0TdFkD7kW1yUlYrRMnThAUFESrVq1U8ZEjR2qUkTA3zQqhpKQkjh8/zscff6yKN2/enIMHD6a7T926dRk3bhxbtmyhZcuWREREsH79elq3bv3U4yQmJpKYmGh6HBMTA4Ber0ev15vhnYiX8eQcWNu5mL3rEnrDv90gRzsb+tYrYXXv47+s9VyohJ/F7n990aGowsnt5qIUqgJW8t5yxLmwcoqiMHfuXD766COSkpJo2rQpvXr10jqtXC2rfh40K4Tu37+PwWDAy8tLFffy8iI8PDzdferWrUtQUBABAQEkJCSQnJxMu3bt+OGHH556nClTpjBp0qQ08d27d+Pi4vJyb0KYzY4dO7RO4YVFJcK6k7bAv8sx1PZM5tj+ndolZUbWdC7+y1H/kIb/TMReH6eKn/PuyKVrdnBti0aZZZ61ngtrFxsby+zZszl8+LApNnv2bAoVKoROlmHRzOPHj7PkdTW/ayz1N5WiKE/9Rjt37hzDhg3js88+o0WLFoSFhTF69GgGDBjAkiVL0t1n7NixqhZmTEwMxYoVo3HjxhQoUMB8b0Rkil6vZ8eOHTRr1gx7e+tYh2vCz+cwKP8Oknaws+Grbm/g5WbdgyWt8VyY6B9ju7IdNvooVdj4qj9l286hrJV9eFn1ubByR44cYfjw4Vy/ft0Ua9euHT/++COurq7aJSay7MYozQohT09PbG1t03R/IiIi0nSJnpgyZQr16tUzjdivUqUKrq6uNGjQgMmTJ+Pt7Z1mH0dHRxwdHdPE7e3t5ReMBbGW83HnYTzrjqvXEAqsVZyiBfJqlJH5Wcu5MDEaYeNQCDuljhd7HZu3Z2NjxWuIWd25sGKKovD999/z0UcfkZycMvlm/vz5WbJkCTY2Nri6usq50FhWff01u2vMwcEBPz+/NK3fHTt2ULdu3XT3efz4MTY26pRtbVNug1UUJb1dhDCreXuuqMYGOdjaMKBhaQ0zEuz+Es79pI7lLwmdg8Au7R9BQqQWGRlJu3btGDVqlKkIqlu3LqdOnaJNmzYaZyeymqa3z48cOZLFixezdOlSzp8/z4gRIwgNDWXAgAFAymWt7t27m7Zv27YtGzZsYN68eVy9epUDBw4wbNgwatWqhY+Pj1ZvQ+QSYdHxhBxV3ynWuVYxCrtb9yUxq3Y6GPZ/q445uqcspOrqqU1Owup8+OGH/PLLL6bHH330EXv27KF48eIaZiWyi6ZjhAICAoiMjOTzzz8nLCyMypUrs2XLFkqUKAFAWFiYak6hnj178ujRI2bPns2oUaPIly8fb775JlOnTtXqLYhcZN6eKyQZjKbHDrY2DGwk3SDN3DgEm4eqYzpb6LQMCpbXJidhlaZOncpvv/2GXq9n5cqVvPXWW1qnJLKR5oOlBw0axKBBg9J9bvny5WliQ4cOZejQoWk3FiILhUcnEHxE3Q0KeK0Y3u7OGmWUy0Vdg5CuYEhSx1t9A2WaaJOTsBqpb8opVKgQP/30Ez4+PhQpUkTDzIQWNF9iQwhrMH+vuhtkb6uTbpBW4h/C6gB4nOoOktoD4DWZZV482759+3j99de5d++eKv7aa69JEZRLSSEkxHPcjUlg9RH1si8BrxXDJ590g7KdIRnW9YT7/6jjZZpB8y81SUlYB4PBwOTJk2ncuDFHjhyhR48eGI3G5+8ocjzNL40JYenm7blCUnLqblAZDTPKpRQFto6Bq7vV8UKvQMelYCu/zkT67t69y3vvvcfvv/9uiiUmJhIbG4ubm5uGmQlLIB0hIZ4hIiaBNam6QZ1qFqOIdIOy358L4FiqiVNdC0KXYHCSDzORvl27dlG1alVTEWRjY8OkSZPYvn27FEECkI6QEM80b+8VElN1gwbJ2KDsd3E7/DZWHbN1hM6rIX8JbXISFs1gMPD555/zxRdfmOaZ8/b2ZvXq1TRq1Ejb5IRFkUJIiKeIiElg9Z/qblBHv2IUzS9r1GWru3/D+t6gpBrP8c5cKFZLm5yERbtz5w5du3Zlz549pljz5s1ZuXIlhQoV0i4xYZHk0pgQT7Fg31VVN8jORrpB2S42AlZ3hqRH6nijsfBqR21yEhZv586dpiLI1taWr776iq1bt0oRJNIlHSEh0hHxKIFVh2+oYp1qFqWYh3SDso0+AYIDIVrdlaNyR2j4kTY5Cavw3nvvsWPHDnbt2kVwcDD169fXOiVhwaQQEiIdC/em1w
2SO8WyjaLAT4Ph1lF1vOhr8PYcsLLV5EXWio6Oxt3d3fRYp9Mxd+5cEhIS8PSUpVbEs8mlMSFSufcokVV/qrtBHWpINyhb7Z0Kf61Xx9yLpQyOtpe13cS/tmzZQunSpdmwYYMqnidPHimCxAuRQkiIVBbuu0KCXt0NGtxYukHZ5ux62DNFHXPIC4EhkEfGeIgUer2eMWPG0Lp1ayIjI+nduzfXrl3TOi1hheTSmBD/cT82kZWpxga1r1GE4gWkG5Qtbh6FTanWHtTZpEyY6FVJm5yExblx4wadO3fm8OHDpljDhg1Vl8eEeFHSERLiPxbtu6rqBtna6BjSuKyGGeUiD25AcBcwJKrjLaZAueba5CQszk8//US1atVMRZC9vT0zZsxg06ZNeHh4aJydsEbSERLi/92PTWTFoVTdoOrSDcoWCTGwpjPEqRfCpGYfqN1fm5yERUlKSmLMmDHMnDnTFPP19SUkJITXXntNw8yEtZNCSIj/t2j/VeL1BtNjWxsdQ96UsUFZzpCcMmFixDl1vFRjaDlV7hATXL9+nU6dOnHs2DFTrEOHDixevJh8+fJpl5jIEeTSmBBAZGwiK1N1g96pVoQSBVw1yigX2f4pXN6hjnmWh07LwdZek5SEZbGzszMNhHZwcGDOnDmsW7dOiiBhFlIICQEs2n+Nx0nqbtBQ6QZlvaOL4c956pizR8odYs75NElJWJ6iRYvy448/Uq5cOQ4fPsygQYPQSadQmIkUQiLXi4pLYsWh66rY29V8KOkp3aAsdXknbBmjjtk6pMwV5OGrTU7CIly+fJno6GhVrHXr1vz1119Ur15do6xETiWFkMj1Fu+/quoG2ehg6Jtyp1iWirgA63qCYlDH286CEnU0SUlYhuDgYGrUqEG/fv1Mq8Y/YW8vl0qF+UkhJHK1B3FJ/Hjwuir2TrUi+Eo3KOvE3YfV/pAYo443GAXVumiTk9BcfHw8/fv3p0uXLjx69Ih169axcuVKrdMSuYDcNSZytcV/XCUuVTdI7hTLQsmJEPIePFQPTOeVt6Hxp9rkJDR34cIF/P39OXv2rCnWrVs32rdvr2FWIreQjpDItVK6QeoP5HZVfShVMI9GGeVwigKbh0HoIXXcpzq8Mx9s5NdRbrRy5Upq1qxpKoKcnZ1ZunQpP/74I3nyyM+iyHrSERK51pI/rhGbmGx6nNINkrFBWWb/d3AmWB1zKwJdgsFBJq3MbeLi4hgyZAjLly83xV555RXWrVvHK6+8ol1iIteRQkjkSg8fJ7E81digtlV9KFNI/gLNEn9vgl1fqGP2rilFUN7CmqQktHP//n0aNmzIuXP/TqLZu3dvfvjhB1xcpCgW2Ut60SJXWpqqG6TTIfMGZZXbx2HjgFRBHXRYDN5VNElJaKtAgQKUK1cOAFdXV1auXMmSJUukCBKakI6QyHWiH+tZduC6Kta2ig9lCuXVJqGcLPoWrOkCyfHqePMvoEIrbXISmtPpdCxduhSDwcC0adMoX7681imJXEwKIZHrLDlwjUepukHDmkg3yOwSY1MWUo29q47X6A51hmiTk9DEmTNnePDgAQ0bNjTF8ufPz+bNmzXMSogUcmlM5CrR8XqWHbimirV+1Vu6QeZmNMCGfhB+Vh0v2QBafScLqeYSiqKwYMECatWqRadOnbhz547WKQmRhhRCIldZ+sc1HiWk7gbJnWJm9/sE+GeLOlagDPivADsHbXIS2SomJoYuXbowYMAAEhMTuXfvHl9++aXWaQmRhlwaE7lGdLyepam6Qa1e9aacl3SDzOr4j3DwB3XMKR8ErgUXD01SEtnrxIkT+Pv7c+XKFVNsyJAhTJs2TcOshEifdIRErrH8wHVVNwhgmMwbZF7X9sGvI9UxGzsIWAUFSmuTk8g2iqIwe/Zs6tSpYyqC3N3dWb9+PT/88ANOTk4aZyhEWtIRErlCTIKeJX9cVcVav+pN+cLSDTKb+5chpBsY1cUmbb4H3wba5CSyzcOHD+nTpw8bNmwwxV577TVCQkLw9fXVMDMhnk0KIZErLD9wnZhU3aChcqeY+TyOgtWdIOGhOl53WMpdYiJHMxqNNGrUiNOnT5tiI0aM4Ouvv8bBQcaECcsml8ZEjpfSDVKPDWpZuTAVCrtplFEOk5wEa7tDlLrjRoU20HSSNjmJbGVjY8PYsWOBlNvif/rpJ6ZPny5FkLAK0hESOd6PB64THa9XxeROMTNRFPh1BFzfr44XrgLtF8pCqrlIQEAAd+7coUOHDhQvXlzrdIR4YfJbSuRojxL0LE7VDXqrUmEqeks3yCwOzoKTq9SxPIX/fyFVV21yElnu0KFDjBs3Lk18xIgRUgQJqyMdIZGjrTh0Q7pBWeX8L7Bjgjpm5wxd1oB7EW1yElnKaDTy7bff8sknn2AwGKhYsSLvvfee1mkJ8VKkIyRyrNjEZBbtV49baVHJi1d8pBv00sJOp8wcjaKOt18IRWpokpLIWvfv36dNmzZ89NFHGAwGAFavXo2iKM/ZUwjLJoWQyLF+PHidh4+lG2R2MWGwujPoH6vjTSbAK+20yUlkqf3791OtWjW2bt0KpCyaOm7cODZv3oxOlksRVk4ujYkcKTYxmcWpukHNXvGiko+7RhnlEEmPUxZSfZRqzaiqgVB/hDY5iSxjNBqZMmUKn332GUajEYCCBQsSFBREs2bNNM5OCPOQQkjkSCsOXedBqm7QB9INejlGI2x8H8JOqePF60LbGbKQag5z9+5dunXrxo4dO0yxxo0bExQUhLe3t4aZCWFecmlM5Dhxicks2qfuBjWt6EXlItINeim7voDzP6tj+X1Tls+wc9QmJ5FlRo4caSqCdDodEydOZMeOHVIEiRxHOkIix1l5+IZ0g8zt1Gr4Y7o65uiespCqawFtchJZavr06ezatQtIGRTduHFjjTMSImtIISRylMdJySxM1Q1qUqEQrxaVblCm3TgIm4epYzpb8P8RCpbTJidhdoqiqAY+e3l58fPPP1OsWDG8vLw0zEyIrCWXxkSOsvLQDaLiklSxD5pKNyjTIq9AcFcwqjtstJoGpaVDkFPs2LGD1157jcjISFW8Zs2aUgSJHE8KIZFjpNcNerNCIaoUzadNQtYu/gGsDoD4KHX89UHwWh9tchJmlZyczKeffkqLFi04fvw4PXv2lHmBRK4jl8ZEjhF0OJTI1N0gGRuUOQY9rOsJkZfU8bItoPlkTVIS5nXr1i0CAwPZv//fdeIMBgOPHz/G1VWWRxG5h3SERI4Qn2Rgwb4rqljj8gWpWiyfNglZM0WBLaPh6h51vFAl6LgEbGw1SUuYz5YtW6hWrZqpCLK1teWbb77hl19+kSJI5DrSERI5QtCfN7gfm3pskAzkzQybowvh+DJ10LUQBAaDY15tkhJmodfrGTduHNOmTTPFihcvTnBwMHXq1NEwMyG0I4WQsHrxSQbm71WPDWpYriDVpBuUYV7Rp7A5NUMdtHVMWUg1n6wqbs1CQ0Pp3Lkzhw4dMsXatWvHsmXL8PDw0DAzIbQll8aE1UvpBiWqYnKnWCbc/Zua1+eiU
4zq+LvzoGhNbXISZrNz505TEWRvb8/333/Ppk2bpAgSuZ50hIRVS9AbWJDqTrE3yhWkRvH8GmVkpR7dxW5tV3TGBHW88Tio3EGbnIRZ9ezZk99//52DBw+ydu1aXnvtNa1TEsIiSCEkrNrqP0O59yhVN0juFMsYfTwEB6KLuaWOv9oJ3hitTU7ipT148ID8+f/9g0Cn0zF//nwMBgP58uXTLjEhLIxcGhNWK0FvYN5e9Z1iDcp64ldCukEvzGiETYPg9jF1vGgtaDdbFlK1Uhs2bKBUqVJs3rxZFc+bN68UQUKkIoWQsFprjqTtBg2XsUEZs/dr+HuDKqS4F4fOq8HeSaOkRGYlJCQwdOhQOnTowMOHD+nZsyc3btzQOi0hLJpcGhNWKUFvYN4edTeofhlP/ErIwM8XdmYt7J2qCultnMA/CPs8BTVKSmTW5cuX8ff35+TJk6ZY8+bNVZfHhBBpSUdIWKXgI6FEpB4bJN2gFxf6J/w0WBVSdDYc8x0ChSpqlJTIrJCQEGrUqGEqghwdHVmwYAFr1qzBzc1N4+yEsGzSERJWJ72xQfXKFOC1ktINeiEPbkBwIBjUE1Aam31FxD0fjZISmREfH8+IESNYsGCBKVa+fHnWrl1LlSpVNMxMCOshHSFhddYeu8ndmNR3isks0i8kITplIdXH99Xx1/phfK2vNjmJTLl8+TKvv/66qgh67733OHbsmBRBQmSAFELCqiQmG5i7W90Nqlu6ALV8pRv0XIZkWN8b7p1Xx0s3gbe+1iYnkWkODg7cvHkTAGdnZ5YsWcKKFSvIkyePxpkJYV2kEBJWZe3Rm4THqCf9k3mDXtBvn8Dl39WxghWg0zKwlavk1qZ48eL8+OOPvPLKKxw9epTevXujk+kOhMgwKYSE1UhMNjA31Z1ir5fyoHapAhplZEWOLIIjC9QxlwIQGAJO7trkJDLk/PnzPHr0SBVr27Ytp0+fplKlShplJYT1k0JIWI21x24RFp26GyRjg57r8u+w9SN1zNYhZa6g/CU1SUm8OEVRWLZsGX5+fvTv3x9FUVTP29lJN0+Il6F5ITR37lx8fX1xcnLCz8+P/fv3P3P7xMRExo0bR4kSJXB0dKR06dIsXbo0m7IVWklMNjBv92VVrLavB3VKSzfomSLOw7peoBjU8Xazofjr2uQkXlhsbCw9evSgd+/exMfHs2bNGkJCQrROS4gcRdM/JUJCQhg+fDhz586lXr16LFiwgJYtW3Lu3DmKFy+e7j7+/v7cvXuXJUuWUKZMGSIiIkhOTs7mzEV2W3/8FndSd4Nk3qBni7sPq/0hMUYdf2M0VA3QJifxwq5fv86YMWO4ePGiKda/f3/efvttDbMSIufRtBCaPn06ffr0oW/flNt2Z8yYwW+//ca8efOYMmVKmu23bdvG3r17uXr1Kh4eKXcJlSxZMjtTFhpISjamuVOslq8HdWRs0NPpE1LmCnoYqo5XehcafaJNTuKFKIrC4sWLGTNmDElJKXM95c2bl4ULF9K5c2eNsxMi59GsEEpKSuL48eN8/PHHqnjz5s05ePBguvts3ryZmjVr8s0337By5UpcXV1p164dX3zxBc7Ozunuk5iYSGLiv3POxMSk/HWs1+vR6/Vmejcis56cg2edi5Cjt7j9MF4VG9LIVzqBT6Mo2G4egs3NP1Vho08NDK1ngcGQ8i+VFzkXImvFxMQwaNAg1q5da4pVq1aNoKAgypYtK+dGA/JzYTmy6hxoVgjdv38fg8GAl5eXKu7l5UV4eHi6+1y9epU//vgDJycnNm7cyP379xk0aBBRUVFPHSc0ZcoUJk2alCa+e/duXFxcXv6NCLPYsWNHuvFkI3x/yhb497bgUnkVos7/yZYL2ZSclSkX/hMVw/6nij2292CfR08Sd+x+7v5POxciaz18+JCxY8cSFhZmirVq1YqePXty6dIlLl26pGF2Qn4utPf48eMseV3NbzdIPe+FoihPnQvDaDSi0+kICgrC3T3llt/p06fTsWNH5syZk25XaOzYsYwcOdL0OCYmhmLFitG4cWMKFJBLK1rT6/Xs2LGDZs2aYW9vn+b5kGO3iEo8p4pN6FCTujJIOl26c5uwO6kughQHV+y7b6CJV+Vn7vu8cyGylqIorFu3jl9//RV3d3f69+/PhAkT5FxoTH4uLEdkZGSWvK5mhZCnpye2trZpuj8RERFpukRPeHt7U6RIEVMRBFCxYkUUReHWrVuULZt28KyjoyOOjo5p4vb29vJNbUHSOx96g5H5+66pYjVL5OeN8l4ycVx6bh2Hn4ekCurQdViKfdHqL/wy8rOhnRUrVtC3b1+mTJnChQsX5FxYEDkX2suqr79mt887ODjg5+eXpt24Y8cO6tatm+4+9erV486dO8TGxppiFy9exMbGhqJFi2ZpviL7bThxi1sP1GODPmhaVoqg9Dy8CWs6Q7L6zjpafAnl39ImJ/FMR48eZd++faqYh4cHGzZsoFSpUhplJUTuo+k8QiNHjmTx4sUsXbqU8+fPM2LECEJDQxkwYACQclmre/fupu0DAwMpUKAAvXr14ty5c+zbt4/Ro0fTu3fvpw6WFtZJbzDywy71vEF+JfJTv4ynRhlZsMRHKUVQXIQ67tcTXh+kSUri6RRFYcaMGdSrVw9/f/+njokUQmQPTQuhgIAAZsyYweeff061atXYt28fW7ZsoUSJEgCEhYURGvrv7b958uRhx44dPHz4kJo1a9K1a1fatm3LrFmztHoLIotsPHE7bTeoiXSD0jAa4H994e5f6rjvG9DqW5Cvl0WJiorinXfeYcSIEej1eu7evcs333yjdVpC5GqaD5YeNGgQgwal/1fr8uXL08QqVKggo/dzOL3ByOxUs0hXL56PBmWlG5TGjs/g4jZ1rEBZ8F8BtjKewZIcOnSIzp07q/64Gz16NF9++aWGWQkhNF9iQ4jUNp68TWiU+jbJ4U3LSTcotWPL4NBsdcw5f8pCqs75tclJpGE0Gpk2bRpvvPGGqQgqUKAAv/zyC998840MwBVCY5p3hIT4r2SDkTmpukHViuXjDekGqV3dA1s+VMds7CFgFRQorUlKIq379+/To0cPtmzZYorVr1+fNWvWyA0eQlgI6QgJi7Lp1B1uRKq7QXKnWCr3LsLa7mBMNbN22xlQsr4mKYm0kpOTadCggakI0ul0fPLJJ+zevVuKICEsiBRCwmIkG4z8sEs9e27VYvloVK6gRhlZoMdRKQupJkSr4/WGQ/X3NElJpM/Ozo5PP/0UgIIFC7Jt2za+/PJL7OykES+EJZGfSGExfkqnGzRc7hT7V3IShLwHD9STTFKhDTSZoE1O4pm6du3KvXv3CAgIwNvbW+t0hBDpkI6QsAjJ6dwpVqWoO43KSzcIAEWBX0bAjQPquHdVaL8QbORHWWu7d+9m3LhxaeLDhw+XIkgICyYdIWERfjkbzrX7carYcBkb9K8DM+HUKnUsrzd0CQYHV21yEgAYDAa++OILPv/8cxRF4dVXX6Vz585apyWEeEHyZ6TQnFGBuXuuqmJVirrTuHwh
jTKyMOd/ht8nqmP2LilFkJuPJimJFGFhYTRr1oxJkyahKAoA69at0zgrIURGSCEkNHfivo5rqcYGDXtTukEA3DkJ/+sHKP8J6lIuh/lU0ygpASnrIlarVo3du3cDYGNjw+TJk6UQEsLKyKUxoSmDUeG3W+p6vHIRN5pUlG4QMXdgTRdIVi81QtOJULGtJimJlNviJ06cyFdffWXqAvn4+LBmzRreeOMNjbMTQmSUFEJCU7+eDSciQd35+aCJzCJNUlzKQqqPwtTxau9BvQ+0yUlw69YtAgMD2b9/vyn21ltvsWLFCgoWlIH9QlgjuTQmNGMwKsxJNTaoko8bTXN7N8hohA3vQ9hpdbxEPWjzvSykqqERI0aYiiBbW1umTp3Kr7/+KkWQEFZMOkJCM7+cucPVVHeKyQrzwM5JcOEXdcyjVMryGXYO2uQkAJg1axb79u3D0dGR4OBg6tatq3VKQoiXJIWQ0ITBqPDDLvW8Qa94u9HsFS+NMrIQJ1fBgRnqmJM7BK4FFw9NUsrNjEYjNv+Zo8nb25tff/2VUqVK4eEh50OInEAujQlNbDkbxuWIWFVsWG7vBl3/A34ero7Z2IH/CvAsq0lKudnmzZupWbMmUVFRqnjNmjWlCBIiB5FCSGQ7o1Fh1k71mmIVCueleW7uBkVeSVk+w6hXx1t/B6UaaZJSbpWUlMTIkSN5++23OXnyJL179zbdHSaEyHnk0pjIdlv+CuNSqm7QkEalsLHJpd2g+AcpC6nGP1DH6wwBv56apJRbXbt2jYCAAI4ePWqK2djYkJCQgLOzs4aZCSGySqYKobi4OL7++mt27txJREQERqNR9fzVq1efsqfI7dLrBnm7KDTLrXeKGfSwtjtEqsdLUa4lNPtcm5xyqQ0bNtC7d2+io6MBcHBw4LvvvmPw4MG5+5KtEDlcpgqhvn37snfvXrp164a3t7f8khAvbOtf4Vy8q+4GvVXUmDu7QYoCWz6Ea/vUca9XocNisLHVJq9cJjExkQ8//JDZs2ebYqVLlyYkJAQ/Pz8NMxNCZIdMFUJbt27l119/pV69eubOR+Rg6XWDyhXKQxWPh9okpLXDc+H4cnUsjxcEBoNjHk1Sym0uX75MQEAAJ06cMMX8/f1ZtGgRbm5uGmYmhMgumRosnT9/frlrQmTYb3+H88/dR6rYkMalyI3NIP7ZCr+NU8fsnKDzGnAvqk1OudCuXbtMRZCjoyPz588nODhYiiAhcpFMFUJffPEFn332GY8fP37+xkKQ0g2ambob5JWHFrnxTrHws7C+D+qFVIF350NRuRSTnfr164e/vz/lypXjzz//pH///nKpX4hcJlOXxr777juuXLmCl5cXJUuWxN7eXvX8f9vMQgBsPxfOhXB1N2hYk7K5b2zQo3BY3Rn06hm1efNTqPSuNjnlIpGRkRQoUMD0WKfTsWjRInQ6HXnz5tUwMyGEVjJVCL3zzjtmTkPkZCndIPVdUWUL5aFVZW8MhmSNstKAPj5lNfmYW+p4lQBo8KE2OeUiq1atYuDAgaxZs4Y2bdqY4nIZTIjcLVOF0IQJE8ydh8jBtp+7y/mwGFXsSTfIYNAoqexmNMLGAXAnVbe02OvQ7gdZSDULPX78mKFDh7J06VIAevTowalTpyhWrJjGmQkhLIFMqCiylKKkvVOsTKE8tHrVW6OMNLLnKzi3SR3LVwI6B4GdoyYp5Qbnzp2jU6dOnDt3zhRr166d3OwhhDB54ULIw8ODixcv4unpSf78+Z85oDD12jwi99px7i7nUnWDhr5ZBtvcNDbodAjsm6aOObqlLKTq6qlNTrnA8uXLGTRoEPHx8QC4uLgwb948unfvrnFmQghL8sKF0Pfff28aTDhjxoysykfkIIqS9k6x0gVdaVPFR6OMNBB6GDYPUcd0ttBpGRSqoE1OOVxsbCyDBw9mxYoVpljlypVZt24dFSrI11wIofbChVCPHj3S/X8hnub38xH8fSft2KBc0w2KugbBgWBIUsdbToUyTbXJKYc7f/487du358KFC6ZYv379mDlzpqwVJoRI10uPEYqPj0evV6+YLXdhiJRu0EVVrFRu6gYlRMOazvA4Uh2v1R9q9dMmp1zA2dmZsLAwAPLkycPChQvp0qWLxlkJISxZpiZUjIuLY8iQIRQqVIg8efKQP39+1T8hdp6P4K/bqbpBb+aSbpAhGdb1hHsX1PEyzaDFV5qklFuULFmSZcuWUb16dU6cOCFFkBDiuTJVCI0ZM4Zdu3Yxd+5cHB0dWbx4MZMmTcLHx0d1XV7kTumNDSrl6UrbqrmkG7TtY7iySx0rWBE6LgVbuVHTnE6fPs2jR+qJOt99912OHDlC2bJlNcpKCGFNMlUI/fzzz8ydO5eOHTtiZ2dHgwYN+PTTT/nqq68ICgoyd47Cyuz+J4Kzt6NVsSG55U6xPxfC0UXqmIsnBIaAk1wyNhdFUZg7dy61atVi4MCBKIp6uRI7Oyk4hRAvJlOFUFRUFL6+vkDKeKAnt8vXr1+fffv2mS87YXUURWHm7+pukK+nK+1yQzfo0u+w7SN1zNYROq+G/CW0ySkHio6Oxt/fn8GDB5OUlERQUBAbNmzQOi0hhJXKVCFUqlQprl+/DsArr7zC2rVrgZROUb58+cyVm7BCe/65x+lbqbpBjctgZ5upbzXrcfdcyrggxaiOvz0HitfWJKWc6OjRo1SvXp3169ebYh988IFqyQwhhMiITH069erVi9OnTwMwduxY01ihESNGMHr0aLMmKKyHoijMSDU2qGQBF96ulsO7QbH3YHUAJKnHqtDwI6jSSZucchhFUZg5cyb16tXj2rVrAOTLl4+NGzcyY8YMHB1ldm4hROZk6kL6iBEjTP/fuHFjLly4wLFjxyhdujRVq1Y1W3LCuuy9eI/TNx+qYkPeLJuzu0H6hJS5gqJD1fHKHaDRWG1yymEePHhA79692bRpkylWu3ZtQkJCKFFCLjkKIV5Ohgqh+Ph4du7caWpDjx07lsTERNPzhw8fpnz58jg5OZk3S2HxFEVhRqqxQSUKuPBOTu4GKUrKrNG3jqjjRWqmXBKThVRf2q1bt6hfvz43btwwxT788EO++uor7O3tNcxMCJFTZKgQWrFiBb/88oupEJo9ezaVKlUyzdh64cIFvL29VR0jkTvsu3SfU6m6QYNz+tigfdPg7Dp1zL1YyuBoe5nF2Bx8fHx45ZVXuHHjBh4eHqxYsYLWrVtrnZYQIgfJ0KdUUFAQvXv3VsVWr17N7t272b17N9OmTTMNnBa5R8qdYupZpIt7uPBu9SIaZZQN/vof7P5SHXPIA12CIa+XNjnlQDY2NqxYsYJOnTpx6tQpKYKEEGaXoULo4sWLlCtXzvTYyckJG5t/X6JWrVqcO3fOfNkJq7D/0n1OhD5UxYY0LoN9Tu0G3TwKGweqYzqblAkTC1fWJqcc4o8//kgzBYenpydr166lWLFiGmUlhMjJMvRJFR0drZqo7N69e5QsWdL02Gg0qsYMiZwvvVmki3k4826NHNoNehgKwV3AkOr7vMVXUK6FNjnlAEa
jkSlTptCoUSMCAgKIiIjQOiUhRC6RoUKoaNGi/PXXX099/syZMxQtWvSlkxLW48DlSI7feKCK5dhuUOIjWN0Z4u6p4zV7Q+0B2uSUA0RERNCyZUs++eQTDAYD4eHhfP/991qnJYTIJTL0adWqVSs+++wzEhIS0jwXHx/PpEmT5Bp+LpJyp5h6bFDR/M60r5EDi2GjAdb3gYi/1fFSjaDlN3KHWCbt2bOHatWqsX37dgB0Oh2fffYZX3zxhcaZCSFyiwzdNfbJJ5+wdu1aypcvz5AhQyhXrhw6nY4LFy4we/ZskpOT+eSTT7IqV2FhDl6J5FiqbtDgnNoN2v4pXPpNHfMsB51+BFu5jTujDAYDkydP5vPPP8doTJmN28vLi6CgIJo0aaJxdkKI3CRDhZCXlxcHDx5k4MCBfPzxx6aFDnU6Hc2aNWPu3Ll4eckdM7lBemuKFcnnTIec2A06ugQOz1XHnD1SFlJ1zqdJStYsPDycrl27smvXLlOsSZMmrFq1isKFC2uYmRAiN8rwzNK+vr5s27aNqKgoLl++DECZMmXw8PAwe3LCch26EsmR61Gq2ODGZXCwy2HdoCu7YUuqZWNs7KFzEHiU0iYnK6bX66lXrx5Xr14FUm6PnzRpEmPHjsXW1lbj7IQQuVGmltgA8PDwoFatWubMRViR1GuKFcnnTEe/HNYNuncR1vYAxaCOt5sFJepqk5OVs7e3Z+LEiXTv3h0fHx9Wr15Nw4YNtU5LCJGLZboQErnXoSuRHLmm7gYNalw6Z3WD4iJhdSdIjFbH64+EaoHa5JRDdOvWjQcPHtClSxcKFiyodTpCiFwuB31yieyS+k4xH3cnOvnloMnukhMh5D14cF0dr9gO3hyvSUrWatu2bYwbNy5NfNiwYVIECSEsgnSERIYcvhrJn6m6QQNz0tggRYGfh0PoQXXcuxq8uwBscsj7zGJ6vZ7x48czdepUAKpVq0anTp00zkoIIdKS3+oiQ1LfKebt7oR/zRw0NuiP7+H0anUsr0/KGmIOLtrkZGVCQ0Np1KiRqQgC+OmnnzTMSAghnk4KIfHC/rwayaGrkarYoEalcbTLIXf7nNsMOyepY/YuEBgMbt7a5GRlfv75Z6pXr87BgykdNTs7O7777jtWrlypcWZCCJE+uTQmXljqNcUKuznh/1oOGRt0+wRseD9VUAcdFoN3VU1SsiZJSUmMHTuW6dOnm2IlSpQgJCSE2rVra5iZEEI8mxRC4oUcuRbFwSupukGNc0g3KPo2rOkCyfHqeLPPoYIsGfM8165do3Pnzhw5csQUe+edd1i6dCn58+fXMDMhhHg+uTQmXsjMneo7xbzcHPGvmQO6QYmxsCYAYsPV8erdoO5QbXKyMiNGjDAVQQ4ODsyaNYsNGzZIESSEsArSERLPdex6FAcuq7tBAxuWxsneyrtBRmPK5bDws+p4yQbQerospPqC5syZw4EDB3Bzc2Pt2rX4+flpnZIQQrwwKYTEc6UeG1QoryOdaxXXKBsz+n0C/POrOuZRGvxXgJ2DNjlZAaPRiM1/phEoUqQIW7dupWzZsri7u2uYmRBCZJxcGhPPdPxGFPsv3VfFBjbKAd2gEyvg4Cx1zCkfBK4FF1k372nWrl1LjRo1ePjwoSpes2ZNKYKEEFZJCiHxTDNSzRtUMK8jXay9G3RtP/wyQh2zsYOAleBZRpucLFx8fDwDBgwgICCA06dP06dPHxRF0TotIYR4aXJpTDzV8RsP0naDrH1sUOSVlOUzjMnqeOvp4PuGNjlZuH/++Qd/f3/OnDljijk7O5OUlISjo6OGmQkhxMuTjpB4qtRjgwrmdSSwthV3gx5HwWp/SHiojtcdCn49NEnJ0gUFBeHn52cqgpydnVmyZAkrV66UIkgIkSNIR0ik62ToA/ZdvKeK9X+jlPV2g5KTYG13iLysjpdvBU0npb9PLvb48WOGDRvGkiVLTLGKFSuydu1aKleurGFmQghhXpp3hObOnYuvry9OTk74+fmxf//+F9rvwIED2NnZUa1ataxNMJdK3Q3yzONI19olNMrmJSkK/DoSrqf63ir8KrRfBDZWWtxlkXPnzlGrVi1VEdSjRw+OHj0qRZAQIsfRtBAKCQlh+PDhjBs3jpMnT9KgQQNatmxJaGjoM/eLjo6me/fuNGnSJJsyzV1O3XzInn/U3aABDUvh7GClBcOh2XAy1VpXeQpDlxBwzKNNThZs//79/P333wC4uLiwfPlyli9fjqurq8aZCSGE+WlaCE2fPp0+ffrQt29fKlasyIwZMyhWrBjz5s175n79+/cnMDCQOnXqZFOmucvM39WzSHvmcbDebtCFLbB9vDpm5wxd1oB7EW1ysnDvv/8+HTt2pHLlyhw7dowePWT8lBAi59JsjFBSUhLHjx/n448/VsWbN29uWrk6PcuWLePKlSusWrWKyZMnP/c4iYmJJCYmmh7HxMQAoNfr0ev1mcw+5zpzK5rdqbpBfeuXxE5nRK83mv14T85BlpyL8LPY/a8vOtS3eSe3m4NS6FWQ8w9AREQEhQoVMp2D5ORk5s+fj52dHS4uLvJzooEs/bkQGSLnwnJk1TnQrBC6f/8+BoMBLy8vVdzLy4vw8PB097l06RIff/wx+/fvx87uxVKfMmUKkyalHQy7e/duXFxcMp54DrfgvA3/bRTmsVPwiDrHli3nsvS4O3bsMOvrOekf8MY/E7HXx6ni57w7cemaHVzbYtbjWSNFUdixYwdLlixhzJgxpqUxzH0uRObJubAcci609/jx4yx5Xc3vGtOlWs9JUZQ0MQCDwUBgYCCTJk2iXLlyL/z6Y8eOZeTIkabHMTExFCtWjMaNG1OgQIHMJ54Dnb0dzblDf6pig5qU4936vll2TL1ez44dO2jWrBn29vZmetHH2K5sh43+gSpsfDWAsm1nU1bWECMmJoZBgwaxdu1aAObNm8ehQ4f4+++/zXsuRKZkyc+FyBQ5F5YjMjLy+RtlgmaFkKenJ7a2tmm6PxEREWm6RACPHj3i2LFjnDx5kiFDhgApax4pioKdnR3bt2/nzTffTLOfo6NjuvOd2Nvbyzd1KnP3XlM99nB1oGe9UtjbZ/23idnOh9EIG4dC2Cl1vHgdbN7+ARtZQ4yTJ0/i7+/P5cv/TiXg7++Pl5cXf//9t/xsWBA5F5ZDzoX2surrr9lgaQcHB/z8/NK0G3fs2EHdunXTbO/m5sbZs2c5deqU6d+AAQMoX748p06donbt2tmVeo509lY0v5+PUMXef6MULg6aNw0zZveXcO4ndSx/SQgIArvcPQGgoijMnTuXOnXqmIogNzc31q1bx5w5c3ByctI4QyGEyH6afsqNHDmSbt26UbNmTerUqcPChQsJDQ1lwIABQMplrdu3b7NixQpsbGzSzGFSqFAhnJycZG4TM0g9b5CHqwPdXreyO8VOrYH936pjju4pC6m65u7LoNHR0fTt25f169ebYn5+foSEhFC6dGkNMxNCCG1pWggFBAQQGRnJ559/TlhYGJUrV2bLli2UKJHyARwWFvbcOYXEy/vrdj
S/n7+rivVrUApXRyvqBt04CJuHqmM6W/BfDgXLa5KSpThz5gzvvvsuV69eNcU++OADpk6dKstkCCFyPc0/6QYNGsSgQYPSfW758uXP3HfixIlMnDjR/EnlMqm7Qfld7Olex4q6QVFXIbgrGFPdWtlqGpROO24st8mTJw/376csnpsvXz6WLVvGO++8o21SQghhITRfYkNo6+870ew4p+4G9bWmblD8Q1jdGeKj1PHaA+G1PpqkZGlKlSrF4sWLqV27NidPnpQiSAgh/kMKoVxuVqpuUD4Xe3rULalNMhll0MO6nnD/H3W8bHNo8aUmKVmCY8eOERennj+pU6dOHDhwgJIlS2qTlBBCWCgphHKxc3di+O3vtGOD8lhDN0hRYOsYuLpbHS9UCTouzZULqRqNRr799lvq1KnD4MGD0zxva5v7viZCCPE8UgjlYqm7Qe7OVjQ26M8FcGypOuZaEAKDwTGvNjlp6P79+7Rr147Ro0eTnJzMjz/+yM8//6x1WkIIYfGs4E9/kRXOh8Ww7W/1ZJb9GviS18kKJgy7uB1+G6uO2TpC5zWQr7g2OWnojz/+oEuXLty6dcsUGzt2LC1bttQwKyGEsA7SEcql0usGWcXYoLt/w/reoKRaAPaduVDsNW1y0ojRaGTKlCk0atTIVAQVLFiQbdu28dVXX73wenxCCJGbyW/KXOhCeAxb/1J3g/rUt4JuUGxEyh1iSY/U8UZj4dWO2uSkkYiICLp168b27dtNsYYNG7J69Wp8fHw0zEwIIayLFEK50A87L6seuznZ0bNeSW2SeVH6eAgOhOhUE2xW7ggNP9ImJ41cv36dunXrEhYWBqQsXDx+/HjGjx8vXSAhhMgg+a2Zy/wT/ohfz4apYn3ql8LNkrtBigI/DYZbR9XxorXg7TmQy1aTL168OFWqVCEsLAwvLy+CgoJo0qSJ1mkJIYRVkjFCucysXeqxQXmtoRu0dyr89T91zL04dA4C+9y3UKiNjQ0rVqwgMDCQU6dOSREkhBAvQTpCucjFu4/YkqYb5Iu7swV3g86uhz1T1DGHvBAYAnkKaZNTNvv9999xcnKifv36plihQoUICgrSMCshhMgZpCOUi8zaeQlF+fdxXic7etXz1S6h57l5BDalWodOZwOdloHXK9rklI2Sk5MZP348zZs3JyAggHv37mmdkhBC5DhSCOUSl+6mHRvUq54Fd4Me3EgZHG1IVMff+hrKNtMmp2x0+/ZtmjRpwuTJk1EUhTt37jB37lyt0xJCiBxHLo3lEj/suqzuBjna0cdSu0EJMbCmM8Sl6oC81hdqva9NTtlo27ZtdOvWzbRivK2tLV9++SWjR4/WODMhhMh5pBDKBS5HPOLnM3dUsV71SuLuYoHdIENyyoSJEefU8dJvwltTc/QdYnq9nvHjxzN16lRTrGjRogQHB1OvXj0NMxNCiJxLCqFcIHU3KI+jHb3rW2g3aPs4uLxDHfMsDx2XgW3O/Xa9efMmnTt35uDBg6ZYmzZtWL58OQUKFNAwMyGEyNly7ieLAOByRCw/n1Z3g3rWLUk+FweNMnqGI4vgz/nqmEuBlDvEnPNpklJ2SExMpF69ety8eRMAOzs7vv76a0aOHIkuB3fAhBDCEshg6Rxu9q5LGFN1g/pYYDdId3U3bE01Q7StAwQEgYfl5WtOjo6OTJo0CYASJUqwf/9+Ro0aJUWQEEJkA+kI5WBX78WyOVU3qEfdEuR3taxuUN7429hu+AoUg/qJdj9AiTraJJXNevbsSVxcHF27diV//vxapyOEELmGdIRysNm7Lqu6Qa4OtvStX0q7hNITd5/aV6ejS0y1kGqDD6FqZ21yymIbN25k3LhxqphOp2PIkCFSBAkhRDaTjlAOdfVeLJtO3VbFetQtaVndoOREbP/XE9ekVLfJv/IONB6X7i7WLDExkdGjR/PDDz8A4OfnR/v27TXOSgghcjfpCOVQs3eru0EuDrb0bWBB3SBFgc3DsLl5WB33qQHvzAObnPWteeXKFerVq2cqggC2bt2qYUZCCCFACqEc6fr9OH46pR4b1L1OSTwsqRu0/zs4E6yOuRWBLmvAwUWbnLLIunXrqFGjBsePHwdSBkfPmzePhQsXapyZEEIIuTSWA83efRnDf9pBLg629GtgQXde/b0Jdn2hCin2rui6BEPewtrklAUSEhIYOXIk8+bNM8XKli3L2rVrqVatmnaJCSGEMJFCKIe5ERnHxpPqsUHd6pSgQB5HjTJK5fZx2NhfFVLQYXhnAXbeVTRKyvwuXryIv78/p0+fNsUCAwOZP38+efPm1TAzIYQQ/yWXxnKY2bvU3SBne1vet5SxQdG3YE0XSE5Qhf8u0hml3FsaJZU1hg8fbiqCnJycWLx4MatWrZIiSAghLIx0hHKQ0MjHbEjVDepuKd2gxFhY3Rli76rCxmrduEJTymuUVlZZuHAh1apVo1ChQqxdu5bKlStrnZIQQoh0SEcoB5m9+1KablC/NyygG2Q0wIZ+cPesOu77Boa3vskRC6kaDOrJIIsWLcr27ds5evSoFEFCCGHBpBDKIW5GPWbDCXU36L3Xi+NpCd2g3yfAP1vUsQJlwH8F2Nprk5MZ/fjjj9SoUYPo6GhVvEaNGri6umqUlRBCiBchhVAOMWf3ZZL/0w1ysrfh/TdKa5jR/zv+Ixz8QR1zzg+Ba1P+a8Xi4uLo2bMnPXv25MyZM/Tr1w9FUZ6/oxBCCIshY4RygJtRj1l//JYq9l7tEhTMq3E36Ope+HWkOmZjDwGroIAFFGkv4ezZs/j7+3PhwgVTzN3dneTkZOztrb/LJYQQuYV0hHKAuXvU3SBHOxveb6jx2KD7l2FtNzAmq+NtZ0DJ+pqkZA6KorB48WJq1aplKoLy5MlDUFAQixYtkiJICCGsjHSErNytB49ZdyxVN+j1EhTK66RRRsDjKFjdCRLUY2ao9wFUf0+bnMzg0aNHDBgwgNWrV5tiVatWZe3atZQrV07DzIQQQmSWdISs3JzdV9J0g/pr2Q1KToKQbhB1VR2v0AaaTNQkJXM4deoUfn5+qiJo4MCBHD58WIogIYSwYtIRsmK3H8az/vhNVSywdnHtukGKAr+OgBt/qOOFq0D7hVa9kOq+ffu4dOkSAG5ubixatAh/f3+NsxJCCPGypBCyYnN3X0ZvUHeDBjbUcBDywVlwcpU6ltcbAkPAwbpvIx86dCi7d+/m5s2bhISEULq0dQ/2FkIIkUIKISt1+2E8a4+pu0FdahWnkJtG3aDzv8COCeqYnXPKavJuPtrk9BLCw8MpXPjfBWB1Oh0//vgjjo6OODpawNxMQgghzMJ6r1XkcvP2qLtBDnY2DGykUZci7HTKzNGkmkOn/ULwqa5JSpmlKAqzZs2iZMmSbN++XfWcm5ubFEFCCJHDSCFkhe48jGftUfWdYoG1iuOlRTco5k7KGmL6x+p404nwSrvsz+clPHjwgA4dOvDBBx+QmJjIe++9R1hYmNZpCSGEyEJyacwKzdtzhSSD0fTYwdaGAVqMDUqKgzWd4dEddbxaV6g3PPvze
Ql//vknAQEB3LhxwxTr3r07BQoU0DArIYQQWU06QlYmLDqekKPqsUGdaxWjsHs2d4OMRtjYP+Wy2H+VqAdtZljNQqqKovDdd99Rv359UxHk4eHB5s2b+fbbb3FwcNA4QyGEEFlJOkJWZn463SBNxgbt+gLO/6yO5fcF/5VgZx3FQ2RkJD179uSXX34xxerWrUtwcDDFihXTMDMhhBDZRTpCViQ8OoE1R9TdoIDXiuHt7py9iZxaDX9MV8ec3FMWUnW1jktJx44do1q1aqoi6OOPP2bPnj1SBAkhRC4iHSErMn+vuhtkb6vL/m7Q9QOweZg6prMF/xVQ0HpmWM6XLx/R0SlLgHh6erJy5UreeustjbMSQgiR3aQjZCXuxiSw+kioKhbwWjF88mVjNyjyCoR0BaNeHW/9HZRqlH15mEGZMmVYtGgRb7zxBqdOnZIiSAghcikphKzEvD1XSEpO3Q0qk30JxD+A1QEp//2v1wdDzV7Zl0cmHTx4kMeP1bf4BwQEsHv3booUKaJRVkIIIbQmhZAViIhJYE2qblCnmsUokl3dIIMe1vWEyEvqeLm3oPkX2ZNDJhkMBiZPnkyDBg0YNmxYmudtrHj9MyGEEC9PPgWswPy9V0lM1Q0alF1jgxQFtoyGq3vUca/K0GEx2NhmTx6ZEB4eTosWLRg/fjxGo5ElS5bw22+/aZ2WEEIICyKFkIWLiEkg6M8bqlhHv2IUze+SPQkcngfHl6ljroWgSzA45s2eHDJh586dVKtWjZ07dwIpnZ/PP/+cpk2bapyZEEIISyJ3jVm4BfvU3SA7m2zsBv2zDX77RB2zc0pZSDWfZd5ibjAY+Pzzz/niiy9QlJS1z7y9vVmzZg0NGzbUODshhBCWRgohCxbxKG03qFPNohTzyIZuUPhf8L8+pFlI9Z15ULRm1h8/E+7cuUNgYCB79+41xVq0aMGKFSsoVKiQhpkJIYSwVFIIWbCFe6+SoE/dDcqGO8Ue3U1ZQywpVh1vPA4qt8/642fC5cuXqVu3Lvfu3QPA1taWyZMnM2bMGBkQLYQQ4qnkE8JC3XuUyKpU3aAONbKhG6SPh+BAiFbPYM2r/vDG6Kw99kvw9fWlatWqABQtWpQ9e/bw8ccfSxEkhBDimeRTwkIt2p+2GzS4cRZ3g4xG2DQQbh9Tx4vVhnY/WPRCqra2tqxatYoePXpw6tQp6tevr3VKQgghrIBcGrNA92MTWXHouirWvkYRihfI4m7Q3q/h743qWL7iEBAE9tm8uv1z/Prrr+TPn5+6deuaYl5eXixfvly7pIQQQlgd6QhZoEX71N0gWxsdQxqXzdqDnlkLe6eqY45uKQup5imYtcfOAL1ez4cffkibNm0ICAggMjJS65SEEEJYMSmELExkbCIrDqnHBrWvnsXdoNA/4afB6pjOBjotg0IVs+64GXT9+nUaNGjAd999B8CtW7dYvHixxlkJIYSwZlIIWZiF+68SrzeYHtva6BjyZhaODXpwPWVwtCFJHW/5DZSxnMkHN23aRPXq1fnzzz8BsLe3Z+bMmYwZM0bjzIQQQlgzGSNkQaLikliZqhv0TrUilCjgmjUHTIhOWUj18X11vNb7UKtf1hwzgxITE/noo4+YOXOmKVaqVClCQkKoWdMy5zMSQghhPaQQsiCL9l/lcZK6GzQ0q7pBhmRY3xvuXVDHSzeBFlOy5pgZdPXqVfz9/Tl+/Lgp1rFjRxYvXoy7u7uGmQkhhMgppBCyEFFxSfx48Loq9nY1H0p6ZlE36LdP4PLv6ljBCinjgmy1/7aIj4+nXr16hIeHA+Do6Mj333/PgAED0FnwbfxCCCGsi4wRshCLU3WDbHQw9M0sulPsyCI4skAdc/GEwBBwsoxOi7OzM1988QUAZcuW5fDhwwwcOFCKICGEEGal/Z/+ggfpdIPeqVYE36zoBl36HbamGmBs6wCdV0P+kuY/3kvo06cPer2e9957j7x5LXeleyGEENZL847Q3Llz8fX1xcnJCT8/P/bv3//UbTds2ECzZs0oWLAgbm5u1KlTh99++y0bs80ai/+4SlyqblCW3CkWcR7W9wLFqI6/PQeK1zb/8TIgODiY8ePHq2I6nY6BAwdKESSEECLLaFoIhYSEMHz4cMaNG8fJkydp0KABLVu2JDQ0NN3t9+3bR7NmzdiyZQvHjx+ncePGtG3blpMnT2Zz5ubz8HESPx5U3ynWrqoPpQrmMe+B4u7Dan9IjFHH3xgDVfzNe6wMiI+PZ86cOXTv3p3JkyezefNmzXIRQgiR+2haCE2fPp0+ffrQt29fKlasyIwZMyhWrBjz5s1Ld/sZM2YwZswYXnvtNcqWLctXX31F2bJl+fnnn7M5c/NZ8sc1YhOTTY9TukFmHhukT0iZK+hhqgKzUnto/Il5j5UB58+fp27duuzYscMU27lzp2b5CCGEyH00GyOUlJTE8ePH+fjjj1Xx5s2bc/DgwRd6DaPRyKNHj/Dw8HjqNomJiSQmJpoex8SkdET0ej16vT4TmZvPw8d6lh64poq1frUwJfI7mi83RcF282Bsbv6pCht9amBoPROSk5+yY9ZauXIlQ4cO5fHjxwC4uLgwa9Ysunfvrvl5ya2efN3l6689ORfmZzQa0ev1KIqSof2Sk5Oxs7MjNjYWOzsZVpuVdDod9vb22Nik36PJqp8Hzc7q/fv3MRgMeHl5qeJeXl6mW6af57vvviMuLg5//6df2pkyZQqTJk1KE9+9ezcuLlm8iOlzbAm1IS7x3xOuQ6Gy7hZbttwy2zHKhf9ExbD/qWKP7T3Y59GTxB27zXacF5WQkMDChQvZtWuXKVa8eHFGjx6Np6cnW7ZsyfachNp/O3RCW3IuzMPW1hZPT0/s7e0ztX/hwoW5evWqmbMS6dHr9dy7dw+j0ZjmuSd/OJub5uVt6tuhFUV5oVuk16xZw8SJE/npp58oVKjQU7cbO3YsI0eOND2OiYmhWLFiNG7cmAIFCmQ+8ZcUHa9n3Hf7gX87Mq1f9aZ3xypmO4bu3CbsTqqLIMXBFfvuG2niVclsx3lRf/31F4GBgVy48O8kjj179qRly5a0adMm07+khHno9Xp27NhBs2bN5FxoTM6F+SiKwu3bt0lOTsbb2/up3YZn7R8XF4erq6tM35HFjEYjYWFheHl5UaRIkTRf76xaZFuzQsjT0xNbW9s03Z+IiIg0XaLUQkJC6NOnD+vWraNp02evh+Xo6Iijo2OauL29vaa/YFbsUY8N0ulgeLNy5svp1nH4eYg6prNB12Ep9kWrmecYGTR69GhTEZQnTx4WLFhAp06d2LJli+bnQ/xLzoXlkHPx8vR6PQkJCfj4+JAnT8ZvQnlySc3Z2TnDRZTIuEKFCnHnzh3TZbL/yqqfBc3OqoODA35+fmlavzt27KBu3bpP3W/NmjX07NmT1atX07p166xOM0tEx+tZlmZskDdlCpnpNvGHN2FNZ0hOUMebfwnl3zLPMTJh6dKl5M+fn6pVq3L8+HECAwM1y0UIkTsYDClTkzg4OGiciXgRT87Tk/OWHTS9NDZy5Ei6
detGzZo1qVOnDgsXLiQ0NJQBAwYAKZe1bt++zYoVK4CUIqh79+7MnDmT119/3dRNcnZ2tqq1p5YduMajBHU3aFgTM90plvgopQiKi1DH/XrB6wPNc4wX9GSQ4RPFixdn586dVKxYEScnp2zNRQiRu8llLeugxXnStM8XEBDAjBkz+Pzzz6lWrRr79u1jy5YtlChRAoCwsDDVnEILFiwgOTmZwYMH4+3tbfr3wQcfaPUWMiw6Xs+SP9TdoFavelPOywzdIKMB/tcX7v6ljvs2hFbTUiqubKAoCvPnz6dGjRo8evRI9Vz16tWlCBJCCGExNB8sPWjQIAYNGpTuc8uXL1c93rNnT9YnlMWWH7iu6gYBDDPXvEE7PoOL29SxAmXB/0ewzZ5xBtHR0bz//vusXbsWgP79+xMUFCR/jQkhRBbQ6XRs3LiRd955R+tUrJaM/MpGMQl6lvyhvgWz9avelC9shm7QsWVwaLY65uwBXdeCc/6Xf/0XcPz4cfz8/ExFEKQMis/Oa71CCJFThIeHM3ToUEqVKoWjoyPFihWjbdu2FjPxrKIoTJw4ER8fH5ydnWnUqBF///231mllmOYdodxk+YHrxKTqBg1tYoY1xa7ugS0fqmM29hCwCjxKvfzrP4eiKMyePZsPP/yQpKQkANzd3Vm6dCnt27fP8uMLIcSLMBoVHjxOyuA+Rh491qO3STTLXWP5XRywsXl+h/z69evUq1ePfPny8c0331ClShX0ej2//fYbgwcPVk1DopVvvvmG6dOns3z5csqVK8fkyZNp1qwZ//zzj1WtESmFUDZ5lJB2bFDLyoWpUNjt5V743kUI6Q7GVDNEt50JJeu93Gu/gAcPHtCnTx82btxoitWqVYvg4GB8fX2z/PhCCPGiHjxOwm/y75rmcPzTphTIk3ZKl9QGDRqETqfjyJEjuLq6muKVKlWid+/eT93vo48+YuPGjdy6dYvChQvTtWtXPvvsM9Ot56dPn2b48OEcO3YMnU5H2bJlWbBgATVr1uTGjRsMGTKEP/74g6SkJEqWLMm0adNo1apVmuMoisKMGTMYN26c6Q/eH3/8ES8vL1avXk3//v0z+qXRjBRC2eTHg9eJjldPD/7Sd4o9jvr/hVSj1fH6I6B615d77Rfw559/0rlzZ65fv26KjRo1iq+++kpuVRVCiEyKiopi27ZtfPnll6oi6Il8+fI9dd+8efOyfPlyfHx8OHv2LP369SNv3ryMGTMGgK5du1K9enXmzZuHra0tp06dMhVJgwcPJikpiX379uHq6sq5c+eeOvfStWvXCA8Pp3nz5qaYo6MjDRs25ODBg1IICbVHCXoW7Vd3g96qVJiK3i/RDUpOgpD34IH6danYFt78LPOvmwEHDhwwFUEeHh4sX76ctm3bZsuxhRAip7p8+TKKolChQoUM7/vpp5+a/r9kyZKMGjWKkJAQUyEUGhrK6NGjTa9dtuy/f5CHhobSoUMHXn31VQBKlXr60Ion09ekt0zWjRs3Mpy3lqQQygYrDt0wbzdIUeCX4XDjgDruXQ3eXQDZNPvpiBEj2L17N1FRUaxZs4bixYtny3GFECIne7IwbGbutl2/fj0zZszg8uXLxMbGkpycjJvbv390jxw5kr59+7Jy5UqaNm1Kp06dKF26NADDhg1j4MCBbN++naZNm9KhQweqVHn2sk+ZXSbLkkghlMViE5NZtF99p1iLSl684vMS3aADM+BUkDqW1we6BIND2jaqudy+fZsiRYqYHut0OoKCgnB2dpZlAIQQFi+/iwPHP332skypGY1GHsXGkjdPHrMNln6esmXLotPpOH/+fIZuiz98+DCdO3dm0qRJtGjRAnd3d4KDg/nuu+9M20ycOJHAwEB+/fVXtm7dyoQJEwgODubdd9+lb9++tGjRgl9//ZXt27czZcoUvvvuO4YOHZrmWIULFwZSOkPe3t6m+Issk2Vp5Pb5LPbjwes8fGzGbtC5zfD7RHXM3gW6rAE373R3eVlGo5GpU6dSqlSpNLdturm5SREkhLAKNjY6CuRxzPA/Dxf7TO2X3r8XuWPMw8ODFi1aMGfOHOLi4tI8//Dhw3T3O3DgACVKlGDcuHHUrFmTsmXLpnuZqly5cowYMYLt27fTvn17li1bZnquWLFiDBgwgA0bNjBq1CgWLVqU7rF8fX0pXLiwapmspKQk9u7d+8xlsiyRFEJZKC4xmcWpukHNXvGikk8mlwO5cxI2vJ8qqIP2i8CnWuZe8znu3btHmzZt+Pjjj0lKSqJr165EREQ8f0chhBCZNnfuXAwGA7Vq1eJ///sfly5d4vz588yaNYs6deqku0+ZMmUIDQ0lODiYK1euMGvWLNUdvfHx8QwZMoQ9e/Zw48YNDhw4wNGjR6lYsSIAw4cP57fffuPatWucOHGCXbt2mZ5LTafTMXz4cL766is2btzIX3/9Rc+ePXFxcbG6dSTl0lgWWnHoBg9SdYM+yGw3KOYOrOkCyfHqeLNJULFNJjN8tn379tGlSxfu3LkDpHzj9+vXDw8Pjyw5nhBCiBS+vr6cOHGCL7/8klGjRhEWFkbBggXx8/Nj3rx56e7z9ttvM2LECIYMGUJiYiKtW7dm/PjxTJw4EQBbW1siIyPp3r07d+/exdPTk/bt2zNp0iQgZaHTwYMHc+vWLdzc3Hjrrbf4/vvvn5rjmDFjiI+PZ9CgQTx48IDatWuzfft2q5pDCECnPBmVlUvExMTg7u7O/fv3KVCgQJYdJy4xmfpTd6kKoaYVvVjco2bGXywpDpa+BeFn1PHq70G72WZfQ8xgMDBlyhQmTJiA0WgEoFChQgQFBdG0acaurz+PXq9ny5YttGrVSi6xaUzOheWQc2E+CQkJXLt2DV9f30ytc2g0GomJicHNzc0sY4TEsz3rfEVGRuLp6Ul0dLRqAPjLko5QFll52EzdIKMx5XJY6iKoRH1o/b3Zi6C7d+/y3nvv8fvv/0469uabb7Jq1SrVgDghhBAiJ5DyNgs8Tkpm0T712KCmFQvxatFMjA3aOQku/KKOeZSCgJVgZ95JCw8ePEjVqlVNRZCNjQ2TJk1i+/btUgQJIYTIkaQjlAVWHb5BZJx6PZsPmpTL+AudXJVyq/x/OblD4FpwMf84nQIFChAbGwuAt7c3q1evplGjRmY/jhBCCGEppCNkZo+TklmwV90NerNCJrpB1/+An4erYzZ24L8SPF9yaY6nKF++PAsWLKBFixacOnVKiiAhhBA5nhRCZhZ0ODSdblAGC5fIKynLZxjVY4xoPR1KNXzJDP+1d+9e4uPVd6F17dqVrVu3UqhQIbMdRwghhLBUUgiZUXySgQX7rqhijcsXpGqxfBl4kQcpC6nGP1DH6wwBvx4vnySQnJzMJ598QqNGjRg+fHia561tenQhhBAis6QQMqOgP29wPzZVN6hpBsYGGfSwtjtEXlbHy7eCZp+bIUO4desWjRs3ZsqUKQAsXLiQXbt2meW1hRBCCGsjhZCZxCcZmJ9qbFCj8gWp9qLdIEWBLR/CtX3quNerKTNH29i+dI6//vor1ap
V448//gDAzs6OadOmyVggIYQQuZYUQmay+kgo92MTVbEMjQ06NAeOL1fH8nhBYDA45nmp3PR6PaNHj6ZNmzZERkYCULx4cfbt28eHH34ok4QJIYSV0ul0bNq0Ses0rJp8AppBgt7A/L3qsUFvlCtI9eL5X+wF/tkK2z9Vx+ycUhZSdS/6UrnduHGDN954g2+//dYUe/vttzl58uRT16sRQgihvfDwcIYOHUqpUqVwdHSkWLFitG3bNs3i11rZsGEDLVq0wNPTE51Ox6lTp7ROKVNkHiEzWP1nKPceZbIbFH4W1vcBUq108u58KOL3UnmdO3eOevXqmVYqtre3Z9q0aQwbNkwGRAshhAW7fv069erVI1++fHzzzTdUqVIFvV7Pb7/9xuDBg7lw4YLWKRIXF0e9evXo1KkT/fr10zqdTJNC6CWl1w1qUNYTvxIv0A16FA6rO4M+Th1/81Oo9O5L51a+fHmqV6/O7t278fX1JSQkhNdee+2lX1cIIayS0QjxURneR/f4EdgmgTmGETh7vNDrDBo0CJ1Ox5EjR3B1dTXFK1WqRO/evZ+630cffcTGjRu5desWhQsXpmvXrnz22WemNetOnz7N8OHDOXbsGDqdjrJly7JgwQJq1qzJjRs3GDJkCH/88QdJSUmULFmSadOm0apVq3SP1a1bNyClaLNmUgi9pDVHQolI1Q0a3vQFukFJj1NWk4+5pY5X6QwNPjRLbra2tgQFBTFhwgS++eYb8uXLZ5bXFUIIqxQfBdNKZ2gXGyATiyM93egr4Or5zE2ioqLYtm0bX375paoIeuJZv8vz5s3L8uXL8fHx4ezZs/Tr14+8efMyZswYIGWuuOrVqzNv3jxsbW05deqUqUgaPHgwSUlJ7Nu3D1dXV86dO0eePC83RtUaSCH0Ep7eDXrO8hdGI2waAHdOqOPFXod2szK9kOr//vc/ihQpwuuvv26KeXt7s3Dhwky9nhBCiOx3+fJlFEWhQoUKGd7300//HW9asmRJRo0aRUhIiKkQCg0NZfTo0abXLlv23z/cQ0ND6dChA6+++ioApUqVepm3YTVksPRLCDl6k7sxmRgbtOcrOPeTOpavBHQOAjvHDOeRkJDAkCFD6NixIwEBAURFZbD1K4QQwmIoSsqY0cyM5Vy/fj3169encOHC5MmTh/HjxxMaGmp6fuTIkfTt25emTZvy9ddfc+XKv3/MDxs2jMmTJ1OvXj0mTJjAmTNnXv7NWAEphDIpQW9g7h71xIf1yhSgZsnndINOh8C+aeqYo1vKQqrPaZem59KlS9StW5c5c+YAKRX9ihUrMvw6QgghLEPZsmXR6XScP38+Q/sdPnyYzp0707JlS3755RdOnjzJuHHjSEr6d6LfiRMn8vfff9O6dWt27drFK6+8wsaNGwHo27cvV69epVu3bpw9e5aaNWvyww8/mPW9WSK5NJZJa4+l1w16zizSoYdh8xB1TGcLnZZDoYy3QIODg+nXr59pxXgnJydmzZpF3759M/xaQgiR4zl7pIzRyQCj0cijR4/ImzeveeZcc37OH8uAh4cHLVq0YM6cOQwbNizNOKGHDx+mO07owIEDlChRgnHjxpliN27cSLNduXLlKFeuHCNGjKBLly4sW7aMd99NuUGnWLFiDBgwgAEDBjB27FgWLVrE0KFDM/gmrYsUQpmQmGxg7m71D1Pd0gWo5fuMb/CoaxAcCAb1Ehy0+gbKNMnQ8ePj4xk+fLhq7E/58uVZu3YtVapUydBrCSFErmFjk/HOu9GIYnAAVzfz3DX2gubOnUvdunWpVasWn3/+OVWqVCE5OZkdO3Ywb968dLtFZcqUITQ0lODgYF577TV+/fVXU7cHUj47Ro8eTceOHfH19eXWrVscPXqUDh06ADB8+HBatmxJuXLlePDgAbt27aJixYpPzTEqKorQ0FDu3LkDwD///ANA4cKFKVy4sDm/HFlKLo1lwtqjNwmPSVDFnjk2KCEaVgfA40h1vPYAeC1j3ZsLFy5Qu3ZtVRHUrVs3jh07JkWQEELkEL6+vpw4cYLGjRszatQoKleuTLNmzdi5cyfz5s1Ld5+3336bESNGMGTIEKpVq8bBgwcZP3686XlbW1siIyPp3r075cqVw9/fn5YtWzJp0iQADAYDgwcPpmLFirz11luUL1+euXPnPjXHzZs3U716dVq3bg1A586dqV69OvPnzzfjVyLr6ZQno7JyiZiYGNzd3bl//z4FChTI8P6JyQYaTdtDWPS/hVCdUgVY8/7r6e9gSIbVneBKqoVNyzSDLsFg++JNudjYWEqWLGlaJsPZ2Zk5c+bQs2dPq50gUa/Xs2XLFlq1amW6hVNoQ86F5ZBzYT4JCQlcu3YNX19fnJycMry/0WgkJiYGNzc3WY4oGzzrfEVGRuLp6Ul0dDRubm5mO6ac1Qxad+yWqggC+OBZ8wZt+zhtEVToFei4NENFEECePHmYPHkykDKp1rFjx+jVq5fVFkFCCCGE1mSMUAakjA1S3ylW29eD10s9pbP050I4ukgdcy2Y0glyylw1279/f2xsbHjvvfdwcXHJ1GsIIYQQIoV0hDJg/fFb3EnVDRre9Cl3il3aAds+UsdsHaHzashf4rnHUhSFpUuXqq7vQsq8Eu+//74UQUIIIYQZSEfoBSUlG9PcKVbL14M6pdPpBt09B+t6gWJUx9+ZC8VqPfdYsbGxDBw4kFWrVgHw+uuvmwajCSGEEMJ8pCP0gtYfv8Xth/Gq2PD07hSLvZdyh1jSI3W84cfwasfnHufMmTP4+fmZiiCA/fv3ZypnIYQQQjybFEIvICnZyJxUY4NqlUynG6RPSJkrKDpUHa/cERp9/MxjKIrCggULqFWrFhcvXgRSFs9bs2YNX3/99Uu/ByGEEEKkJZfGXsCGE2m7QR80Lau+W0tR4KfBcOuIeueir8Hbc565kGpMTAzvv/8+ISEhpliNGjUICQmhTJkyZnkPQgghhEhLOkLPoTcYmZ2qG1SzRH7qpu4G7f0G/lqvjrkXSxkcbf/0uStOnDhhKnqeGDJkCAcPHpQiSAghhMhi0hF6jg0nbnHrQaqxQU3LqbtBZ9enrCj/Xw55IDAE8hR66msrisKIESNMq/+6u7uzZMkS03TnQgghhMha0hF6hvS6QX4l8lOvzH+6QTePwqZB6h11NtBxGXhVeubr63Q6fvzxR/Lly8drr73GyZMnpQgSQghhEUqWLMmMGTNMj3U6HZs2bdIsn6wihdAzbDxxm5tRqcYGNfnP2KCHoRDcBQzqVehpMQXKNU/3NfV6vepxyZIl2b17N3/88Qe+vr5my10IIYT1erJ00pN/BQoU4K233uLMmTOa5RQWFkbLli01O35WkULoKdLrBtUono8GZf9/5eKEGFjdGeLuqXes2Qdq90/zeoqiMGPGDPz8/IiNjVU9V61aNRwcHMyavxBCCOv21ltvERYWRlhYGDt37sTOzo42bdpolk/hwoVxdHTU7PhZRQqhp9h08jahUY9VsQ+ejA0yGuB/fSDib/VOpRpDy6lp7hCLiorinXfeYcSIEZ
w9e5aBAweSy9a6FUIIkUGOjo4ULlyYwoULU61aNT766CNu3rzJvXspf4B/9NFHlCtXDhcXF0qVKsX48eNVVx1Onz5N48aNyZs3L25ubvj5+XHs2DHT8wcPHuSNN97A2dmZYsWKMWzYMOLi4p6az38vjV2/fh2dTseGDRto3LgxLi4uVK1alUOHDqn2yegxtCCFUDqS0+kGVSuWjzeedIO2fwqXtqt38iwPnZaDrXql6EOHDlGtWjU2b95sivn4+EghJIQQ4oXFxsYSFBREmTJlKFAgZZxq3rx5Wb58OefOnWPmzJksWrSI77//3rRP165dKVq0KEePHuX48eN8/PHH2NunfEadPXuWFi1a0L59e86cOUNISAh//PEHQ4YMyVBe48aN48MPP+TUqVOUK1eOLl26kJycbNZjZDW5aywdm07d4Uakuhs0/Mm8QUeXwOG56h2cPVLuEHPOZwoZjUa+/fZbPvnkEwwGAwAFChRgxYoVtGrVKqvfghBCiGeYPn0606dPf+521atXZ+XKlapYu3btOHHixHP3HTlyJCNHjsx0jr/88gt58uQBIC4uDm9vb3755RdsbFJ6GJ9++qlp25IlSzJq1ChCQkIYM2YMAKGhoYwePZoKFSoAULbsv6shTJs2jcDAQIYPH256btasWTRs2JB58+bh5PT0aV/+68MPPzQtATVp0iQqVarE5cuXqVChgtmOkdWkEEol2WBk9q5LqljVYvloWK4gXNkFW0ard7B1SJkryOPfgc737t2jR48ebN261RSrX78+a9asoWjRolmavxBCiOeLiYnh9u3bz92uWLFiaWL37t17oX1jYmIyldsTjRs3Zt68eUDKEIu5c+fSsmVLjhw5QokSJVi/fj0zZszg8uXLxMbGkpycjJubm2n/kSNH0rdvX1auXEnTpk3p1KkTpUuXBuD48eNcvnyZoKAg0/aKomA0Grl27RoVK1Z8oRyrVKli+n9vb28AIiIiqFChgtmOkdWkEErlp1N3uJ66G9SkLLr7F2FtT1AM6h3azoISdUwP9+/fT+fOnblz5w6Qck31k08+YeLEidjZyZdbCCEsgZubG0WKFHnudp6enmliBQsWfKF9/1uUZIarq6tqYl0/Pz/c3d1ZtGgRbdq0oXPnzkyaNIkWLVrg7u5OcHAw3333nWn7iRMnEhgYyK+//srWrVuZMGECwcHBvPvuuxiNRvr378+wYcPSHLd48eIvnOOTS22A6Y5qo9Fo+q85jpHV5JP5P9IbG1S1qDuNitnAYn9IjFbv0GAUVOuiCh0+fNhUBBUqVIhVq1bRrFmzLM1bCCFExrzoZSuj0Zims/PfMZ/ZSafTYWNjQ3x8PAcOHKBEiRKMGzfO9PyNGzfS7FOuXDnKlSvHiBEj6NKlC8uWLePdd9+lRo0a/P3331m6gkF2HMMcZLD0f/x85g7X7qtHsw9vXAJdSDd4cF298StvQ+NPSW3UqFG0bNmSxo0bc+rUKSmChBBCZEpiYiLh4eGEh4dz/vx5hg4dSmxsLG3btqVMmTKEhoYSHBzMlStXmDVrFhs3bjTtGx8fz5AhQ9izZw83btzgwIEDHD161HQ56qOPPuLQoUMMHjyYU6dOcenSJTZv3szQoUPNln92HMMcpCP0/wxGhR92qrtBVYq40ejilxB6UL2xT3V4Zz7Y2HDz5k3VNWQbGxuCg4NxdXXF1tY2O1IXQgiRA23bts007iZv3rxUqFCBdevW0ahRIwBGjBjBkCFDSExMpHXr1owfP56JEycCYGtrS2RkJN27d+fu3bt4enrSvn17Jk2aBKSM7dm7dy/jxo2jQYMGKIpC6dKlCQgIMFv+2XEMc9Apuew+7piYGNzd3bl//77pFkRImTdoeMgp1ba/v3aMMmdT3VXgVgT67cLgUpDJkyfz5Zdfsn37dtM3psgYvV7Pli1baNWqlepas8h+ci4sh5wL80lISODatWv4+vpm6i6lJ5fG3NzcTHdriazzrPMVGRmJp6cn0dHRLz3+6r+kI0RKN2hWqjvF+nv+lbYIsneFLsGExSp0fbsZu3fvBiAwMJCzZ8+qCishhBBCWD4phIBfztzh6r1/xwa9qrvKmPjU80vooMNidvx1l/fea0ZERASQcilsyJAh5M+fPxszFkIIIYQ55PpCyGBUmLXz325QYSJZ7jQdW0OCarvkNycycdVBvvrqK9Os0EWKFGHNmjU0aNAgO1MWQgghhJnk+kLo17NhXPn/bpALCSxx+JYCSpRqm1vF2xP46Qb2799virVs2ZIVK1akO8eEEEIIIaxDri6E/tsNssHITPs5VLJRz8OwJ+EVOo75mcjISCBlJP6UKVMYNWqUDJwTQgghrFyu/iTfcjaMyxGxAIyxC6aZ7XH1Bh6l8fKfTnx8PJAyE+b+/fsZPXq0FEFCCGFFctkN0lZLi/OUaz/Njf/pBvnb7maA3S/qDZzyQdd1VKxRh3nz5tGuXTtOnjxJnTp10r6YEEIIi/RkPrekpCSNMxEv4sl5ys55+HLtpbHfz0dwKSKW123O8aXdUlN8x5VkGpR0xClgFRRIWZyue/fudOvWzbSOihBCCOtgZ2eHi4sL9+7dw97ePsPdfKPRSFJSEgkJCXIlIIsZjUbu3buHi4tLtq7NmWsLoYV/XMNXF8Z8+++x1xlIMih8tCORGX8mMahTHeb4qu8EkyJICCGsj06nw9vbm2vXrqW7FtfzKIpCfHw8zs7O8jmQDWxsbChevHi2fq1zbSF0714Em/NOI58ujmsPjASsf8zROykr5s5dt5Muf/xB/fr1Nc5SCCHEy3JwcKBs2bKZujym1+vZt28fb7zxhszynQ0cHByyvfOmeSE0d+5cpk2bRlhYGJUqVWLGjBnPnJdn7969jBw5kr///hsfHx/GjBnDgAEDMnzc7+3nUsomnP+d09NnczzRiSlxBwcHpk+fTr169TL7loQQQlgYGxubTC2xYWtrS3JyMk5OTlII5VCaXvAMCQlh+PDhjBs3jpMnT9KgQQNatmxJaGhouttfu3aNVq1a0aBBA06ePMknn3zCsGHD+N///pfhY1dVLjBkSzwd1/1bBJUuVcq0Uq60QIUQQoicT9NCaPr06fTp04e+fftSsWJFZsyYQbFixZg3b16628+fP5/ixYszY8YMKlasSN++fenduzfffvttho/dbEUcc47qTY8D2rfjxMmT1KhRI9PvRwghhBDWRbNCKCkpiePHj9O8eXNVvHnz5hw8eDDdfQ4dOpRm+xYtWnDs2DH0en26+zzNmYiUuQoc7WDB1+NYs36TWVezFUIIIYTl02yM0P379zEYDHh5eaniXl5ehIeHp7tPeHh4utsnJydz//59vL290+yTmJhIYmKi6XF0dLTp/0vn17Hk+y+o3GYAUVFRafYVWU+v1/P48WMiIyPl+rvG5FxYDjkXlkPOheV48jlt7kkXNR8snXosjqIozxyfk9726cWfmDJlCpMmTUr3uSsPFBr1/BT4NAMZCyGEEEIrkZGRuLu7m+31NCuEPD09sbW1TdP9iYiISNP1eaJw4cLpb
m9nZ0eBAgXS3Wfs2LGMHDnS9Pjhw4eUKFGC0NBQs34hRebExMRQrFgxbt68KZcmNSbnwnLIubAcci4sR3R0NMWLF8fDw8Osr6tZIeTg4ICfnx87duzg3XffNcV37NjB22+/ne4+derU4eeff1bFtm/fTs2aNZ/asnR0dMTR0TFN3N3dXb6pLYibm5ucDwsh58JyyLmwHHIuLIe55xnS9K6xkSNHsnjxYpYuXcr58+cZMWIEoaGhpnmBxo4dS/fu3U3bDxgwgBs3bjBy5EjOnz/P0qVLWbJkCR9++KFWb0EIIYQQVkzTMUIBAQFERkby+eefExYWRuXKldmyZQslSpQAICwsTDWnkK+vL1u2bGHEiBHMmTMHHx8fZs2aRYcOHbR6C0IIIYSwYpoPlh40aBCDBg1K97nly5eniTVs2JATJ05k+niOjo5MmDAh3ctlIvvJ+bAcci4sh5wLyyHnwnJk1bnQKea+D00IIYQQwkpoOkZICCGEEEJLUggJIYQQIteSQkgIIYQQuZYUQkIIIYTItXJkITR37lx8fX1xcnLCz8+P/fv3P3P7vXv34ufnh5OTE6VKlWL+/PnZlGnOl5FzsWHDBpo1a0bBggVxc3OjTp06/Pbbb9mYbc6X0Z+NJw4cOICdnR3VqlXL2gRzkYyei8TERMaNG0eJEiVwdHSkdOnSLF26NJuyzdkyei6CgoKoWrUqLi4ueHt706tXLyIjI7Mp25xr3759tG3bFh8fH3Q6HZs2bXruPmb5/FZymODgYMXe3l5ZtGiRcu7cOeWDDz5QXF1dlRs3bqS7/dWrVxUXFxflgw8+UM6dO6csWrRIsbe3V9avX5/Nmec8GT0XH3zwgTJ16lTlyJEjysWLF5WxY8cq9vb2yokTJ7I585wpo+fjiYcPHyqlSpVSmjdvrlStWjV7ks3hMnMu2rVrp9SuXVvZsWOHcu3aNeXPP/9UDhw4kI1Z50wZPRf79+9XbGxslJkzZypXr15V9u/fr1SqVEl55513sjnznGfLli3KuHHjlP/9738KoGzcuPGZ25vr8zvHFUK1atVSBgwYoIpVqFBB+fjjj9PdfsyYMUqFChVUsf79+yuvv/56luWYW2T0XKTnlVdeUSZNmmTu1HKlzJ6PgIAA5dNPP1UmTJgghZCZZPRcbN26VXF3d1ciIyOzI71cJaPnYtq0aUqpUqVUsVmzZilFixbNshxzoxcphMz1+Z2jLo0lJSVx/Phxmjdvroo3b96cgwcPprvPoUOH0mzfokULjh07hl6vz7Jcc7rMnIvUjEYjjx49MvsCe7lRZs/HsmXLuHLlChMmTMjqFHONzJyLzZs3U7NmTb755huKFClCuXLl+PDDD4mPj8+OlHOszJyLunXrcuvWLbZs2YKiKNy9e5f169fTunXr7EhZ/Ie5Pr81n1nanO7fv4/BYEizer2Xl1eaVeufCA8PT3f75ORk7t+/j7e3d5blm5Nl5lyk9t133xEXF4e/v39WpJirZOZ8XLp0iY8//pj9+/djZ5ejflVoKjPn4urVq/zxxx84OTmxceNG7t+/z6BBg4iKipJxQi8hM+eibt26BAUFERAQQEJCAsnJybRr144ffvghO1IW/2Guz+8c1RF6QqfTqR4ripIm9rzt04uLjMvouXhizZo1TJw4kZCQEAoVKpRV6eU6L3o+DAYDgYGBTJo0iXLlymVXerlKRn42jEYjOp2OoKAgatWqRatWrZg+fTrLly+XrpAZZORcnDt3jmHDhvHZZ59x/Phxtm3bxrVr10yLhYvsZY7P7xz1Z56npye2trZpKvmIiIg0VeMThQsXTnd7Ozs7ChQokGW55nSZORdPhISE0KdPH9atW0fTpk2zMs1cI6Pn49GjRxw7doyTJ08yZMgQIOXDWFEU7Ozs2L59O2+++Wa25J7TZOZnw9vbmyJFiuDu7m6KVaxYEUVRuHXrFmXLls3SnHOqzJyLKVOmUK9ePUaPHg1AlSpVcHV1pUGDBkyePFmuImQjc31+56iOkMP/tXf3QVFVbxzAv7u4LPsiQViCgGyAblBQvIhDToGJsZnTMg6yo6tAwohTOoAYxkz4B0wRJuDAEI0NA2QSL445RpSuxcuCIxSsk8IqL4IkUU45CkIgL+f3h8PNTULwpxK7z2dmZzjn3nPuc/bMcp+599xdc3P4+PhAo9EY1Gs0Grz00ktTtvH3979n/1OnTsHX1xcCgeCRxWrsHmQugDtXgiIjI1FcXEz33B+i2c6HpaUlzp8/j3PnznGvHTt2QC6X49y5c1i5cuXjCt3oPMhnY9WqVfj1119x69Ytrq6trQ18Ph8ODg6PNF5j9iBzMTQ0BD7f8NRpZmYG4O+rEeTxeGjn71ktrZ4HJh+FzM/PZ62trSwuLo5JJBLW3d3NGGPsvffeY1u3buX2n3z8Lj4+nrW2trL8/Hx6fP4hme1cFBcXswULFrDc3FzW19fHvW7cuDFXQzAqs52Pf6Knxh6e2c7FwMAAc3BwYKGhoaylpYXV1NSwZcuWsejo6LkagtGY7VwUFBSwBQsWsE8++YR1dnayuro65uvry/z8/OZqCEZjYGCA6XQ6ptPpGACWmZnJdDod91UGj+r8bXSJEGOM5ebmMicnJ2Zubs68vb1ZTU0Nty0iIoIFBAQY7F9dXc28vLyYubk5k8lkLC8v7zFHbLxmMxcBAQEMwD2viIiIxx+4kZrtZ+NulAg9XLOdC71ez4KCgphIJGIODg5s9+7dbGho6DFHbZxmOxfZ2dnM3d2diUQiZmdnx9RqNbt69epjjtr4VFVVTXsOeFTnbx5jdC2PEEIIIabJqNYIEUIIIYTMBiVChBBCCDFZlAgRQgghxGRRIkQIIYQQk0WJECGEEEJMFiVChBBCCDFZlAgRQgghxGRRIkQIIVM4dOgQHB0dwefzcfDgwbkOZ1Z4PB6OHz8+12EQMi9QIkTIPBEZGQkejwcejweBQABnZ2fs2bMHg4ODcx3afclksnmVTPT392Pnzp3Yu3cvent7sX379rkOiRDyiBjVr88TYuwUCgUKCgowOjoKrVaL6OhoDA4OIi8vb9Z9McYwPj6OBQvo38A/9fT0YHR0FG+88Qb9mjghRo6uCBEyjwiFQtja2sLR0RGbN2+GWq3mboEwxrB//344OztDJBLhhRdewNGjR7m21dXV4PF4OHnyJHx9fSEUCqHVajExMYH09HS4urpCKBRi6dKl+OCDD7h2vb29UKlUsLa2ho2NDZRKJbq7u7ntkZGRCAkJwYEDB2BnZwcbGxu88847GB0dBQAEBgbiypUriI+P565oAcCff/6JTZs2wcHBAWKxGB4eHvjyyy8NxjswMAC1Wg2JRAI7OztkZWUhMDAQcXFx3D63b99GYmIi7O3tIZFIsHLlSlRXV0/7Pvb09ECpVEIqlcLS0hJhYWH4/fffAQCFhYXw8PAAADg7O4PH4xmM9+7j7ty5E3Z2drCwsIBMJkNaWhq3PTMzEx4eHpBIJHB0dMTbb79t8OvxhYWFsLKyQkVFBeRyOcRiMUJDQzE4OIiioiLIZDJY
W1tj165dGB8f59rJZDKkpqZi8+bNkEqlWLJkCXJycqYd7/3mkBBTRokQIfOYSCTiEo73338fBQUFyMvLQ0tLC+Lj47FlyxbU1NQYtElMTERaWhr0ej08PT2RlJSE9PR0JCcno7W1FcXFxVi8eDEAYGhoCKtXr4ZUKkVtbS3q6uoglUqhUChw+/Ztrs+qqip0dnaiqqoKRUVFKCwsRGFhIQDg2LFjcHBwQEpKCvr6+tDX1wcAGB4eho+PDyoqKnDhwgVs374dW7duRUNDA9fv7t27UV9fjxMnTkCj0UCr1aK5udlgPG+99Rbq6+tRUlKCn3/+GRs3boRCoUB7e/uU7xljDCEhIbh+/Tpqamqg0WjQ2dkJlUoFAFCpVDh9+jQAoLGxEX19fXB0dLynn+zsbJw4cQJlZWW4dOkSvvjiC8hkMm47n89HdnY2Lly4gKKiIvzwww9ITEw06GNoaAjZ2dkoKSnBd999h+rqamzYsAGVlZWorKzE4cOHcejQIYOEFgA+/vhjeHp6orm5GUlJSYiPj4dGo5lyvDOdQ0JM1v/3W7GEkMclIiKCKZVKrtzQ0MBsbGxYWFgYu3XrFrOwsGBnzpwxaBMVFcU2bdrEGPv7l52PHz/Obe/v72dCoZB99tlnUx4zPz+fyeVyNjExwdWNjIwwkUjETp48ycXl5OTExsbGuH02btzIVCoVV3ZycmJZWVn3HeO6detYQkICF5tAIGDl5eXc9hs3bjCxWMxiY2MZY4x1dHQwHo/Hent7DfpZs2YNS0pKmvIYp06dYmZmZqynp4era2lpYQBYY2MjY4wxnU7HALCurq5/jXXXrl3s1VdfNXhvplNWVsZsbGy4ckFBAQPAOjo6uLqYmBgmFovZwMAAVxccHMxiYmK4spOTE1MoFAZ9q1Qq9vrrr3NlAOyrr75ijM1sDgkxZbQ4gJB5pKKiAlKpFGNjYxgdHYVSqUROTg5aW1sxPDyMtWvXGux/+/ZteHl5GdT5+vpyf+v1eoyMjGDNmjVTHq+pqQkdHR1YuHChQf3w8DA6Ozu58nPPPQczMzOubGdnh/Pnz087lvHxcXz00UcoLS1Fb28vRkZGMDIyAolEAgC4fPkyRkdH4efnx7V54oknIJfLuXJzczMYY1i+fLlB3yMjI7CxsZnyuHq9Ho6OjgZXedzd3WFlZQW9Xo8VK1ZMG/ekyMhIrF27FnK5HAqFAuvXr8drr73Gba+qqsKHH36I1tZW9Pf3Y2xsDMPDwxgcHOTGKBaL4eLiwrVZvHgxZDIZpFKpQd21a9cMju3v739P+d8Wo890DgkxVZQIETKPrF69Gnl5eRAIBFiyZAkEAgEAoKurCwDwzTffwN7e3qCNUCg0KE+ehIE7t9amMzExAR8fHxw5cuSebU899RT392Qck3g8HiYmJqbtOyMjA1lZWTh48CC3liYuLo67XcMY4/q622T9ZHxmZmZoamoySMQAGCQT/2z/zz6nq/833t7e6OrqwrfffovTp08jLCwMQUFBOHr0KK5cuYJ169Zhx44dSE1NxZNPPom6ujpERUVxtzKBqd+3B3kvJ/ebykznkBBTRYkQIfOIRCKBq6vrPfXu7u4QCoXo6elBQEDAjPtbtmwZRCIRvv/+e0RHR9+z3dvbG6WlpXj66adhaWn5wHGbm5sbLPgFAK1WC6VSiS1btgC4c8Jub2+Hm5sbAMDFxQUCgQCNjY3c1Zv+/n60t7dzY/Ty8sL4+DiuXbuGl19+eUaxuLu7o6enB7/88gvXb2trK27evMkde6YsLS2hUqmgUqkQGhoKhUKB69ev46effsLY2BgyMjLA599ZillWVjarvqdz9uzZe8rPPvvslPs+rDkkxFjRYmlCjMDChQuxZ88exMfHo6ioCJ2dndDpdMjNzUVRUdG/trOwsMDevXuRmJiIzz//HJ2dnTh79izy8/MBAGq1GosWLYJSqYRWq0VXVxdqamoQGxuLq1evzjg+mUyG2tpa9Pb24o8//gAAuLq6QqPR4MyZM9Dr9YiJicFvv/1mMKaIiAi8++67qKqqQktLC7Zt2wY+n89d/Vi+fDnUajXCw8Nx7NgxdHV14ccff0R6ejoqKyunjCUoKAienp5Qq9Vobm5GY2MjwsPDERAQYHDb8H6ysrJQUlKCixcvoq2tDeXl5bC1tYWVlRVcXFwwNjaGnJwcXL58GYcPH8ann346477vp76+Hvv370dbWxtyc3NRXl6O2NjYKfd9WHNIiLGiRIgQI5Gamop9+/YhLS0Nbm5uCA4Oxtdff41nnnlm2nbJyclISEjAvn374ObmBpVKxa1JEYvFqK2txdKlS7Fhwwa4ublh27Zt+Ouvv2Z1dSElJQXd3d1wcXHhbsckJyfD29sbwcHBCAwMhK2tLUJCQgzaZWZmwt/fH+vXr0dQUBBWrVoFNzc3WFhYcPsUFBQgPDwcCQkJkMvlePPNN9HQ0DDlk17A39+6bG1tjVdeeQVBQUFwdnZGaWnpjMcD3Ln1lp6eDl9fX6xYsQLd3d2orKwEn8/Hiy++iMzMTKSnp+P555/HkSNHDB6t/38lJCSgqakJXl5eSE1NRUZGBoKDg6fc92HNISHGisfuvuFOCCH/YYODg7C3t0dGRgaioqLmOpw5IZPJEBcXZ/BdSoSQB0drhAgh/1k6nQ4XL16En58fbt68iZSUFACAUqmc48gIIcaCEiFCyH/agQMHcOnSJZibm8PHxwdarRaLFi2a67AIIUaCbo0RQgghxGTRYmlCCCGEmCxKhAghhBBisigRIoQQQojJokSIEEIIISaLEiFCCCGEmCxKhAghhBBisigRIoQQQojJokSIEEIIISaLEiFCCCGEmKz/AbJZwh57msdDAAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from sklearn.model_selection import cross_validate\n", - "#Cross validation\n", - "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", - "print(accuracy)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "\n", - "import scikitplot as skplt\n", - "y_pred = logreg.predict(X_test)\n", - "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", - "plt.show()\n", - "y_probas = logreg.predict_proba(X_test)\n", - "skplt.metrics.plot_roc(y_test, y_probas)\n", - "plt.show()\n", - "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "fe0c7fda", - "metadata": {}, - "source": [ - "## Optimization, the central part of any Machine Learning algortithm\n", - "\n", - "[Overview Video, why do we care about gradient methods?](https://www.uio.no/studier/emner/matnat/fys/FYS-STK3155/h20/forelesningsvideoer/OverarchingAimsWeek39.mp4?vrtx=view-as-webpage)\n", - "\n", - "Almost every problem in machine learning and data science starts with\n", - "a dataset $X$, a model $g(\\beta)$, which is a function of the\n", - "parameters $\\beta$ and a cost function $C(X, g(\\beta))$ that allows\n", - "us to judge how well the model $g(\\beta)$ explains the observations\n", - "$X$. The model is fit by finding the values of $\\beta$ that minimize\n", - "the cost function. Ideally we would be able to solve for $\\beta$\n", - "analytically, however this is not possible in general and we must use\n", - "some approximative/numerical method to compute the minimum." + "resulting in" ] }, { "cell_type": "markdown", - "id": "9df4ecc4", - "metadata": {}, - "source": [ - "## Revisiting our Logistic Regression case\n", - "\n", - "In our discussion on Logistic Regression we studied the \n", - "case of\n", - "two classes, with $y_i$ either\n", - "$0$ or $1$. Furthermore we assumed also that we have only two\n", - "parameters $\\beta$ in our fitting, that is we\n", - "defined probabilities" - ] - }, - { - "cell_type": "markdown", - "id": "a0a65501", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\boldsymbol{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\boldsymbol{\\beta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\beta}),\n", - "\\end{align*}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "8dc1194c", - "metadata": {}, - "source": [ - "where $\\boldsymbol{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$." 
- ] - }, - { - "cell_type": "markdown", - "id": "62d70952", - "metadata": {}, - "source": [ - "## The equations to solve\n", - "\n", - "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", - "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", - "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})$. We rewrote in a more compact form\n", - "the first derivative of the cost function as" - ] - }, - { - "cell_type": "markdown", - "id": "41787d1e", - "metadata": {}, - "source": [ - "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "86fc7282", - "metadata": {}, - "source": [ - "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})(1-p(y_i\\vert x_i,\\boldsymbol{\\beta})$, we can obtain a compact expression of the second derivative as" - ] - }, - { - "cell_type": "markdown", - "id": "8f4c640e", - "metadata": {}, - "source": [ - "$$\n", - "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "f37d28ca", - "metadata": {}, - "source": [ - "This defines what is called the Hessian matrix." - ] - }, - { - "cell_type": "markdown", - "id": "9b5cb6dd", - "metadata": {}, - "source": [ - "## Solving using Newton-Raphson's method\n", - "\n", - "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. 
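Since Newton-Raphson needs exactly these first and second derivatives, a small sketch (toy data assumed; not the course's reference implementation) of assembling the gradient $-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p})$ and the Hessian $\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$ and using them in a few updates may help fix ideas:

```python
import numpy as np

def gradient_and_hessian(beta, X, y):
    # Fitted probabilities p(y_i = 1 | x_i, beta)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = -X.T @ (y - p)              # first derivative: -X^T (y - p)
    W = p * (1.0 - p)                  # diagonal of the weight matrix
    hessian = X.T @ (W[:, None] * X)   # second derivative: X^T W X
    return grad, hessian

# Assumed toy data, for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
X = np.c_[np.ones_like(x), x]
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))).astype(float)

beta = np.zeros(2)
for _ in range(10):                    # a few Newton-Raphson steps
    grad, H = gradient_and_hessian(beta, X, y)
    beta = beta - np.linalg.solve(H, grad)
print(beta)
```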
\n", - "\n", - "Our iterative scheme is then given by" - ] - }, - { - "cell_type": "markdown", - "id": "f474d68a", - "metadata": {}, + "id": "5b6a9003", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{\\beta}^{\\mathrm{new}} = \\boldsymbol{\\beta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T}\\right)^{-1}_{\\boldsymbol{\\beta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}}\\right)_{\\boldsymbol{\\beta}^{\\mathrm{old}}},\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", "$$" ] }, { "cell_type": "markdown", - "id": "ff928190", - "metadata": {}, + "id": "88b7b6c2", + "metadata": { + "editable": true + }, "source": [ - "or in matrix form as" + "and in the limit $m\\rightarrow \\infty$ we obtain" ] }, { "cell_type": "markdown", - "id": "a9e9efc2", - "metadata": {}, + "id": "bb8051d4", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{\\beta}^{\\mathrm{new}} = \\boldsymbol{\\beta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\beta}^{\\mathrm{old}}}.\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", "$$" ] }, { "cell_type": "markdown", - "id": "93061994", - "metadata": {}, + "id": "4950aac9", + "metadata": { + "editable": true + }, "source": [ - "The right-hand side is computed with the old values of $\\beta$. \n", - "\n", - "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." ] }, { "cell_type": "markdown", - "id": "08acd443", - "metadata": {}, + "id": "6d705546", + "metadata": { + "editable": true + }, "source": [ - "## Brief reminder on Newton-Raphson's method\n", + "## Wrapping it up\n", "\n", - "Let us quickly remind ourselves how we derive the above method.\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", "\n", - "Perhaps the most celebrated of all one-dimensional root-finding\n", - "routines is Newton's method, also called the Newton-Raphson\n", - "method. This method requires the evaluation of both the\n", - "function $f$ and its derivative $f'$ at arbitrary points. \n", - "If you can only calculate the derivative\n", - "numerically and/or your function is not of the smooth type, we\n", - "normally discourage the use of this method." - ] - }, - { - "cell_type": "markdown", - "id": "caa94b50", - "metadata": {}, - "source": [ - "## The equations\n", - "\n", - "The Newton-Raphson formula consists geometrically of extending the\n", - "tangent line at a current point until it crosses zero, then setting\n", - "the next guess to the abscissa of that zero-crossing. 
The mathematics\n", - "behind this method is rather simple. Employing a Taylor expansion for\n", - "$x$ sufficiently close to the solution $s$, we have" - ] - }, - { - "cell_type": "markdown", - "id": "ac3e7ef2", - "metadata": {}, - "source": [ - "\n", - "
    \n", - "\n", - "$$\n", - "f(s)=0=f(x)+(s-x)f'(x)+\\frac{(s-x)^2}{2}f''(x) +\\dots.\n", - " \\label{eq:taylornr} \\tag{2}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "6bd1aafd", - "metadata": {}, - "source": [ - "For small enough values of the function and for well-behaved\n", - "functions, the terms beyond linear are unimportant, hence we obtain" - ] - }, - { - "cell_type": "markdown", - "id": "699697a1", - "metadata": {}, - "source": [ - "$$\n", - "f(x)+(s-x)f'(x)\\approx 0,\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "4efbdd72", - "metadata": {}, - "source": [ - "yielding" + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" ] }, { "cell_type": "markdown", - "id": "4bd64a59", - "metadata": {}, + "id": "749b506b", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "s\\approx x-\\frac{f(x)}{f'(x)}.\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "358dc6db", - "metadata": {}, + "id": "02d5afea", + "metadata": { + "editable": true + }, "source": [ - "Having in mind an iterative procedure, it is natural to start iterating with" + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" ] }, { "cell_type": "markdown", - "id": "8a007c48", - "metadata": {}, + "id": "2664f854", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "x_{n+1}=x_n-\\frac{f(x_n)}{f'(x_n)}.\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "e0828d1d", - "metadata": {}, - "source": [ - "## Simple geometric interpretation\n", - "\n", - "The above is Newton-Raphson's method. It has a simple geometric\n", - "interpretation, namely $x_{n+1}$ is the point where the tangent from\n", - "$(x_n,f(x_n))$ crosses the $x$-axis. Close to the solution,\n", - "Newton-Raphson converges fast to the desired result. However, if we\n", - "are far from a root, where the higher-order terms in the series are\n", - "important, the Newton-Raphson formula can give grossly inaccurate\n", - "results. For instance, the initial guess for the root might be so far\n", - "from the true root as to let the search interval include a local\n", - "maximum or minimum of the function. If an iteration places a trial\n", - "guess near such a local extremum, so that the first derivative nearly\n", - "vanishes, then Newton-Raphson may fail totally" - ] - }, - { - "cell_type": "markdown", - "id": "26efa0c4", - "metadata": {}, + "id": "a986ee46", + "metadata": { + "editable": true + }, "source": [ - "## Extending to more than one variable\n", + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", "\n", - "Newton's method can be generalized to systems of several non-linear equations\n", - "and variables. 
Consider the case with two equations" - ] - }, - { - "cell_type": "markdown", - "id": "8af30001", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{array}{cc} f_1(x_1,x_2) &=0\\\\\n", - " f_2(x_1,x_2) &=0,\\end{array}\n", - "$$" + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." ] }, { "cell_type": "markdown", - "id": "77528641", - "metadata": {}, - "source": [ - "which we Taylor expand to obtain" - ] - }, - { - "cell_type": "markdown", - "id": "d10154f0", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{array}{cc} 0=f_1(x_1+h_1,x_2+h_2)=&f_1(x_1,x_2)+h_1\n", - " \\partial f_1/\\partial x_1+h_2\n", - " \\partial f_1/\\partial x_2+\\dots\\\\\n", - " 0=f_2(x_1+h_1,x_2+h_2)=&f_2(x_1,x_2)+h_1\n", - " \\partial f_2/\\partial x_1+h_2\n", - " \\partial f_2/\\partial x_2+\\dots\n", - " \\end{array}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "58a6cb05", - "metadata": {}, - "source": [ - "Defining the Jacobian matrix ${\\bf \\boldsymbol{J}}$ we have" - ] - }, - { - "cell_type": "markdown", - "id": "87917443", - "metadata": {}, - "source": [ - "$$\n", - "{\\bf \\boldsymbol{J}}=\\left( \\begin{array}{cc}\n", - " \\partial f_1/\\partial x_1 & \\partial f_1/\\partial x_2 \\\\\n", - " \\partial f_2/\\partial x_1 &\\partial f_2/\\partial x_2\n", - " \\end{array} \\right),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "316440eb", - "metadata": {}, - "source": [ - "we can rephrase Newton's method as" - ] - }, - { - "cell_type": "markdown", - "id": "4ec22184", - "metadata": {}, - "source": [ - "$$\n", - "\\left(\\begin{array}{c} x_1^{n+1} \\\\ x_2^{n+1} \\end{array} \\right)=\n", - "\\left(\\begin{array}{c} x_1^{n} \\\\ x_2^{n} \\end{array} \\right)+\n", - "\\left(\\begin{array}{c} h_1^{n} \\\\ h_2^{n} \\end{array} \\right),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "9da35b82", - "metadata": {}, - "source": [ - "where we have defined" - ] - }, - { - "cell_type": "markdown", - "id": "61c4f7fc", - "metadata": {}, - "source": [ - "$$\n", - "\\left(\\begin{array}{c} h_1^{n} \\\\ h_2^{n} \\end{array} \\right)=\n", - " -{\\bf \\boldsymbol{J}}^{-1}\n", - " \\left(\\begin{array}{c} f_1(x_1^{n},x_2^{n}) \\\\ f_2(x_1^{n},x_2^{n}) \\end{array} \\right).\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "ffd39c16", - "metadata": {}, + "id": "f21341e3", + "metadata": { + "editable": true + }, "source": [ - "We need thus to compute the inverse of the Jacobian matrix and it\n", - "is to understand that difficulties may\n", - "arise in case ${\\bf \\boldsymbol{J}}$ is nearly singular.\n", + "## Confidence Intervals\n", "\n", - "It is rather straightforward to extend the above scheme to systems of\n", - "more than two non-linear equations. In our case, the Jacobian matrix is given by the Hessian that represents the second derivative of cost function." - ] - }, - { - "cell_type": "markdown", - "id": "de590520", - "metadata": {}, - "source": [ - "## Steepest descent\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. 
This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", "\n", - "The basic idea of gradient descent is\n", - "that a function $F(\\mathbf{x})$, \n", - "$\\mathbf{x} \\equiv (x_1,\\cdots,x_n)$, decreases fastest if one goes from $\\bf {x}$ in the\n", - "direction of the negative gradient $-\\nabla F(\\mathbf{x})$.\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", "\n", - "It can be shown that if" - ] - }, - { - "cell_type": "markdown", - "id": "6a0e0292", - "metadata": {}, - "source": [ - "$$\n", - "\\mathbf{x}_{k+1} = \\mathbf{x}_k - \\gamma_k \\nabla F(\\mathbf{x}_k),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "ec6877a5", - "metadata": {}, - "source": [ - "with $\\gamma_k > 0$.\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", "\n", - "For $\\gamma_k$ small enough, then $F(\\mathbf{x}_{k+1}) \\leq\n", - "F(\\mathbf{x}_k)$. This means that for a sufficiently small $\\gamma_k$\n", - "we are always moving towards smaller function values, i.e a minimum." + "This quantity can be used to\n", + "construct a confidence interval for the estimates." ] }, { "cell_type": "markdown", - "id": "b7e72c2f", - "metadata": {}, + "id": "b22eb043", + "metadata": { + "editable": true + }, "source": [ - "## More on Steepest descent\n", + "## Standard Approach based on the Normal Distribution\n", "\n", - "The previous observation is the basis of the method of steepest\n", - "descent, which is also referred to as just gradient descent (GD). One\n", - "starts with an initial guess $\\mathbf{x}_0$ for a minimum of $F$ and\n", - "computes new approximations according to" + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" ] }, { "cell_type": "markdown", - "id": "cae90d84", - "metadata": {}, + "id": "a2e2c4c5", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathbf{x}_{k+1} = \\mathbf{x}_k - \\gamma_k \\nabla F(\\mathbf{x}_k), \\ \\ k \\geq 0.\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", "$$" ] }, { "cell_type": "markdown", - "id": "1ab31b86", - "metadata": {}, - "source": [ - "The parameter $\\gamma_k$ is often referred to as the step length or\n", - "the learning rate within the context of Machine Learning." - ] - }, - { - "cell_type": "markdown", - "id": "87d0d18e", - "metadata": {}, - "source": [ - "## The ideal\n", - "\n", - "Ideally the sequence $\\{\\mathbf{x}_k \\}_{k=0}$ converges to a global\n", - "minimum of the function $F$. In general we do not know if we are in a\n", - "global or local minimum. In the special case when $F$ is a convex\n", - "function, all local minima are also global minima, so in this case\n", - "gradient descent can converge to the global solution. The advantage of\n", - "this scheme is that it is conceptually simple and straightforward to\n", - "implement. 
However the method in this form has some severe\n", - "limitations:\n", - "\n", - "In machine learing we are often faced with non-convex high dimensional\n", - "cost functions with many local minima. Since GD is deterministic we\n", - "will get stuck in a local minimum, if the method converges, unless we\n", - "have a very good intial guess. This also implies that the scheme is\n", - "sensitive to the chosen initial condition.\n", - "\n", - "Note that the gradient is a function of $\\mathbf{x} =\n", - "(x_1,\\cdots,x_n)$ which makes it expensive to compute numerically." - ] - }, - { - "cell_type": "markdown", - "id": "c92a82a1", - "metadata": {}, - "source": [ - "## The sensitiveness of the gradient descent\n", - "\n", - "The gradient descent method \n", - "is sensitive to the choice of learning rate $\\gamma_k$. This is due\n", - "to the fact that we are only guaranteed that $F(\\mathbf{x}_{k+1}) \\leq\n", - "F(\\mathbf{x}_k)$ for sufficiently small $\\gamma_k$. The problem is to\n", - "determine an optimal learning rate. If the learning rate is chosen too\n", - "small the method will take a long time to converge and if it is too\n", - "large we can experience erratic behavior.\n", - "\n", - "Many of these shortcomings can be alleviated by introducing\n", - "randomness. One such method is that of Stochastic Gradient Descent\n", - "(SGD), to be discussed next week." - ] - }, - { - "cell_type": "markdown", - "id": "b5a9af46", - "metadata": {}, + "id": "be028ae6", + "metadata": { + "editable": true + }, "source": [ - "## Convex functions\n", - "\n", - "Ideally we want our cost/loss function to be convex(concave).\n", + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", "\n", - "First we give the definition of a convex set: A set $C$ in\n", - "$\\mathbb{R}^n$ is said to be convex if, for all $x$ and $y$ in $C$ and\n", - "all $t \\in (0,1)$ , the point $(1 − t)x + ty$ also belongs to\n", - "C. Geometrically this means that every point on the line segment\n", - "connecting $x$ and $y$ is in $C$ as discussed below.\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", "\n", - "The convex subsets of $\\mathbb{R}$ are the intervals of\n", - "$\\mathbb{R}$. Examples of convex sets of $\\mathbb{R}^2$ are the\n", - "regular polygons (triangles, rectangles, pentagons, etc...)." + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." ] }, { "cell_type": "markdown", - "id": "77ee5272", - "metadata": {}, + "id": "e746545b", + "metadata": { + "editable": true + }, "source": [ - "## Convex function\n", + "## Resampling methods: Bootstrap background\n", "\n", - "**Convex function**: Let $X \\subset \\mathbb{R}^n$ be a convex set. 
Assume that the function $f: X \\rightarrow \\mathbb{R}$ is continuous, then $f$ is said to be convex if $$f(tx_1 + (1-t)x_2) \\leq tf(x_1) + (1-t)f(x_2) $$ for all $x_1, x_2 \\in X$ and for all $t \\in [0,1]$. If $\\leq$ is replaced with a strict inequaltiy in the definition, we demand $x_1 \\neq x_2$ and $t\\in(0,1)$ then $f$ is said to be strictly convex. For a single variable function, convexity means that if you draw a straight line connecting $f(x_1)$ and $f(x_2)$, the value of the function on the interval $[x_1,x_2]$ is always below the line as illustrated below." + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." ] }, { "cell_type": "markdown", - "id": "282df4c7", - "metadata": {}, + "id": "dea3037c", + "metadata": { + "editable": true + }, "source": [ - "## Conditions on convex functions\n", - "\n", - "In the following we state first and second-order conditions which\n", - "ensures convexity of a function $f$. We write $D_f$ to denote the\n", - "domain of $f$, i.e the subset of $R^n$ where $f$ is defined. For more\n", - "details and proofs we refer to: [S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press](http://stanford.edu/boyd/cvxbook/, 2004).\n", - "\n", - "**First order condition.**\n", - "\n", - "Suppose $f$ is differentiable (i.e $\\nabla f(x)$ is well defined for\n", - "all $x$ in the domain of $f$). Then $f$ is convex if and only if $D_f$\n", - "is a convex set and $$f(y) \\geq f(x) + \\nabla f(x)^T (y-x) $$ holds\n", - "for all $x,y \\in D_f$. This condition means that for a convex function\n", - "the first order Taylor expansion (right hand side above) at any point\n", - "a global under estimator of the function. To convince yourself you can\n", - "make a drawing of $f(x) = x^2+1$ and draw the tangent line to $f(x)$ and\n", - "note that it is always below the graph.\n", + "## Resampling methods: More Bootstrap background\n", "\n", - "**Second order condition.**\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", "\n", - "Assume that $f$ is twice\n", - "differentiable, i.e the Hessian matrix exists at each point in\n", - "$D_f$. Then $f$ is convex if and only if $D_f$ is a convex set and its\n", - "Hessian is positive semi-definite for all $x\\in D_f$. For a\n", - "single-variable function this reduces to $f''(x) \\geq 0$. Geometrically this means that $f$ has nonnegative curvature\n", - "everywhere.\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. 
\n", "\n", - "This condition is particularly useful since it gives us an procedure for determining if the function under consideration is convex, apart from using the definition." + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." ] }, { "cell_type": "markdown", - "id": "e435596b", - "metadata": {}, + "id": "fd576cb1", + "metadata": { + "editable": true + }, "source": [ - "## More on convex functions\n", - "\n", - "The next result is of great importance to us and the reason why we are\n", - "going on about convex functions. In machine learning we frequently\n", - "have to minimize a loss/cost function in order to find the best\n", - "parameters for the model we are considering. \n", - "\n", - "Ideally we want the\n", - "global minimum (for high-dimensional models it is hard to know\n", - "if we have local or global minimum). However, if the cost/loss function\n", - "is convex the following result provides invaluable information:\n", - "\n", - "**Any minimum is global for convex functions.**\n", + "## Resampling methods: Bootstrap approach\n", "\n", - "Consider the problem of finding $x \\in \\mathbb{R}^n$ such that $f(x)$\n", - "is minimal, where $f$ is convex and differentiable. Then, any point\n", - "$x^*$ that satisfies $\\nabla f(x^*) = 0$ is a global minimum.\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", "\n", - "This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum." + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." ] }, { "cell_type": "markdown", - "id": "7bc1bf29", - "metadata": {}, - "source": [ - "## Some simple problems\n", - "\n", - "1. Show that $f(x)=x^2$ is convex for $x \\in \\mathbb{R}$ using the definition of convexity. Hint: If you re-write the definition, $f$ is convex if the following holds for all $x,y \\in D_f$ and any $\\lambda \\in [0,1]$ $\\lambda f(x)+(1-\\lambda)f(y)-f(\\lambda x + (1-\\lambda) y ) \\geq 0$.\n", - "\n", - "2. Using the second order condition show that the following functions are convex on the specified domain.\n", - "\n", - " * $f(x) = e^x$ is convex for $x \\in \\mathbb{R}$.\n", - "\n", - " * $g(x) = -\\ln(x)$ is convex for $x \\in (0,\\infty)$.\n", - "\n", - "3. Let $f(x) = x^2$ and $g(x) = e^x$. Show that $f(g(x))$ and $g(f(x))$ is convex for $x \\in \\mathbb{R}$. Also show that if $f(x)$ is any convex function than $h(x) = e^{f(x)}$ is convex.\n", - "\n", - "4. 
A norm is any function that satisfy the following properties\n", - "\n", - " * $f(\\alpha x) = |\\alpha| f(x)$ for all $\\alpha \\in \\mathbb{R}$.\n", - "\n", - " * $f(x+y) \\leq f(x) + f(y)$\n", - "\n", - " * $f(x) \\leq 0$ for all $x \\in \\mathbb{R}^n$ with equality if and only if $x = 0$\n", - "\n", - "Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this)." - ] - }, - { - "cell_type": "markdown", - "id": "90fef1a2", - "metadata": {}, + "id": "8629a2e8", + "metadata": { + "editable": true + }, "source": [ - "## Revisiting our first homework\n", - "\n", - "We will use linear regression as a case study for the gradient descent\n", - "methods. Linear regression is a great test case for the gradient\n", - "descent methods discussed in the lectures since it has several\n", - "desirable properties such as:\n", + "## Resampling methods: Bootstrap steps\n", "\n", - "1. An analytical solution (recall homework set 1).\n", + "The independent bootstrap works like this: \n", "\n", - "2. The gradient can be computed analytically.\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", "\n", - "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", "\n", - "We revisit an example similar to what we had in the first homework set. We had a function of the type" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "1c59342a", - "metadata": {}, - "outputs": [], - "source": [ - "x = 2*np.random.rand(m,1)\n", - "y = 4+3*x+np.random.randn(m,1)" - ] - }, - { - "cell_type": "markdown", - "id": "79d0e3da", - "metadata": {}, - "source": [ - "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
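As a quick numerical sanity check of the convexity definition used in these exercises (illustration only; the random evaluation points are an assumption), one can verify that the convexity gap for $f(x)=x^2$ is never negative:

```python
import numpy as np

# Check  lambda*f(x) + (1-lambda)*f(y) - f(lambda*x + (1-lambda)*y) >= 0
# at random points for f(x) = x**2 (a numerical illustration of the definition).
rng = np.random.default_rng(1)
f = lambda t: t**2
x, y = rng.normal(size=1000), rng.normal(size=1000)
lam = rng.uniform(size=1000)
gap = lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)
print(gap.min() >= -1e-12)   # True: the convexity gap is nonnegative everywhere
```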
\n", - "The linear regression model is given by" - ] - }, - { - "cell_type": "markdown", - "id": "ec79a08a", - "metadata": {}, - "source": [ - "$$\n", - "h_\\beta(x) = \\boldsymbol{y} = \\beta_0 + \\beta_1 x,\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "fa4910ae", - "metadata": {}, - "source": [ - "such that" - ] - }, - { - "cell_type": "markdown", - "id": "e7665e13", - "metadata": {}, - "source": [ - "$$\n", - "\\boldsymbol{y}_i = \\beta_0 + \\beta_1 x_i.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "90ffc363", - "metadata": {}, - "source": [ - "## Gradient descent example\n", - "\n", - "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\beta = (\\beta_0, \\beta_1)^T$\n", - "\n", - "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\beta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" - ] - }, - { - "cell_type": "markdown", - "id": "3aa073fa", - "metadata": {}, - "source": [ - "$$\n", - "X \\equiv \\begin{bmatrix}\n", - "1 & x_1 \\\\\n", - "\\vdots & \\vdots \\\\\n", - "1 & x_{100} & \\\\\n", - "\\end{bmatrix}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "e1ddc571", - "metadata": {}, - "source": [ - "The cost/loss/risk function is given by (" - ] - }, - { - "cell_type": "markdown", - "id": "5709f3d7", - "metadata": {}, - "source": [ - "$$\n", - "C(\\beta) = \\frac{1}{n}||X\\beta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\beta_0 + \\beta_1 x_i)^2 - 2 y_i (\\beta_0 + \\beta_1 x_i) + y_i^2\\right]\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "b7b3b90f", - "metadata": {}, - "source": [ - "and we want to find $\\beta$ such that $C(\\beta)$ is minimized." - ] - }, - { - "cell_type": "markdown", - "id": "6651ef6c", - "metadata": {}, - "source": [ - "## The derivative of the cost/loss function\n", - "\n", - "Computing $\\partial C(\\beta) / \\partial \\beta_0$ and $\\partial C(\\beta) / \\partial \\beta_1$ we can show that the gradient can be written as" - ] - }, - { - "cell_type": "markdown", - "id": "646be0cc", - "metadata": {}, - "source": [ - "$$\n", - "\\nabla_{\\beta} C(\\beta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\beta_0+\\beta_1x_i-y_i\\right) \\\\\n", - "\\sum_{i=1}^{100}\\left( x_i (\\beta_0+\\beta_1x_i)-y_ix_i\\right) \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T(X\\beta - \\mathbf{y}),\n", - "$$" + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." ] }, { "cell_type": "markdown", - "id": "b6f528c2", - "metadata": {}, + "id": "ab8c1d8a", + "metadata": { + "editable": true + }, "source": [ - "where $X$ is the design matrix defined above." 
+ "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." ] }, { - "cell_type": "markdown", - "id": "ae40f47b", - "metadata": {}, + "cell_type": "code", + "execution_count": 1, + "id": "d7b87cf8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "d57a0c6c", + "metadata": { + "editable": true + }, "source": [ - "## The Hessian matrix\n", - "The Hessian matrix of $C(\\beta)$ is given by" + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." ] }, { "cell_type": "markdown", - "id": "592c656d", - "metadata": {}, + "id": "bd8574db", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0^2} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} \\\\\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_1^2} & \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", - "$$" + "## Plotting the Histogram" ] }, { - "cell_type": "markdown", - "id": "aaff093b", - "metadata": {}, + "cell_type": "code", + "execution_count": 2, + "id": "5715940c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "This result implies that $C(\\beta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." 
+ "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "dd177bee", - "metadata": {}, + "id": "9584858b", + "metadata": { + "editable": true + }, "source": [ - "## Simple program\n", + "## The bias-variance tradeoff\n", "\n", - "We can now write a program that minimizes $C(\\beta)$ using the gradient descent method with a constant learning rate $\\gamma$ according to" + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" ] }, { "cell_type": "markdown", - "id": "94ead835", - "metadata": {}, + "id": "6f3cee73", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\beta_{k+1} = \\beta_k - \\gamma \\nabla_\\beta C(\\beta_k), \\ k=0,1,\\cdots\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", "$$" ] }, { "cell_type": "markdown", - "id": "75c4e856", - "metadata": {}, + "id": "fecd4f4b", + "metadata": { + "editable": true + }, "source": [ - "We can use the expression we computed for the gradient and let use a\n", - "$\\beta_0$ be chosen randomly and let $\\gamma = 0.001$. Stop iterating\n", - "when $||\\nabla_\\beta C(\\beta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", "\n", - "And finally we can compare our solution for $\\beta$ with the analytic result given by \n", - "$\\beta= (X^TX)^{-1} X^T \\mathbf{y}$." - ] - }, - { - "cell_type": "markdown", - "id": "228edb14", - "metadata": {}, - "source": [ - "## Gradient Descent Example\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. 
\n", "\n", - "Here our simple example" + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" ] }, { - "cell_type": "code", - "execution_count": 13, - "id": "46647c95", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "1bf50201", + "metadata": { + "editable": true + }, "source": [ - "\n", - "# Importing various packages\n", - "from random import random, seed\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from mpl_toolkits.mplot3d import Axes3D\n", - "from matplotlib import cm\n", - "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", - "import sys\n", - "\n", - "# the number of datapoints\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "# Hessian matrix\n", - "H = (2.0/n)* X.T @ X\n", - "# Get the eigenvalues\n", - "EigValues, EigVectors = np.linalg.eig(H)\n", - "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", - "\n", - "beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", - "print(beta_linreg)\n", - "beta = np.random.randn(2,1)\n", - "\n", - "eta = 1.0/np.max(EigValues)\n", - "Niterations = 1000\n", - "\n", - "for iter in range(Niterations):\n", - " gradient = (2.0/n)*X.T @ (X @ beta-y)\n", - " beta -= eta*gradient\n", - "\n", - "print(beta)\n", - "xnew = np.array([[0],[2]])\n", - "xbnew = np.c_[np.ones((2,1)), xnew]\n", - "ypredict = xbnew.dot(beta)\n", - "ypredict2 = xbnew.dot(beta_linreg)\n", - "plt.plot(xnew, ypredict, \"r-\")\n", - "plt.plot(xnew, ypredict2, \"b-\")\n", - "plt.plot(x, y ,'ro')\n", - "plt.axis([0,2.0,0, 15.0])\n", - "plt.xlabel(r'$x$')\n", - "plt.ylabel(r'$y$')\n", - "plt.title(r'Gradient descent example')\n", - "plt.show()" + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e0bb3c65", - "metadata": {}, + "id": "aa1ee75a", + "metadata": { + "editable": true + }, "source": [ - "## And a corresponding example using **scikit-learn**" + "We can rewrite this as" ] }, { - "cell_type": "code", - "execution_count": 14, - "id": "d29a0ccf", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "0b88cfa1", + "metadata": { + "editable": true + }, "source": [ - "# Importing various packages\n", - "from random import random, seed\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import SGDRegressor\n", - "\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", - "print(beta_linreg)\n", - "sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)\n", - "sgdreg.fit(x,y.ravel())\n", - "print(sgdreg.intercept_, sgdreg.coef_)" + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "7d20e2cc", - "metadata": {}, + "id": "51802535", + "metadata": { + "editable": true + }, "source": [ - "## Gradient descent and Ridge\n", + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the 
error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", "\n", - "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\beta$," + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" ] }, { "cell_type": "markdown", - "id": "52a46927", - "metadata": {}, + "id": "c1fab3ca", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "C_{\\text{ridge}}(\\beta) = \\frac{1}{n}||X\\beta -\\mathbf{y}||^2 + \\lambda ||\\beta||^2, \\ \\lambda \\geq 0.\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "id": "446851f9", - "metadata": {}, + "id": "bf1b97b3", + "metadata": { + "editable": true + }, "source": [ - "In order to minimize $C_{\\text{ridge}}(\\beta)$ using GD we adjust the gradient as follows" + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" ] }, { "cell_type": "markdown", - "id": "dc10da38", - "metadata": {}, + "id": "4e6a9591", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\nabla_\\beta C_{\\text{ridge}}(\\beta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\beta_0+\\beta_1x_i-y_i\\right) \\\\\n", - "\\sum_{i=1}^{100}\\left( x_i (\\beta_0+\\beta_1x_i)-y_ix_i\\right) \\\\\n", - "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\beta_0 \\\\ \\beta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\beta - \\mathbf{y})+\\lambda \\beta).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "id": "e15d77aa", - "metadata": {}, + "id": "3bec9e3c", + "metadata": { + "editable": true + }, "source": [ - "We can easily extend our program to minimize $C_{\\text{ridge}}(\\beta)$ using gradient descent and compare with the analytical solution given by" + "which, using the abovementioned expectation values can be rewritten as" ] }, { "cell_type": "markdown", - "id": "89cd7379", - "metadata": {}, + "id": "a65f2f18", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\beta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 2} \\right)^{-1} X^T \\mathbf{y}.\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", "$$" ] }, { "cell_type": "markdown", - "id": "20a6a0b6", - "metadata": {}, + "id": "d73eda6c", + "metadata": { + "editable": true + }, "source": [ - "## The Hessian matrix for Ridge Regression\n", - "The Hessian matrix of Ridge Regression for our simple example is given by" + "that is the rewriting in terms of the so-called bias, the variance of the 
model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." ] }, { "cell_type": "markdown", - "id": "2bcf31af", - "metadata": {}, + "id": "ecc681f6", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0^2} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} \\\\\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_1^2} & \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", - "$$" + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Reading the bias-variance tradeoff (figure not shown).

    \n", + "" ] }, { "cell_type": "markdown", - "id": "3f9a5445", - "metadata": {}, + "id": "0b1fdbf0", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e1bb5682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", - "minimum.\n", - "Note that the Ridge cost function is convex being a sum of two convex\n", - "functions. Therefore, the stationary point is a global\n", - "minimum of this function." + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "003f6d0d", - "metadata": {}, + "id": "256590ad", + "metadata": { + "editable": true + }, "source": [ - "## Program example for gradient descent with Ridge Regression" + "## Understanding what happens" ] }, { "cell_type": "code", - "execution_count": 15, - "id": "bb679580", - "metadata": {}, + "execution_count": 4, + "id": "a3b16f08", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ - "from random import random, seed\n", - "import numpy as np\n", "import matplotlib.pyplot as plt\n", - "from mpl_toolkits.mplot3d import Axes3D\n", - "from matplotlib import cm\n", - "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", - "import sys\n", - "\n", - "# the number of datapoints\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "XT_X = X.T @ X\n", - "\n", - "#Ridge parameter lambda\n", - "lmbda = 0.001\n", - "Id = n*lmbda* np.eye(XT_X.shape[0])\n", - "\n", - "# Hessian matrix\n", - "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", - "# Get the eigenvalues\n", - "EigValues, EigVectors = np.linalg.eig(H)\n", - "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", - "\n", - "\n", - "beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", - "print(beta_linreg)\n", - "# Start plain gradient descent\n", - "beta = np.random.randn(2,1)\n", - "\n", - "eta = 1.0/np.max(EigValues)\n", - "Niterations = 100\n", - "\n", - "for iter in range(Niterations):\n", - " gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta\n", - " beta -= eta*gradients\n", - "\n", - "print(beta)\n", - "ypredict = X @ beta\n", - "ypredict2 = X @ beta_linreg\n", - "plt.plot(x, ypredict, \"r-\")\n", - "plt.plot(x, ypredict2, \"b-\")\n", - "plt.plot(x, y ,'ro')\n", - "plt.axis([0,2.0,0, 15.0])\n", - "plt.xlabel(r'$x$')\n", - "plt.ylabel(r'$y$')\n", - "plt.title(r'Gradient descent example for Ridge')\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), 
LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", - "id": "2050684c", - "metadata": {}, + "id": "8c4d3e7f", + "metadata": { + "editable": true + }, "source": [ - "## Using gradient descent methods, limitations\n", - "\n", - "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "## Summing up\n", "\n", - "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", "\n", - "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", "\n", - "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", "\n", - "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", - "\n", - "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." ] }, { "cell_type": "markdown", - "id": "6b20b26d", - "metadata": {}, + "id": "6ba8872d", + "metadata": { + "editable": true + }, "source": [ - "## Challenge yourself the coming weekend\n", + "## Another Example from Scikit-Learn's Repository\n", "\n", - "Write a code which implements gradient descent for a logistic regression example." + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. 
However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." ] }, { - "cell_type": "markdown", - "id": "3570021a", - "metadata": {}, + "cell_type": "code", + "execution_count": 5, + "id": "624a6bc3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "## Lab session: Material from last week and relevant for the first project" + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "c5f36ff0", - "metadata": {}, + "id": "7dcfbdc3", + "metadata": { + "editable": true + }, "source": [ "## Various steps in cross-validation\n", "\n", @@ -2704,48 +1919,10 @@ }, { "cell_type": "markdown", - "id": "a6e47b16", - "metadata": {}, - "source": [ - "## How to set up the cross-validation for Ridge and/or Lasso\n", - "\n", - "* Define a range of interest for the penalty parameter.\n", - "\n", - "* Divide the data set into training and test set comprising samples $\\{1, \\ldots, n\\} \\setminus i$ and $\\{ i \\}$, respectively.\n", - "\n", - "* Fit the linear regression model by means of for example Ridge or Lasso regression for each $\\lambda$ in the grid using the training set, and the corresponding estimate of the error variance $\\boldsymbol{\\sigma}_{-i}^2(\\lambda)$, as" - ] - }, - { - "cell_type": "markdown", - "id": "5b7545c5", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{align*}\n", - "\\boldsymbol{\\beta}_{-i}(\\lambda) & = ( \\boldsymbol{X}_{-i, 
\\ast}^{T}\n", - "\\boldsymbol{X}_{-i, \\ast} + \\lambda \\boldsymbol{I}_{pp})^{-1}\n", - "\\boldsymbol{X}_{-i, \\ast}^{T} \\boldsymbol{y}_{-i}\n", - "\\end{align*}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "fa3a49a2", - "metadata": {}, - "source": [ - "* Evaluate the prediction performance of these models on the test set by $C[y_i, \\boldsymbol{X}_{i, \\ast}; \\boldsymbol{\\beta}_{-i}(\\lambda), \\boldsymbol{\\sigma}_{-i}^2(\\lambda)]$. Or, by the prediction error $|y_i - \\boldsymbol{X}_{i, \\ast} \\boldsymbol{\\beta}_{-i}(\\lambda)|$, the relative error, the error squared or the R2 score function.\n", - "\n", - "* Repeat the first three steps such that each sample plays the role of the test set once.\n", - "\n", - "* Average the prediction performances of the test sets at each grid point of the penalty bias/parameter. It is an estimate of the prediction performance of the model corresponding to this value of the penalty parameter on novel data." - ] - }, - { - "cell_type": "markdown", - "id": "685304e1", - "metadata": {}, + "id": "583f2b85", + "metadata": { + "editable": true + }, "source": [ "## Cross-validation in brief\n", "\n", @@ -2770,8 +1947,10 @@ }, { "cell_type": "markdown", - "id": "9cac2104", - "metadata": {}, + "id": "2b422220", + "metadata": { + "editable": true + }, "source": [ "## Code Example for Cross-validation and $k$-fold Cross-validation\n", "\n", @@ -2781,20 +1960,12 @@ { "cell_type": "code", "execution_count": 6, - "id": "1134c2ed", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAioAAAGwCAYAAACHJU4LAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABmvklEQVR4nO3dd1yV5f/H8ddhHTYIAoJMQUUUHLhQc+U2s+EqM7VpqWnDyva2rNTKhlrZ0LLSLHOPcu+BIjhQUVBUFJDNYZz79wff+EWOFIHrnMPn+Xicx4Nzn3u87+PB8+G6r/u6dJqmaQghhBBCmCAr1QGEEEIIIa5GChUhhBBCmCwpVIQQQghhsqRQEUIIIYTJkkJFCCGEECZLChUhhBBCmCwpVIQQQghhsmxUB7gZRqOR1NRUXFxc0Ol0quMIIYQQ4jpomkZOTg5+fn5YWV27zcSsC5XU1FQCAgJUxxBCCCFEJaSkpODv73/Ndcy6UHFxcQHKTtTV1VVxGiGEEEJcj+zsbAICAsq/x6/FrAuVvy/3uLq6SqEihBBCmJnr6bYhnWmFEEIIYbKkUBFCCCGEyZJCRQghhBAmy6z7qFyv0tJSiouLVccQFs7W1hZra2vVMYQQwqJYdKGiaRrnzp3j0qVLqqOIWsLd3Z169erJuD5CCFFFLLpQ+btI8fb2xtHRUb48RLXRNI38/HzS0tIA8PX1VZxICCEsg8UWKqWlpeVFiqenp+o4ohZwcHAAIC0tDW9vb7kMJIQQVcBiO9P+3SfF0dFRcRJRm/z9eZM+UUIIUTUstlD5m1zuETVJPm9CCFG1LL5QEUIIIYT5kkJFCCGEECZLChWhlE6n47ffflMdQwghhImSQkUIIcR1KzVqXDiVSsGlHNVRRC0hhYqZk7tLqkdRUZHqCEKYlPPZhUz+NY4Wr69m+V2P4FDHlcMNIomd+R2a0ag6nrBgtapQ0TSN/KKSGn9omnZDOY1GI++99x5hYWHo9XoCAwN5++23OXnyJDqdjp9//pmuXbtib2/PvHnzMBqNvPHGG/j7+6PX62nRogUrV64s319RURHjxo3D19cXe3t7goODmTJlSvnrr732GoGBgej1evz8/HjiiSf+M+PkyZNp3779ZcujoqJ49dVXAdi1axc9e/akbt26uLm50aVLF/bu3XtD78X1nsOlS5d45JFH8PHxwd7enmbNmrF06dLy1xctWkTTpk3R6/UEBwfz4YcfVth/cHAwb731FqNGjcLNzY2HH34YgK1bt9K5c2ccHBwICAjgiSeeIC8vr1LnIIS52j9/Cbe/u5IfdyaTYyjhjKs3AOFJB2kxfiS7u9+BIS9fcUphqSx2wLcrKSguJeKVVTV+3IQ3euNod/1v9eTJk5kzZw7Tp0+nU6dOnD17lsOHD5e//txzz/Hhhx8yd+5c9Ho9H330ER9++CGzZs2iZcuWfP3119x+++3Ex8fTsGFDPv74Y5YsWcLPP/9MYGAgKSkppKSkALBw4UKmT5/OggULaNq0KefOnWP//v3/mXH48OG8++67HD9+nNDQUADi4+OJi4tj4cKFAOTk5DBy5Eg+/vhjAD788EP69etHYmIiLi4u1/1+ANc8B6PRSN++fcnJyWHevHmEhoaSkJBQPuDanj17GDJkCK+99hpDhw5l69atPP7443h6ejJq1KjyY7z//vu8/PLLvPTSSwDExcXRu3dv3nzzTb766isuXLjAuHHjGDduHHPnzr2h/EKYqwNfLqDJoyN4Nawtn499lxf6R9DmrT6kJb7I8Vffpe3Cr2mz4Q9iO/am6fa12NrrVUcWFkan3eif+yYkOzsbNzc3srKycHV1rfBaYWEhSUlJhISEYG9vD0B+UYnJFyo5OTl4eXkxc+ZMHnrooQqvnTx5kpCQEGbMmMGECRPKl9evX5+xY8fywgsvlC9r27Ytbdq04dNPP+WJJ54gPj6etWvXXjbOx7Rp05g1axYHDx7E1tb2hs6ref
PmDBo0iJdffhmAF154gbVr17Jz584rrl9aWkqdOnX44YcfuO2224CyzrSLFy/mjjvuuOaxrnUOq1evpm/fvhw6dIhGjRpdtu3w4cO5cOECq1evLl/27LPPsmzZMuLj44GyFpWWLVuyePHi8nXuv/9+HBwcmDVrVvmyzZs306VLF/Ly8so/V/90pc+dEObq1JY91O1+C05FBexteytN1y9F71Dxcx339c80fHQE9iVF7OgzlHYrFihKK8zJtb6//61Wtag42FqT8EZvJce9XocOHcJgMHDrrbdedZ3WrVuX/5ydnU1qaiodO3assE7Hjh3LW0ZGjRpFz549ady4MX369OG2226jV69eAAwePJgZM2bQoEED+vTpQ79+/RgwYAA2Nv/90Rg+fDhff/01L7/8Mpqm8eOPPzJx4sTy19PS0njllVf4888/OX/+PKWlpeTn55OcnHzd78ffrnUOsbGx+Pv7X7FIgbL3dODAgZe9PzNmzKC0tLS85eWf7yuUtcQcO3aM+fPnly/TNA2j0UhSUhJNmjS54fMQwlyUGIooHnYPTkUFJDRsQbO/lmLncHnxHfnAEGLzC4kaP4rWq35hy5LH6Hh7FwWJhaWqVYWKTqe7oUswKvw9X8y1ODk5Xbbs360MmqaVL2vVqhVJSUmsWLGCtWvXMmTIEHr06MHChQsJCAjgyJEjrFmzhrVr1/L444/z/vvvs2HDhv9sYbn33nt5/vnn2bt3LwUFBaSkpDBs2LDy10eNGsWFCxeYMWMGQUFB6PV6YmJiKtVR9Vrn8F/v2T/fi38u+7d/v69Go5FHH330in12AgMDb/gchDAnu599i/anE8m2d8Z7+WLsHK/eQthi3P2sOnSMmbkepB0sZk2vYlztb6yFVoirqVWdac1Bw4YNcXBwYN26dde1vqurK35+fmzevLnC8q1bt1b4i9/V1ZWhQ4cyZ84cfvrpJxYtWkRGRgZQVhzdfvvtfPzxx6xfv55t27YRFxf3n8f29/enc+fOzJ8/n/nz59OjRw98fHzKX9+0aRNPPPEE/fr1K+/IevHixes6r6ud65XOISoqitOnT3P06NErbhcREXHF96dRo0bXnDiwVatWxMfHExYWdtnDzs6u0uchhKlLO3ScZl+UdTg/NPFF6oYF/+c2XWa8Sm5kC85nG5iy/PB/ri/E9TLt5oVayN7enueee45nn30WOzs7OnbsyIULF4iPj7/q5aBJkybx6quvEhoaSosWLZg7dy6xsbHllyymT5+Or68vLVq0wMrKil9++YV69erh7u7ON998Q2lpKe3atcPR0ZHvv/8eBwcHgoKCrivv8OHDee211ygqKmL69OkVXgsLC+P777+ndevWZGdnM2nSpOtqMbqSa51Dly5d6Ny5M3fffTfTpk0jLCyMw4cPo9Pp6NOnD08//TRt2rThzTffZOjQoWzbto2ZM2fy2WefXfOYzz33HO3bt2fs2LE8/PDDODk5cejQIdasWcMnn3xSqfMQwhycHPsMbYvyORLSlDZvTrqubextrXn3rkiGzt7O5jU7OeEPDdpGVnNSUStoCmVnZ2sTJkzQAgMDNXt7ey0mJkbbuXPndW+flZWlAVpWVtZlrxUUFGgJCQlaQUFBVUauEaWlpdpbb72lBQUFaba2tlpgYKD2zjvvaElJSRqg7du377L1X3/9da1+/fqara2t1rx5c23FihXlr8+ePVtr0aKF5uTkpLm6umq33nqrtnfvXk3TNG3x4sVau3btNFdXV83JyUlr3769tnbt2uvOmpmZqen1es3R0VHLycmp8NrevXu11q1ba3q9XmvYsKH2yy+/aEFBQdr06dPL1wG0xYsX/+dxrnUOmqZp6enp2ujRozVPT0/N3t5ea9asmbZ06dLy1xcuXKhFRESUv5/vv/9+hf3/O9ffdu7cqfXs2VNzdnbWnJyctKioKO3tt9++ak5z/twJoWmalpKRp90z/F1tY1AL7fCvK294++8feFErsrLW9rTpXg3phKW41vf3vym962fo0KEcPHiQzz//HD8/P+bNm8f06dNJSEigfv36/7n9jd71I0R1k8+dMHev/H6Q77adokOoJz88fPlYSf8laf0OgrrFYIXG8VUbCe11SzWkFObuRu76UdZHpaCggEWLFjF16lQ6d+5MWFgYr732GiEhIXz++eeqYgkhRK2VllPIgl1l4xON6xZWqX2EdG3HvpiyO/KyJ79SZdlE7aWsUCkpKaG0tPSyvzodHBwu6/j4N4PBQHZ2doWHqB6bNm3C2dn5qo+q9s4771z1WH379q3y4wkhLpcw+W0mrvuaXk4FxIR6Vno/dae+BUDzvRs4sye+quKJWkpZZ1oXFxdiYmJ48803adKkCT4+Pvz444/s2LGDhg0bXnGbKVOm8Prrr9dw0tqpdevWxMbG1tjxxowZw5AhQ674WmU74Aohrl9pcQnh82bRNesCu2/vctkt/TciqFNrDjRtR1T8DlLe/oD6v8pIzqLylPZROX78OA888AAbN27E2tqaVq1a0ahRI/bu3UtCQsJl6xsMBgwGQ/nz7OxsAgICpI+KMBnyuRPmav+cH2n+yL1ccnDB/vxZ7F0uH6/pRsR+/j0tHr+fbHtnbFJP41jHrYqSCktgFn1UAEJDQ9mwYQO5ubmkpKSwc+dOiouLCQkJueL6er0eV1fXCg8hhBA3r3TWbAAO97zjposUgKiH7uGMhy/WpSVs+XntTe9P1F4mMY6Kk5MTTk5OZGZmsmrVKqZOnao6khBC1BoXjycTtXcjAL5Pj6+SfVrZ2rBz6ixeOVhAg5J69KySvYraSGmLyqpVq1i5ciVJSUmsWbOGbt260bhxY0aPHq0ylhBC1CrHP52LjWbkaFATgjq3qbL93jK0F/kOzuxPucSxtNwq26+oXZQWKllZWYwdO5bw8HDuv/9+OnXqxOrVq294Fl8hhBCV57rkVwAyBtxVpfut66ynayMvAFb8eaBK9y1qD6WXfoYMGXLVOz2EEEJUv3OZ+exxqY+X4ylCHhtZ5fsf4VXM08+Nx9OQS+ndqVjbmkSPA2FGZFLCWmjUqFHccccd11yna9euTJw4sUbyCCHUWRZ/npd6j2Xs+3/gE3HloSFuRkyX5vhnX8An6wIJ83+v8v0LyyeFigm6UiGxcOFC7O3tmTp1Kq+99ho6ne6yx9q10rNeCHFjlh5IBaBvc/9q2b/e2YlDncsGbSyY/2O1HENYNilUzMCXX37J8OHDmTlzJs8++ywATZs25ezZsxUenTt3VpxUCGFOLpxMpXTHTnSakX6RvtV2HKfhwwBouHUtJYaiajuOsEy1s1DJy7v6o7Dw+tctKPjvdW/S1KlTGTduHD/88AMPPfRQ+XIbGxvq1atX4WFnZwdAXFwc3bt3x8HBAU9PTx555BFyc6/e4z4vL4/7778fZ2dnfH19+fDDD286txDC9CV9OY8l3z3FL0vfwdu1+gYoDB96G5mOrtTJz+LwT0ur7TjCMtXOQsXZ+eqPu++uuK6399XX/fccNMHBl69zE55//nnefPNNli5dyt3/z
nUV+fn59OnThzp16rBr1y5++eUX1q5dy7hx4666zaRJk/jrr79YvHgxq1evZv369ezZs+emsgshTJ/tihUAFLdtV63HsdHbkRjTA4C8HxZU67GE5ZHu1yZqxYoV/P7776xbt47u3btf9npcXFyFyQEjIiLYuXMn8+fPp6CggO+++w4np7LRJWfOnMmAAQN477338PHxqbCf3NxcvvrqK7777jt69iwbkunbb7/F3796rlcLIUxDYU4ejeO2A+A1rGpvS74S+3uGwLpfCdu8htLiErn7R1y32vlJucZlEKytKz5PS7v6ulb/apA6ebLSkf4tKiqKixcv8sorr9CmTRtcXFwqvN64cWOWLFlS/lyv1wNw6NAhmjdvXl6kAHTs2BGj0ciRI0cuK1SOHz9OUVERMTEx5cs8PDxo3LhxlZ2LEML0HP15KVHFhaS5eBLas1O1H6/J8DtY9FkfVgVF82BSOu0a+fz3RkJQWwsVpxuYx6K61v0P9evXZ9GiRXTr1o0+ffqwcuXKCsWKnZ0dYWFhl22nadpVZz290nKFc1IKIRQq+LXsVuGk9t3w/vcfXdXA1l7PpknvsDo2lZDjGVKoiOtWO/uomInAwEA2bNhAWloavXr1Ijs7+z+3iYiIIDY2lrx/dOTdsmULVlZWNGrU6LL1w8LCsLW1Zfv27eXLMjMzOXr0aNWchBDCJNXfsQEA/e0DauyY3cK9Afjr8DVaqoX4FylUTJy/vz/r168nPT2dXr16kZWVdc31hw8fjr29PSNHjuTgwYP89ddfjB8/nhEjRlx22QfA2dmZBx98kEmTJrFu3ToOHjzIqFGjsKqBv7CEEGqk7kvAPz2VEp0VYcNqrlDp0siLyPPH6LtoNmfj5I8hcX3k28gM1K9fnw0bNnDp0iV69uzJpUuXrrquo6Mjq1atIiMjgzZt2jBo0CBuvfVWZs6cedVt3n//fTp37sztt99Ojx496NSpE9HR0dVwJkIIU7A5346h90zhq2FP41y3To0d193RjqmbvubJLT+Q/O1PNXZcYd50mhl3UsjOzsbNzY2srCxcXV0rvFZYWEhSUhIhISHY21ff+ABC/JN87oQ5GPfDXpYeOMuEWxvyZM/LLwlXp22PTCJmzgcciIwh6sDWGj22MB3X+v7+N2lREUKIWsRo1Nh6PB2ATg3r1vjx6907CIBGh/ZQcCmnxo8vzI8UKkIIUYuc2LiT8b9/Qu/kfbQIcK/x4wd3bsM5d2/sS4o48uNvNX58YX6kUBFCiFrk4s+/MXrPHzyesBJb65r/CtBZWXGqfVcADL//UePHF+bH4gsVM+6CI8yQfN6EqXPYugmAgk5dlGWwH1h2p1HQjg1oRqOyHMI8WGyhYmtrC5TNfSNETfn78/b3508IU2IsKSXkyH4APPrcqixHo2EDKbK2wSk/h+SEE8pyCPNgsSPTWltb4+7uTtr/hsB3dHS86oitQtwsTdPIz88nLS0Nd3d3rP89FYMQJuDUpp2EFOaSb2tPgxoYNv9qHNxdeOalb1ic78JrubaMUJZEmAOLLVQA6tWrB1BerAhR3dzd3cs/d0KYmrQV6wgBjodFEqm3U5oluHNbSlcfZUviRUa0D1KaRZg2iy5UdDodvr6+eHt7U1xcrDqOsHC2trbSkiJMmvWWLQDktmmvOAl0DKvLB6uPsvX4RUpLjVgr6NgrzINFFyp/s7a2li8QIUStZ59yEgCXHt3UBgEi67vxyqZv6J6wheOt59Hotu6qIwkTJSWsEELUAqcz87ntnvfp8tjXNBjYU3UcbKytaFl4keBLZ0n/fbnqOMKESaEihBC1wO6TmaDT4d60EY6uzqrjAFDUpSsALpvXK80hTJsUKkIIUQvsPJkBQJugmpuE8L/Uu/s2ABomHqAwO1dxGmGqpFARQohaYOikEcz+9S06W2erjlIuMKYlaa510ZcWk/jrKtVxxL+czy5kw9ELZOQVKc0hhYoQQli4S8lnaX58P70St9Ms3F91nHI6KytOtYwBIHf5SsVpxL/tXr6ZGW9+x9PfblOaQwoVIYSwcCeXrQUg2SsAjxDTKVQA6FZ2t0+dXWq/DMXl6n49i8XznuGR9fOU5pBCRQghLFzh5rIi4FzTVoqTXM7/zj4c8/Bnl0cIeYUy3pUp8TgcB4C+XVulOWrFOCpCCFGbOe7fC4DWpo3iJJfzjQqn46RvOXOpgJCULDo1rKs6kgAMRcVQUABAve4dlWaRFhUhhLBgmtFI8PF4ADy7q5vf51rahngAsCMpXXES8bejafn0fPAzbnl+Ib7Nw5VmkUJFCCEs2OldcbgW5mKwtiWoq/qh86+kXYgHNqUlnNmyW3UU8T8HzlwCIDjMH52V2lJBLv0IIYQFO3b8LOfqR2Dn7EBze73qOFcUY1/IgY+GYm00UvjMHdi7OKmOVOvFnc4CyqY6UE1aVIQQwoJtcA5g8H1T+W3qN6qjXFVgszDy9Y7oS4s5sfwv1XEEcO/zo/jhxxfoUHBWdRS1hUpJSQkvvfQSISEhODg40KBBA9544w2MRqPKWEIIYTH2n74EQItA0xmR9t90VlacatoagKyV6xSnEYU5eTQ5cYAOyQcIDfNTHUftpZ/33nuPL774gm+//ZamTZuye/duRo8ejZubGxMmTFAZTQghzF6RoZiTSefA2p7m/u6q41xTSadOsGMNzju3qo5S6yVv2EEjYymZjm7Ua9pQdRy1LSrbtm1j4MCB9O/fn+DgYAYNGkSvXr3YvVs6VAkhxM06tW4rez4YzMIFkwnydFQd55q8+5fN6ByaeIDiQoPiNLVb5p+bAEgOjVDekRYUFyqdOnVi3bp1HD16FID9+/ezefNm+vXrd8X1DQYD2dnZFR5CCCGuLGP9JqzQsHNxRqfTqY5zTUG3tOWSgwuOxYWcWLVRdZxazWr3LgDyW7ZWnKSM0kLlueee45577iE8PBxbW1tatmzJxIkTueeee664/pQpU3Bzcyt/BAQE1HBiIYQwH1a7yr5w8lqY3oi0/2ZlY01Sk7KcmdJPRal6hw8A4NgpRnGSMkoLlZ9++ol58+bxww8/sHfvXr799ls++OADvv322yuuP3nyZLKyssofKSkpNZxYCCHMh9eh/33hdDTN8VP+7eJdQ5na+X6W1W+hOkqtlXUmjYALZd+tQb26KE5TRmln2kmTJvH8888zbNgwACIjIzl16hRTpkxh5MiRl62v1+vR601zHAAhhDAluRczCTx/CgD/3qbxhfNf6o28l89yAnEpseE1o4a1lWlfrrJERw6fIj8kGp+SXJoEqb/jBxS3qOTn52P1r4461tbWcnuyEELcpJS/tmGFxnk3LzxDzOMyeRNfF5z1NuQUlnDorPRBVGE77owa8jpfTP1RdZRySguVAQMG8Pbbb7Ns2TJOnjzJ4sWLmTZtGnfeeafKWEIIYfaytuwA4GyI2nlaboSNtRXd3I3cdmgjp39fqTpOrbQ/5RKASd3OrvTSzyeffMLLL7/M448/TlpaGn5+fjz66KO88sorKmMJIYTZ2+/iy+lmt+LeubPqKDdk+P6V
tF/yMbsv9oHHh6qOU6toRiOnDiWBzokWge6q45TTaZqmqQ5RWdnZ2bi5uZGVlYWrq6vqOEIIYTL6zNjI4XM5zB4RTa+m9VTHuW4Hv/uVZiPv5py7N/Uyz6uOU6ukxh7Cr2UESR5++J5Jwt7ertqOdSPf3+pHchFCCFGlDCWlHEvLBaCZCUwqdyNC+nenRGdFvUtpnE9IVB2nVjm7ej0AJc6u1Vqk3CgpVIQQwsIcjz9J6Pkk6uqt8HWzVx3nhjh5unOyfhgAp5euVZymdineuh2AjKbNFSepSAoVIYSwMLk/LWTV1+P46tc3TX5E2itJj4oGoGTjZsVJahe3g7EAWLUzrXF3pFARQggLo+3dA0B+eITiJJVj3fkWADz3y7xvNaUov5CQU4cB8OllWh2wpVARQggL434kHgC7NtGKk1SOf/8eAASfOUZe+iW1YWqJk+u2YF9SxCUHFwLayqUfIYQQ1aTEUETQ6WMAeHc2rSb861WvWUMm3/sqt4z5iv2ZJarj1AoZq/8C4GSj5uisrRWnqUgKFSGEsCCnt8diX1JEnp0D9VtHqY5TaTn9b+esqxd7TmWqjlIrbHEP5ruW/Unvc5vqKJeRQkUIISzIxY3bAEgOaIiVjWn9ZXwjWgfVAWC3FCrVTtM0frIP5pVej+H8yEOq41xG6ci0QgghqlbJnr0AZIc3U5zk5rTxseex7b/Q4o8TGO//y6yLLlN3OrOAtBwDNlY6mge4q45zGSlUhBDCgiwLv4XNGUaibzO9Jvwb0TjAk5CtP+FYXMiJzbto0NU8+9uYg6N/7SD6dAK61tHY25peQSiXfoQQwkJomsZSx0BmdhiG14A+quPcFBu9HSdCy1qFLqz8U3Eay+b05SwWzX+WZ9Z/qzrKFUmhIoQQFiItx0BmfjFWOgjzdlYd56bltmoLgNXWrYqTWDavA2Xj7tjd0klxkiuTQkUIISzEye376XV0G+2tc02yCf9GOXYvG3jML36v4iSWKyctneCzJwAIuO1WxWmuTAoVIYSwENrixcxe/DZPr/1KdZQqEXzbrRjRUT/jLBePnVQdxyKdXP4n1pqR1Dr18GrcQHWcK5JCRQghLITNwYMAFDU17zt+/ubqU5eTfmVfnil/yASF1SF33UYAUiNaKk5ydVKoCCGEhfA4UTZXi0Mr0/3SuVEXIltRaGPH+cNJqqNYJJc9ZTMml7aPUZzk6uT2ZCGEsADFhQb8z50CwKdTa8Vpqs6F518mMvJemgZ7Yd73MZmeovxCQhPjAPC5rZfiNFcnLSpCCGEBTm+Pxc5YQq6dI/UiG6uOU2WiIkMptrYlPjWLwuJS1XEsSty5HO655x0+7P0IQbeYbnErhYoQQliA9G27AEgJCENnZTn/tQd4OODloqe4VONAyiXVcSzK9lNZxPo1JnH4IyY3EeE/Wc6nWQgharHiffsByA6znNYUAJ1Ox8Tjf7Hqq8cpfe9d1XEsyo6kDADaNfBQnOTapI+KEEJYgEVt+jPf4EH/2023U2RlhTrpaHwxmdgd21VHsRglhiL6fv4mdeo1pt1jpj09gbSoCCGEBdhS4sLSJp3xurWz6ihVrk6PLgAEH92PsUT6qVSFE6s3cc/upby5dhbhfm6q41yTFCpCCGHmLuUXcTarEIBG9VwUp6l6IT1vodDGDveCHFK271MdxyJkLF8DwInwlljZmvbFFSlUhBDCzJ3cvIeHd/xK30vHcLW3VR2nytk52nMiJAKA8yvWKU5jGRy2bQagsOMtipP8NylUhBDCzBlWrubF9V/z6I5FqqNUm6z/TVCokwkKb1ppcQkNDpe1THn2M93xU/4mhYoQQpg5XVzZoF0F4U0VJ6k+jl3L/vKvd1AmKLxZSeu24GLIJ0fvSIOeHVXH+U9SqAghhJlzO3YIALuWUYqTVJ/gAT057lGfbb7hXLyUpzqOWUv/YxUAJxq3wNrE+6eA3J4shBBmzVhSSsCZEwDU7dBGcZrq41bfh0EvzicxLZdZZ3Lo7e6kOpLZyo8vK2wLOndTnOT6SIuKEEKYsbOxCTgWF2KwtqV+G8ttUQFoHVwHgD2nMhUnMV+GklIev+VR2j7+LR6PPaQ6znWRQkUIIczY+S3/GzrfNxgbvZ3iNNUrOsgDK2MpqTv3q45itvYlX6KguBSjry8NmwSpjnNd5NKPEEKYMcOBgwBkhjRSnKT6tdMbOPDRMOxKiyl8egD2LnL550ZtSbwAQMewuuh0OsVpro8UKkIIYcYW3zKIt0uDGN65EZbbQ6WMf0QIGbZ6nIsKOLxyA+GD+6mOZHZ6PnEfrYs1Clq8pzrKdZNLP0IIYcbis0s5WC8Mz3YtVUepdjorK5LDWwBwad0GtWHMUPb5izRN3EeXpL20aBqoOs51U1qoBAcHo9PpLnuMHTtWZSwhhDALRqPG8Qu5AIR5OytOUzMM7comXdTvlAkKb9Txn5dirRlJ8QqgXqT5XCpUWqjs2rWLs2fPlj/WrCmbe2Dw4MEqYwkhhFk4l3CcF5fNZMT+lQR6OKqOUyPce3YFIPjIfjSjUW0YM1O0cjUAZ1qb/iBv/6S0j4qXl1eF5++++y6hoaF06dLliusbDAYMBkP58+zs7GrNJ4QQpuzi5h2M2LecpHoh2Fh/ojpOjWjQqzOFNnbUyc8ieUcsgTGtVEcyG767yub30fcx/WHz/8lk+qgUFRUxb948Hnjggav2RJ4yZQpubm7lj4CAgBpOKYQQpiP/f3f8ZASFKU5Sc+wc7TkR3ASA88v/VJzGfJyLO0rghRRKdVY0GHyb6jg3xGQKld9++41Lly4xatSoq64zefJksrKyyh8pKSk1F1AIIUyM1eHDABQ1ClecpGYl3T6UaZ2Gs9EjRHUUs3Fq/kIAjjZohpuv13+sbVpM5vbkr776ir59++Ln53fVdfR6PXq9vgZTCSGE6XI7mQiAbbMIxUlqlv1DD/Cx7W4aGJ14WnUYM7Gz0J6ioBZY9eitOsoNM4lC5dSpU6xdu5Zff/1VdRQhhDALmtGIb+pJADzatlCapaZFB5UNpX/iQh4ZeUV4OFn2iLw3y1BSyueuEeQPe4ul4zupjnPDTOLSz9y5c/H29qZ///6qowghhFm4mHgSV0MepTori5/j59/cHe1o7VBE7yNbObx2q+o4Jm9XUib5RaV4u+hp6ueqOs4NU16oGI1G5s6dy8iRI7GxMYkGHiGEMHnndscBcKZuffROtePW5H96dtM8Zv32Drp581RHMXmH/1iHV24GXRp5mc2w+f+kvFBZu3YtycnJPPDAA6qjCCGE2dgb0pyoCQuY9dQ01VHU6Fg2Foj7vl2Kg5i+nu89y65P72dweoLqKJWivFDp1asXmqbRqJH5jJInhBCqJablkG3vjGuLZqqjKOHbtxsADU4ewpCXrziN6TqzJ56gtGRKdFaE39lTdZxKUV6oCCGEuHHH0v43dL5X7Rg6/9/82zYn3ckdfWkxJ5avVx3HZJ3+cREAR8Ka4+pTV3GaypFCRQghzNBDHz/Hq2tnEW5r+O+VLZDOyopTTaMBuLRyreI0pst+zSo
AcrqbZ2sKmMjtyUIIIa7fpeSz9IjfBEBe/e8Up1GnuOMtsHMdTjvkzp8ryb2YSXh8WR8en2F3KU5TedKiIoQQZubsjn0AnHP3xsnTXW0Yhbz6l7UShB3dT3Fh7WxZupaj3y5EX1rMGU8/gju3UR2n0qRQEUIIM5Oz9wAA5/1DFSdRK7hLO168cxK9HvyUg+fzVMcxOaW//QZASude6KzM9+teLv0IIYSZMSaU3WZaENpQcRK1rGysOT9wCKcPnWfnyUxaBnmojmQyikuNPNlhNG3cwnnwUfO97APSoiKEEGbH8UTZHD+6prVrjp8rad+grDjZkZShOIlp2ZmUwWmdAxvb9aFJzw6q49wUKVSEEMLMeKccB8C1Ze0aOv9K2tdz5MGdixnywTOUFpeojmMyVsefA6BHEx+srcxvNNp/kks/QghhRnKzcrEtKus46tuupeI06oUHevDklh9xLsrn2J9bCevdWXUk5TSjke4vjMHeuyExg15SHeemSYuKEEKYkePZJUSPn0+P53/GPaCe6jjK2ejtON64OQAXl61RnMY0HFu5iS4JW5iwZQHtG/mojnPTpFARQggzciwtF3Q66ob4q45iMvJjOgGg37pZcRLTcPH7HwE43DwGe1fzH7lYChUhhDAjiX8Pne9t/l9AVaVO71sBCD60F2NJqeI0amlGIwFrlgJgvOtuxWmqhhQqQghhRtp98BJfLXydDueOqI5iMhr06UK+rZ46+dkkb9mjOo5Sx9dsxj/9DIU2doQ/fK/qOFVCChUhhDAjYQd2cOvxXQQ6WauOYjLsHO05ERYJwPk/VilOo9aFr74HIKFlJ5zr1lGcpmpIoSKEEGaiMCcPv/RUAOq1a6E2jInJadcRg7UNF5POqI6ijGY0ErS27LKPNmSo4jRVR25PFkIIM3F29wFCNCPZeic8w4JUxzEp+icn0NzjFhzdXelr1LAy87FDKiPu8GlOezXA3pBPxIPDVMepMtKiIoQQZiJjVywAqX4hZj13S3VoFhGElZMTGXlFHD6XozqOEktO5PL4nS/w+uercajjqjpOlZFPuhBCmImiuLI5frKDwxQnMT12Nla0CykbTn/r4XOK09Q8o1FjedxZAPq2sqzWNilUhBDCTOiPHgbA2KSJ4iSm6e7cEyyb+wRtnhytOkqNO7h5Hw4nEnHW29C1sZfqOFVK+qgIIYSZyCzRkWPngENUM9VRTFJERCAN0k6Qn3mGovxC7BztVUeqMQVvTWHdmoWsGvQo9ra9VcepUtKiIoQQZqCk1MiYPhOJnPgzHncNUB3HJAV3aUeGkxuOxQaO/bFWdZwaU5iTR5ONKwCoP9CyihSQQkUIIcxCckY+xaUaDnY21PeUUWmvxMrGmqSo9gBkLV2pOE3NiZ81H1dDHufcvIgYdrvqOFVOChUhhDADfw+d38DLqVbeenu9Srt3B6DO1o2Kk9Qc63llg7wl9b0LKxvLGwhQChUhhDADzp/MYO2cMTy4Z4nqKCbNf1DZZbGwpHhy0tIVp6l+F48n0+zAVgD8xj2sOE31kEJFCCHMgF3CQcIyTuNrU7sn3fsvfi2acNrTDxvNyPFFy1XHqXbHPpqDjWbkaFATgjpGq45TLeSuHyGEMAPuJ48BoJc7fv7TsS592XnkBBfybGmhOkw1068q64uTOchyRqL9NylUhBDCxBlLSvE7dwoAzzYt1IYxA8Vvvs1T3+0m0OjII5qGTmeZfXriTmcx5PYX6XdsB68/YZmXfUAu/QghhMk7n5CIU3EhRVY2+EVLi8p/6RDqiZ21FckZ+Zy4mKc6TrX5YWcyxda2MHQI7oG+quNUGylUhBDCxKXtjAUg1csfW72d2jBmwElvQ9vgOjRJO8HBxZY5nkpubgF/7EsB4J62gYrTVC8pVIQQwsQVxMYBkBEUqjiJ+XgsYRUr5j5Bg4+mqI5SLeLf/ojlnzzA40kbyuc4slRSqAghhIk7U2JDvHcD8iMiVUcxG/6DbgOg8ZG95KVfUhumGnjM/4bArPN09rK12D44f5NCRQghTNwPzXvTf/THZEycpDqK2QiMaUmqhy92pSUkLvhDdZwqlbh8Aw1TjlBkbUPjSWNVx6l2yguVM2fOcN999+Hp6YmjoyMtWrRgz549qmMJIYRJ0DSNY/8blTbMS4bOv146KytS2nYGwPCHZRUqmdM+BuBA+57UCfJTnKb6KS1UMjMz6dixI7a2tqxYsYKEhAQ+/PBD3N3dVcYSQgiTcSErn5y8Qqx0ZcPni+tnP7Ds8k/Qjg1oRqPiNFUj81QqURuWAuA43vJbU0DxOCrvvfceAQEBzJ07t3xZcHCwukBCCGFi0pau4dC0Iexu1Br7Kf1VxzErjYYNxDDOlnqX0jixfgcNuseojnTTDr81jZiSIo75N6LJ4L6q49QIpS0qS5YsoXXr1gwePBhvb29atmzJnDlzrrq+wWAgOzu7wkMIISxZXmwc+tJinPQyPueNcnB34VBk2WzKZ7//WXGam1dsKCL0528ByHzoMXRWyntv1AilZ3nixAk+//xzGjZsyKpVqxgzZgxPPPEE33333RXXnzJlCm5ubuWPgICAGk4shBA17FACAAVhjRQHMU/p459i+NC3eKfZANVRbtqKwxd5ZOBkfm3dj6hnHlEdp8YoLVSMRiOtWrXinXfeoWXLljz66KM8/PDDfP7551dcf/LkyWRlZZU/UlJSajixEELULKfjiQBYN41QnMQ8tRraj+0NWhJ/oYCTZj5K7debk4j1a0zylOnonRxVx6kxSgsVX19fIiIq/vI1adKE5OTkK66v1+txdXWt8BBCCEtW70wSAO7RzRUnMU91nOyIaeAJwKr4c4rTVN7eUxnEplzCztqK4e2CVMepUUovenbs2JEjR45UWHb06FGCgmrXP4IQQlxJ1pk06uZmAODbvqXiNObrDm/oNmsO4auzYe+fquNUSsnQe3izyIakRyfg5aJXHadGKS1UnnzySTp06MA777zDkCFD2LlzJ7Nnz2b27NkqYwkhhEk4u2MvbsB5Ny98vCx7mPTq1DXcG4/dS7BCI+3QcbybmNdUBCc37KTtjtW0RkdKszdUx6lxSi/9tGnThsWLF/Pjjz/SrFkz3nzzTWbMmMHw4cNVxhJCCJNwKqeU5Y06EN+8o+ooZq1uoxCONiibdTppzjzFaW7chZfLipPYNt0I6hitOE3NU36/22233cZtt92mOoYQQpicXZ4hfHnnC4zuGEx31WHM3KU+t8FncTgvWwLTXlUd57ql7kug5eYVALi89rLiNGrUjpuwhRDCDCX+PXS+twydf7MCHx4BQJOj+0g7dFxxmuuX8sLr2GhGDjRtR8N+XVXHUUIKFSGEMFGXEpNA02jo7aI6itnza9GEQ6FRWKFx4pMvVce5LucTEmmxZjEA1pMnK06jjhQqQghhgvIysvj93WHETx9MQ32p6jgWIXvQUAC8lixSnOT6HJ/0GvrSYg6FRhFxj/kPWFdZUqgIIYQJSt0eC4DBTk8dPy+1YSxE+LgHSHX1YkO9JhxJTlcd55pOpefxRMMBfN
H2LrS336k1w+VfSe09cyGEMGFZe/cDcNavgeIklsPNvx6vfryMN3o8wuL4C6rjXNOMtYlctHdh66PPETG0dk9GKYWKEEKYoOKD8QDkNghTnMSy3B3tD8CivacpLjUqTnNliUnn+W3faQAm9WqsOI16UqgIIYQJsk88CoAW3kRxEsvSPdwHL0cbGsXtYPdPK1XHuaLcQUP54ccXeMAtl0h/N9VxlFM+jooQQojL1U0pu4XWqUWU4iSWxc7Gig+Or6DLTx+xPyEG7u2nOlIFB79fTMu9GyjRWeHXUS77wU20qJSUlLB27VpmzZpFTk4OAKmpqeTm5lZZOCGEqI0Mefn4XjwDgE+7FmrDWKDQx0cDEBm3nXNxRxWn+X8lhiKcnnsGgD19hxLUuY3iRKahUoXKqVOniIyMZODAgYwdO5YLF8o6JU2dOpVnnnmmSgMKIURtk5yayZy2d7Ey4ha8GoeojmNx/Ns1J75xNFZoJL0zTXWccntenkrI2RNk2TvT+PMPVccxGZUqVCZMmEDr1q3JzMzEwcGhfPmdd97JunXrqiycEELURkcLrHiv6yi+eHxKrb4ttToVjXkMgCa//0DBpRzFacpmym706fsAHBrzNO6BvooTmY5K/QZs3ryZl156CTs7uwrLg4KCOHPmTJUEE0KI2ioxreyLU4bOrz5RY0eS6uGLe0EOB975WHUcDj80njr52Zz0CSZ6Su0dhfZKKlWoGI1GSksvHynx9OnTuLjIUM9CCHEzcvfuxys3k4ZeTqqjWCxrWxuSRzwMgO83szCWqBv9d9fhVJz27QEg9/1p2NrrlWUxRZUqVHr27MmMGTPKn+t0OnJzc3n11Vfp18+0elALIYS5ufejyez6dATtjuxUHcWiNXtpItl6J3J1NmzYEq8kQ2FxKc8tPcodIz7k68kzaTbiTiU5TFmlbk+ePn063bp1IyIigsLCQu69914SExOpW7cuP/74Y1VnFEKIWqOkqJj655MB8GrdXHEay+Zctw5zP/6JN44bCY/NosstGlZWuhrNMPPPY5y4mIeXuxN3P/lIjR7bXFSqRcXPz4/Y2FgmTZrEo48+SsuWLXn33XfZt28f3t7eVZ1RCCFqjbOxh9CXFlNgo6delIxKWt3uvK8nzvZ2HDqbzar4czV67MRlf6F/83XsSop5c2BT3Bxta/T45qLSA745ODgwevRoRo8eXZV5hBCiVkvfGUsAkOoTSKitjMlZ3dwd7RjdKYQ5K+I4+sYH9PrhPaxr4H3PSUvH8f7hjM84SxM3a3o0u6Paj2muKtWi8u2337Js2bLy588++yzu7u506NCBU6dOVVk4IYSobQri4gDIDApVnKT2eLBDEMu/m8iEhdPY/fLUaj+eZjRy5M77qJ9xlrPuPrSZLWOmXEulCpV33nmnfPyUbdu2MXPmTKZOnUrdunV58sknqzSgEELUJjaHDwNQ3ChccZLaw81Jz/n7HgCg0SfvkXW6ei8B7XptOq23rqREZ0XWV9/iVl+6TFxLpQqVlJQUwsLKZvT87bffGDRoEI888ghTpkxh06ZNVRpQCCFqE/eTxwDQRzZVnKR2af3eiyTVC6FOfjaHH55Qbcc5+sc6oqa8AMCuB58k/K7e1XYsS1GpQsXZ2Zn09HQAVq9eTY8ePQCwt7enoKCg6tIJIUQtomkaX7YcwJetB+LRub3qOLWKjd6O/A+mA9Bu5c/EfbOoyo9xPiERj3uHYF9SRGzzTrT7/L0qP4YlqvQ4Kg899BAPPfQQR48epX///gDEx8cTFBRUpQGFEKK2OJtVyE/hXXi35yPUbxmhOk6t03T4QHb0HgyAz4QxVXoJ6FJ+Ee/MWou+qICkeiGErV2ClY11le3fklWqUPn000+JiYnhwoULLFq0CE9PTwD27NnDvffeW6UBhRCitkhMK5t9PriuE7bWMsePCpE/ziHFKwDv7IvsGTmeUqN20/vMKSxm5Nc7+d0hkHEPvo9+xTKc69apgrS1Q6XuwXJ3d+eDDz7gwIEDpKWlsWTJEgCio6OrNJwQQtQm6dv3EH06gcDgNqqj1FqOddw48813bB73FE83vZOBSxN47fbK9xfKSErhrc9Wst/ajzqOtrz45P34+chUMzeiUoXKypUruf/++0lPT0fTKlabOp3uivMACSGEuLb6875k0ZpFbDM+BmO6qo5TazXs15Wji5eS+eM+vtl6Ejd7Gyb2aHjDM1mf2rIHqzvu4NWcTE4/NI1Xxg+lkRQpN6xSbYvjxo1j8ODBpKamYjQaKzykSBFCiMpxPZEIgG2k9E9RrX9zP17oV3aLeMGU99jZbxhF+YXXta1mNLL7rY/x6taJgIunKXBw4oPhrWlW3606I1usSrWopKWl8dRTT+Hj41PVeYQQolbSjEZ8U5MAcI+WOX5MwSOdQ/E+l8yAqd9irRk50XAvJbNm0+i27lfd5vjqTRRMfIrWh3YDEN84Gp+lC6kXFlxDqS1PpQqVQYMGsX79ekJDZeREIYSoChmnzuBZkIMRHfXbtVQdR/zPHUO6se/8VwQ/P5EGqcdhwK3EN2xJ7h134dahLbqWLcnWrNmfcommEx4kZt96AAzWtuwd8ThtZ39QI0PyWzKd9u9OJtchPz+fwYMH4+XlRWRkJLa2FSdSeuKJJ6os4LVkZ2fj5uZGVlYWrq6uNXJMIYSoDvHzf6fpfXdwxsOX+umpquOIf8lIOs2J+8fQYssKbDRj+fJbH/qc454BADy98XvG7FjI/nY98Pt0Gn5yi/lV3cj3d6XKvB9++IFVq1bh4ODA+vXr0en+f1psnU5XY4WKEEJYitz9ZXP8XAxoQH3FWcTlPEL88di0lPMJiZx4/1Ocdu+g3qlE6jjYUt/dgSa+rrjGPM2lJlNo3ShEdVyLUqlC5aWXXuKNN97g+eefx+oGe0ELIYS4gvgEAApCGykOIq7FJ6IhPnNnlD9fqC5KrVGpQqWoqIihQ4dKkSKEEFXk9+jeLC12o2t/mftFiH+qVKUxcuRIfvrpp6rOIoQQtdZafX2+b3UbdXp0UR1FCJNSqRaV0tJSpk6dyqpVq4iKirqsM+20adOuaz+vvfYar7/+eoVlPj4+nDtXvVNsCyGEKcnKLyYtxwBAQ29nxWmEMC2VKlTi4uJo2bLs9rmDBw9WeO2fHWuvR9OmTVm7dm35c2trmaRJCFG7JO89yN1x6zgf2gQXe9v/3kCIWqRShcpff/1VdQFsbKhXr16V7U8IIcxN/orVfLh8OgeatgceVh1HCJOivDdsYmIifn5+hISEMGzYME6cOHHVdQ0GA9nZ2RUeQghh7rSD8QDkh8kdP0L8m9JCpV27dnz33XesWrWKOXPmcO7cOTp06EB6evoV158yZQpubm7lj4CAgBpOLIQQVc/x+FEArJrKAGFC/FulRqatLnl5eYSGhvLss8/y1FNPXfa6wWDAYDCUP8/OziYgIEBGphVCmLXz7t74ZF3g8KIVhN/VR3UcIapdtY9MW12cnJyIjIwkMTHxiq/r9Xr0en0NpxJCiOqTnZaOT9YFAHzbRytOI4TpUd5H5
Z8MBgOHDh3C19dXdRQhhKgRqdv2ApDm4ombn5fiNEKYHqWFyjPPPMOGDRtISkpix44dDBo0iOzsbEaOHKkylhBC1Jic3fsBOO/fQHESIUyT0ks/p0+f5p577uHixYt4eXnRvn17tm/fTlBQkMpYQghRYzZFdOCTwa/TPSqASNVhhDBBSguVBQsWqDy8EEIod6DQho0Nounds5nqKEKYJJPqoyKEELVN4vlcABp6uyhOIoRpMqm7foQQojbJTb/E0CWzSawbSEOvHqrjCGGSpFARQghFUrft5YltP3HRuQ51nKeqjiOESZJLP0IIoUjWnv/d8VNf7vgR4mqkUBFCCEVK4srm+MkJlTl+hLgaKVSEEEIRh2NHANDJHD9CXJUUKkIIoYh3ynEAnFtGKU4ihOmSQkUIIRTIz8yiXsY5AHzbt1KcRgjTJYWKEEIokLojFis0Mh3d8AjxVx1HCJMltycLIYQCBzyDefCR2XR2LeVN1WGEMGFSqAghhAKJ6QWcquOH1j5QdRQhTJpc+hFCCAVk6Hwhro+0qAghhAJ9vpxCuM6RJne9pDqKECZNChUhhKhhhdm53LnlN6w1Ixc831IdRwiTJpd+hBCihp3Zvg9rzcglBxfqhgSojiOESZNCRQghaljmrlgAzvqFoLOS/4aFuBb5DRFCiBpWsv8AANlhjRUnEcL0SaEihBA1zOHoobIfmkWqDSKEGZBCRQghapjPqUQAXFq3UBtECDMghYoQQtSgnPRLeGRnAFD/lnaK0whh+uT2ZCGEqEFHc40Me+oXooszWFDfW3UcIUyeFCpCCFGDjpzLpdjaFrsmzVRHEcIsyKUfIYSoQUfOZQMQXk+GzhfiekiLihBC1KCO014hMiMbl+gXgCaq4whh8qRQEUKIGqIZjbTZvY46+dkcc3lZdRwhzIJc+hFCiBqSfuwUdfKzKdVZ4d8hWnUcIcyCFCpCCFFDUrfsBuBM3frYuzorTiOEeZBCRQghakj+nlgALgY3VBtECDMihYoQQtQQ6/iDABiaNFWcRAjzIYWKEELUEPcTRwHQt2iuOIkQ5kMKFSGEqAGlpUZKDEUY0eEV00p1HCHMhtyeLIQQNSA5s4C+oz7G3VjEntZRquMIYTakRUUIIWrAkXM5APgH1MXaxlpxGiHMh8kUKlOmTEGn0zFx4kTVUYQQosr9Xag09nFVnEQI82ISl3527drF7NmziYqS5lAhhGVq/faz/HzsKBe8nwOkM60Q10t5i0pubi7Dhw9nzpw51KlT55rrGgwGsrOzKzyEEMIc+Cfspe3pBALc9KqjCGFWlBcqY8eOpX///vTo0eM/150yZQpubm7lj4CAgBpIKIQQN6cwOxf/C6cB8O3URnEaIcyL0kJlwYIF7N27lylTplzX+pMnTyYrK6v8kZKSUs0JhRDi5iVv3Im1ZiTDyY26DYNVxxHCrCjro5KSksKECRNYvXo19vb217WNXq9Hr5dmUyGEebm0dRcAZ4Ia42GlvCFbCLOirFDZs2cPaWlpREf//wyipaWlbNy4kZkzZ2IwGLC2llv4hBDmzxgbC0Bek2ZqgwhhhpQVKrfeeitxcXEVlo0ePZrw8HCee+45KVKEEBbD7WgCADYtW6gNIoQZUlaouLi40KxZxb8unJyc8PT0vGy5EEKYK6NR46TenTrOHtTt1E51HCHMjkmMoyKEEJYqOSOfxwY8i52NFQkdo/97AyFEBSZVqKxfv151BCGEqFIJZ8vGewqv54KNDJ0vxA2T7udCCFGNjpy8AECErwydL0RlSKEihBDVqNvLY9k5cwS9ju9QHUUIsySFihBCVCPfk0fwzsvEt4G/6ihCmCUpVIQQoppknkrFJ6vs0o9/F7njR4jKkEJFCCGqyZkN2wE47emHi5eH4jRCmCcpVIQQoprk7tgNQFqDcMVJhDBfUqgIIUQ1sY47AIChaZTiJEKYLylUhBCimngeOwSAQ9tWipMIYb5MasA3IYSwFIXFpfwZ0Jw0G0eCO7dXHUcIsyWFihBCVINjabm81e1B3B1t2RcRpjqOEGZLLv0IIUQ1SEgtGzo/wtcVnU6nOI0Q5ksKFSGEqAapew/iZMiniQydL8RNkUJFCCGqQZ/3niVuxlB6JW5XHUUIsyaFihBCVLESQxFBKYlYoeHbvoXqOEKYNSlUhBCiiiVv2YNDiYFcO0f82zRXHUcIsyaFihBCVLGL67cCcCo4HCsba8VphDBvUqgIIUQV03btAiC7mbSmCHGzpFARQogq5n44DgDbtm0UJxHC/EmhIoQQVai40EBwSiIA9bp1VJxGCPMnI9MKIUQVSjyTyY/dHyQq/RSDWjdTHUcIsyeFihBCVKED6UV83+o2OoR6MthKGq2FuFnyWySEEFUo7kwWAJH+boqTCGEZpEVFCCGqkOPyP2hS6kKUT6TqKEJYBJ2maZrqEJWVnZ2Nm5sbWVlZuLrKfBpCCLWK8gvB1QW70hLO7D5I/eimqiMJYZJu5PtbLv0IIUQVSd64A7vSErLsnfFr2UR1HCEsghQqQghRRdL/3ARAcoMIdNKRVogqIb9JQghRRax27AAgt5UM9CZEVZFCRQghqki9Q/sBcOwsA70JUVWkUBFCiCpwKfksARdSAAjq3VVtGCEsiBQqQghRBU6tWg9AilcA7oG+asMIYUFkHBUhhKgCG70b88GQN+ge4MRo1WGEsCDSoiKEEFVg58ViNoW0wmbQ3aqjCGFRlBYqn3/+OVFRUbi6uuLq6kpMTAwrVqxQGUkIIW6Y0agRm3IJgJYB7kqzCGFplBYq/v7+vPvuu+zevZvdu3fTvXt3Bg4cSHx8vMpYQghxQ5L3HOTxlV/S69Qewuu5qI4jhEVR2kdlwIABFZ6//fbbfP7552zfvp2mTWXoaSGEeUj7YzWP7VhIQsYxbKxfUR1HCItiMp1pS0tL+eWXX8jLyyMmJuaK6xgMBgwGQ/nz7OzsmoonhBBXpW3fDkBW82jFSYSwPMo708bFxeHs7Ixer2fMmDEsXryYiIiIK647ZcoU3Nzcyh8BAQE1nFYIIS7nFb8PAH2nDoqTCGF5lM+eXFRURHJyMpcuXWLRokV8+eWXbNiw4YrFypVaVAICAmT2ZCGEMtnnL+Lk64O1ZuTC4eN4NW6gOpIQJu9GZk9WfunHzs6OsLAwAFq3bs2uXbv46KOPmDVr1mXr6vV69Hp9TUcUQoirSlqyhuaakTMevtSXIkWIKqf80s+/aZpWodVECCFMWf669QCkRslEhEJUB6UtKi+88AJ9+/YlICCAnJwcFixYwPr161m5cqXKWEIIcd1sE/43nEKnW9QGEcJCKS1Uzp8/z4gRIzh79ixubm5ERUWxcuVKevbsqTKWEEJcl8LiUu697QXqtz/NNyP7qY4jhEVSWqh89dVXKg8vhBA3ZX/KJYqMGrlBDQgMra86jhAWyeT6qAghhLnYdTIDgLbBHuh0OsVphLBMyu/6EUIIc9XipQl8mpFNacSLqqMIYbGkUBFCiEooMRTRcs96nIoKOO4r4zgJUV3k0o8QQlTCybVbcSoqIFvv
RHDXdqrjCGGxpFARQohKuLhyLQBJ4S2wtpXGaSGqixQqQghRCfqtWwDIbyfz+whRnaRQEUKIG1RiKCI0fhcAngP6KE4jhGWT9kohLFRG0mlSVm8kraCEPY3bciHHQInRyKC572FrZwP+/jg0b4Z/7y54hshM5Dfi2Ir1hBvyyLJ3JrR3Z9VxhLBoUqgIYSEKc/I4/N2vFP26GL/9O/FPP4MHsNM/gi+GTy1f78Utq/HOy6yw7Qm/UM5370vA+IfxbxtVw8nNT8LxNAp9G1ESGERr6Z8iRLWS3zAhzJimaew+mYHxgQeJ3LKKFsWFFV4/5R1IQcNwRnUIxsfVHjsbK44UTeT4mdPYJZ/CM+kowedP0iD1OA3mzeTQqqWMfncBj3VrSNsQD0VnZfoWOIey8/5pvHV7BK1VhxHCwkmhIoQZys3J5+f95/lhZzLH0nL5IjUNx+JCzrl5capTT+wH3kZwv1sJqu9NENDlnxt3eqXCvjKSTnN83iL0Py/g65BO/HX0In8dvUivMA9eamZPYPuWNXlqJi/PUMK+5LIWqU6NvBWnEcLySWdaIcxIRtJptg1/nKJ6fnw970+OpeXiYGvN4Yef5PDiVfhknKPd0vk0f/ge3Opf35eoR4g/bV6eQFTcNiZ+/Rr3tA3ExkpH/QVzqdepLdsfepoSQ1E1n5n52LfnCPr8PPzrOBDk6ag6jhAWT6dpmqY6RGVlZ2fj5uZGVlYWrq4yMqSwXOlJKSROfJHmy3/GocQAwHc9RqB7803uaOGHi71tlR7v+IVczt8+mA7bVwJwJKQp7ksX4xPRsEqPY4623zGS1kvmsfaecfSZ/5HqOEKYpRv5/pYWFSFMWPb5i2wb/jgOjRvRfsn3OJQYSAwMZ9/0Lxm+/GtGtA+q8iIFINTLmZgty9j1xgyy7Z1pnBSPTdu2JCz4o8qPZW58dm7GRjPi1aKJ6ihC1ArSoiKECSosLuX7jYn0v7sLfllpABwNbILh1ddoNmoQOqua+xvjzJ54Cm+/g9DUY5TorNjz3Nu0m/J8jR3flFw8dpK6DUMAyDx5hjpBfooTCWGepEVFCDNVbChi/o5TdHn/L95ec5yFEd045RPEvulf0jDpIJEPDKnRIgWgfnRT/OL3srtjX2w0I63fe5EfvluNGf+NU2knFywB4Jh/QylShKghctePECbAWFLK3ve/wPfDd/ij1zjOB0ZR390B/w/ewr9NMEGKx+pwcHcheuNSto0Yx28ZNvyUUMzJFYeZ3DccnU6nNFtN0i1fDsCFDl0JU5xFiNpCChUhFNKMRg589RPOr71C69RjADy+93f6jB3GPe0C0dtYK074/3RWVsTM/4z4TSdg2SFmbzyBdV4uz94dXeOtPCqUGIpouK9sfp86Q+9SnEaI2sPy/3cRwkQdXriCQ41b0fyRewlNPUaO3pFtoyYSvWUlozqGmFSR8k8P3dKAd++KxDsnnTsfG8SOkU+ojlQjji5ehWthLpmOrjQc0EN1HCFqDWlREaKGHT6XTep9D9F93S8AFNrYETvwPsKnvUVMoK/idNdnWNtA/H87R6P0ZJj3KdtdXGj/2RTVsarVajyY3+tx2vg4cIcMmy9EjZEWFSFqyIkLuTz5Uyx9P9rEEodASnRW7Oh5N1kH4mm/8CvczaRI+Vund55l20NPA9D+83fZPeVTxYmq17LUEua37If100+pjiJErSKFihDV7OSGnezu2Je59z3H4n1n0DQoGTqU1G17abd6IT5NzLdbZsycD9h+5ygAol6eSMJPy9QGqiYpGfkkpuVibaWjcyMv1XGEqFWk/VKIanJ8zRayXnqVFjv/JBgNXxcvzg+9j/G9Ioj0d1Mdr8q0/flL9sacotXuv/AbfQ8pQRsIsLD5gY7MnMv9e+K40Os23ByqfoA9IcTVSYuKEFVIMxo5+P1i9rfoRGivTrTauQ4rNPa26U7BwkXMfiDGoooUACsba5qs+Z2jQU1wL8ghY/gosguLVceqUgHffM4ba2cx6twe1VGEqHWkUBGiChhKSvlldwoLewyn2f130Xz/Fozo2BPTm6S/ttNq5zrCet2iOma1cXB3wWPdSv5q2olHez/FkwtiMRotY0C4tEPHaZwUD0CDh+9TnEaI2kcu/QhxE87sOciSg+f5KkXjYq6BFv5t6G+7mLied+L3yvNEt2uuOmKNqRsaiOfKP8j4YhvrDqfx0bpEnuzZSHWsm5Y0Zx7ewOGQZoSHh6qOI0StIy0qQtygovxC9nwwm7iIttRvHYnLjA+5mGugnqs9vR8YSFHyadot+5GAWlSk/C3K352372gGwJHPvmHfp98rTnTznJeVDZuf1XeA4iRC1E7SoiLEdTAWl3Do52XkffM9jTevIrowt2w5OhraFPHFfa24tYkPttZS+w9uHUDRb0sY/tsUcpc7ktyqKYExrVTHqpSLx04SnhgLQMBDctlHCBWkUBHiKkpKjew5lcmahPMMGDeE5qfiy19Lc/HkxG2DCZw0nnYtIxSmNE1DXnqIhJ+/JCIxlot33k1e/D6cPN1Vx7phxz7+ivaaseyyj/w7C6GEFCpC/EP2+YskfreQkmXLGX/LI6QVl024V8c3gpBzJznSqRcOI0fQZNgAvGV00quytdfjvXwxF1pFE3z+JHsGDKPV5uVmNyfQmYTjFFtZk3XXYNVRhKi1dJoZz9WenZ2Nm5sbWVlZuLq6qo4jzFBhTh7Hf19D9qq1uG3fTNiJeOyMJQCMGvQasZExdG/sTf8gRzpF+qN3clSc2LwcXriC0KEDsDWWsn3CK7Sf8brqSNftWFouPaZtwLMwh9XP9cDTt67qSEJYjBv5/pY/CUWtkpFrYP+ZLGKTL6EtWsjYL1+jaWnFMT+SvQJIvaUHT43pQ0TXNthIv5NKCx/Ul+3jXqD9x28S/cnbHO7UnvBBfVXHui6/x54BoHmLUClShFBIChVhkTSjkfRjpzi7dS95u/Zgu3cPvkfj+Cz6Dua16g9Ao2I3niot5qKzByej2mLs0pX6d/UjsHUkgYrzW5J2019jz84dRG9fzfpZP+PZuzteLnrVsa6ptLiEXau2g21dBrbwUx1HiFpNaaEyZcoUfv31Vw4fPoyDgwMdOnTgvffeo3HjxipjCTNSYiji3LlMThVbc/xCLmlxR+j3/nP4nTlB3YIc/v13cIuzR9nqNYQW/u60CmxK8kMxBLRrQV0z6zthTnRWVoQv/YmXJ33G997NWf/jXuY92M6kW6riv13IgmmjWBfekY5vblAdR4haTWmhsmHDBsaOHUubNm0oKSnhxRdfpFevXiQkJODk5KQymjARuYYSzl/MpuCvjRSeTKbk9Bl0J5NwOJ2Mx7kUfDLPsyGqJy/2HgeAkyGfZ47tB6BUZ0Vq3fpcCArD0KIVLrd0oGfPWxjk+89J5YJr/qRqISdPd0a++wS/ztzM9hMZvL/yEJP7N1Ud66pKv5gFgFPDEOxtrRWnEaJ2U1qorFy5ssLzuXPn4u3tzZ49e+jcubOiVKK6aJpGrqGES/nFZGblwZ9/UpR2kdILFzGmp2OVno7
1pUwcLqaxNyCC9zoMJ9dQgkNRIYemD7rqfgOyLxDq5USwpxON6rmw0+8LPJtHUD+mFQEuTgTU4DmKqwvzdub9wc15ddZauj/2PPuemkjLiQ+qjnWZC0dOELl3IwD1nh6vOI0QwqT6qGRlZQHg4eFxxdcNBgMGg6H8eXZ2do3ksnSa0UhJUTGG3HyK8wowoKPQyRVDiZGiAgO2e3ZRkl9AaUEhxoICSgsKMBYY0AoLyPAJ4FiLDuQWlVCUnUP/mW9gk5+LTUEedvl56AvzsS/Mx8FQwKpGMUzqOwEAu5Jijn447KqZLhRBbnTZ3Tc2Ls4c9m9MibMzhZ7eFAcEYd0wDKcmDanbvAmdGjVgnc0//urtE16t75eovH6RvjhnbKddykFyn3uClPatTG6m5WPvfkKMZuRQaBRNurRVHUeIWs9kChVN03jqqafo1KkTzZo1u+I6U6ZM4fXXq//2xkNns9m2dDNN/lwCmgZGDdDQGY2gaWhGjUMde3K6cQs0NDxSkmi77Afg/9f9ezudpnGgQy8SI9tj1DQ8ziXTY9Gcste1/+2z7A1AZzSyp31P9rfujoZGnbRUBs2fBhrl+9QZjejQ0JUa2d76Vv6M6UepUcP94lkmzXkZK2MJ1qWlWBlLsSotxdpY9vMfrXozq9sISowaHpcusnTmA9gYS7E2GrHWjNgCf09eP79Fn/JLKW4FOez/+J6rvle/RXThwwFlPUFsS4t5dfOyq67rmZsJgL2tFXXcXDgUFIHR3gGDmzvF7h4Y69RB5+mJjb8fHuGN+bNTDN6u9jjrbeD1wzfzTypMSMzc6STs3kpEYiwX7ryb/IR9ONYxjRmlDXn5NFz4HQB5o0yvtUeI2shkCpVx48Zx4MABNm/efNV1Jk+ezFNPPVX+PDs7m4CAqm/YP3o+h60rt/PAr7Ouus7SfEfmXyz7z7XDyf2MX/7jVdddVerOT4b6ALQ6c4TJG/646rrrbbxYpm8CQHjaKd7es/6q625xqs92n/YABGZm0ij50FXXtb2UycXcIgBsio04Fhuuuq7eWIqTnTV6W2tcHVw47VmfYls7Suz0lNjaUWpnR4mtHqOdHcYmrbinbQCOdjY46W3Ylj8ZKxdnrN1csXFxxcbdFX0dN/QebjTz8eJwgN//X/OfHH/VDMJy2ert8F76KxejWxNyLondA+4heuNSkxgMbv/UL2ibm0GaiydRTz2iOo4QAhMZ8G38+PH89ttvbNy4kZCQkOverroGfIs7ncXGX/8kes0i0OlAp0OzsgJ06KzKnh/p2JOz4S3Q6aDOuRQiV/0K/3sNndX/frYCnY6UVh240Kysedsp/TyN1ywp3w86HVhZlf98oWlL0pu2QKfTYZ+dSfCfywAdWOnQWVmDDtDp0NnaktuwCblNo7Cx0mFjKMBr91Z0NrZY2dpiZWuDla0NOhsbrGxt0Xy80QICsLHSYa1p6FNPY2Vjg7WdDVZ2ttjY2mLn4oydkwPWNtJ5UFS/hJ+W0eie27HRjGx/6nXaf/iK0jyaprE3vC3RR3ez/dFnaf/Fe0rzCGHJbuT7W2mhomka48ePZ/Hixaxfv56GDRve0PYyMq0Q5m37+JdoP/NtiqxsOLFwGeF39lKWZf2RNB6Zs5VBiZt47ovncatwd5gQoirdyPe30rbWsWPHMm/ePH744QdcXFw4d+4c586do6CgQGUsIUQNaffRG+xteyvJ7vV4bcNpLuZe/ZJkddI0jelrjlJkY4vDww9KkSKECVHaoqLT6a64fO7cuYwaNeo/t5cWFSHMX+7FTIbN2sbBHI2YBp58/2DbGh8MbtPa3YxedQZbez0bn+1m8iPnCmHuzKZFRdO0Kz6up0gRQlgG57p1mPHwLTjaWbPtRDpz5v1Vo8c3lpRSf+Qw1n05hud98qRIEcLEqO9mL4So9cK8XZg6KIoHdy7moQd6s+/juTV27N1vzKBB6nHqFORwx52dauy4QojrI4WKEMIk3BblR0+XImyNpTSaNJbEZdXfspJ1+hwNP3wTgIQHxuNW36fajymEuDFSqAghTEb0T3M42KQ1TkUF1B1yJ8nb9lbr8Q4/+AR18rM4WS+Y6GmvVeuxhBCVI4WKEMJk2NrrCd64msSAxtTJz8KuX1/OJyRWy7HivllEu9W/AJA37SNs7aVvihCmSAoVIYRJca5bB88Na0nxCqDepTSKO3cldX/VTqGQkXQa3/FlI8/u6D2YpvfcXqX7F0JUHSlUhBAmxyPEH5u1q0n18MU/PZUFL83k+IXcKtl3SamRl5YkEOsTykmfYKIWfFUl+xVCVA8pVIQQJsk3KhybzZv4dOBYPm7al6GztrEvOfOm9qlpGi8uPsjys8WMG/YaRWvX4eDuUkWJhRDVQQoVIYTJ8m4SyrB5H9LUz5WLuUU8MmMN296eWal9GUtK+f2Zqfy86xRWOvjk3mgaNWtQxYmFEFVNChUhhEnzdNaz4JH29G7syYeL3yXmpfHsbdeTjKTT172Pgks57LulH3dMe57Jf83ljYHN6BkhtyILYQ6kUBFCmDwXe1s+H9EG265dKLayptXOtVhHNGHbo8+Sn5l11e00o5G4r38mI7Qx0dtXU2RlQ8vbu3Jf+6AaTC+EuBlK5/q5WTLXjxC1z7GVG7B68EEapB4HIM/OgYQOPcl56FG8u3ZEb2NF3v6DFP7+B16/LyT0TNntzefdvLgwczbN7rtDYXohBNzY97dNDWUSQogqEdanC6UnD7PrnZn4fvI+/umptFm/hCdcGrMkruzvrgd3/cbLf34JgMHaln19h9D0q49o5u2pMroQohKkUBFCmB1rWxvavDoR7eUnSPhlBdnzF1DSrh11NT1GTSO5cQv2XeqCoeMthD/zOO0DfVVHFkJUklz6EUIIIUSNupHvb+lMK4QQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWjeoAN0PTNKBsumghhBBCmIe/v7f//h6/FrMuVHJycgAICAhQnEQIIYQQNyonJwc3N7drrqPTrqecMVFGo5HU1FRcXFzQ6XRVuu/s7GwCAgJISUnB1dW1SvdtCuT8zJ+ln6Olnx9Y/jnK+Zm/6jpHTdPIycnBz88PK6tr90Ix6xYVKysr/P39q/UYrq6uFvsBBDk/S2Dp52jp5weWf45yfuavOs7xv1pS/iadaYUQQghhsqRQEUIIIYTJkkLlKvR6Pa+++ip6vV51lGoh52f+LP0cLf38wPLPUc7P/JnCOZp1Z1ohhBBCWDZpURFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEEEIIYbKkULkOt99+O4GBgdjb2+Pr68uIESNITU1VHatKnDx5kgcffJCQkBAcHBwIDQ3l1VdfpaioSHW0KvX222/ToUMHHB0dcXd3Vx3npn322WeEhIRgb29PdHQ0mzZtUh2pymzcuJEBAwbg5+eHTq
fjt99+Ux2pSk2ZMoU2bdrg4uKCt7c3d9xxB0eOHFEdq0p9/vnnREVFlQ8SFhMTw4oVK1THqhZTpkxBp9MxceJE1VGqzGuvvYZOp6vwqFevnrI8Uqhch27duvHzzz9z5MgRFi1axPHjxxk0aJDqWFXi8OHDGI1GZs2aRXx8PNOnT+eLL77ghRdeUB2tShUVFTF48GAee+wx1VFu2k8//cTEiRN58cUX2bdvH7fccgt9+/YlOTlZdbQqkZeXR/PmzZk5c6bqKNViw4YNjB07lu3bt7NmzRpKSkro1asXeXl5qqNVGX9/f9599112797N7t276d69OwMHDiQ+Pl51tCq1a9cuZs+eTVRUlOooVa5p06acPXu2/BEXF6cujCZu2O+//67pdDqtqKhIdZRqMXXqVC0kJER1jGoxd+5czc3NTXWMm9K2bVttzJgxFZaFh4drzz//vKJE1QfQFi9erDpGtUpLS9MAbcOGDaqjVKs6depoX375peoYVSYnJ0dr2LChtmbNGq1Lly7ahAkTVEeqMq+++qrWvHlz1THKSYvKDcrIyGD+/Pl06NABW1tb1XGqRVZWFh4eHqpjiCsoKipiz5499OrVq8LyXr16sXXrVkWpxM3IysoCsNjfudLSUhYsWEBeXh4xMTGq41SZsWPH0r9/f3r06KE6SrVITEzEz8+PkJAQhg0bxokTJ5RlkULlOj333HM4OTnh6elJcnIyv//+u+pI1eL48eN88sknjBkzRnUUcQUXL16ktLQUHx+fCst9fHw4d+6colSisjRN46mnnqJTp040a9ZMdZwqFRcXh7OzM3q9njFjxrB48WIiIiJUx6oSCxYsYO/evUyZMkV1lGrRrl07vvvuO1atWsWcOXM4d+4cHTp0ID09XUmeWluoXKmz0L8fu3fvLl9/0qRJ7Nu3j9WrV2Ntbc3999+PZsKD+t7o+QGkpqbSp08fBg8ezEMPPaQo+fWrzDlaCp1OV+G5pmmXLROmb9y4cRw4cIAff/xRdZQq17hxY2JjY9m+fTuPPfYYI0eOJCEhQXWsm5aSksKECROYN28e9vb2quNUi759+3L33XcTGRlJjx49WLZsGQDffvutkjw2So5qAsaNG8ewYcOuuU5wcHD5z3Xr1qVu3bo0atSIJk2aEBAQwPbt2022KfNGzy81NZVu3boRExPD7Nmzqzld1bjRc7QEdevWxdra+rLWk7S0tMtaWYRpGz9+PEuWLGHjxo34+/urjlPl7OzsCAsLA6B169bs2rWLjz76iFmzZilOdnP27NlDWloa0dHR5ctKS0vZuHEjM2fOxGAwYG1trTBh1XNyciIyMpLExEQlx6+1hcrfhUdl/N2SYjAYqjJSlbqR8ztz5gzdunUjOjqauXPnYmVlHg1tN/NvaK7s7OyIjo5mzZo13HnnneXL16xZw8CBAxUmE9dL0zTGjx/P4sWLWb9+PSEhIaoj1QhN00z6/8zrdeutt152B8zo0aMJDw/nueees7giBcq+6w4dOsQtt9yi5Pi1tlC5Xjt37mTnzp106tSJOnXqcOLECV555RVCQ0NNtjXlRqSmptK1a1cCAwP54IMPuHDhQvlrKu+br2rJyclkZGSQnJxMaWkpsbGxAISFheHs7Kw23A166qmnGDFiBK1bty5vAUtOTraYfkW5ubkcO3as/HlSUhKxsbF4eHgQGBioMFnVGDt2LD/88AO///47Li4u5a1jbm5uODg4KE5XNV544QX69u1LQEAAOTk5LFiwgPXr17Ny5UrV0W6ai4vLZf2J/u6/aCn9jJ555hkGDBhAYGAgaWlpvPXWW2RnZzNy5Eg1gVTecmQODhw4oHXr1k3z8PDQ9Hq9FhwcrI0ZM0Y7ffq06mhVYu7cuRpwxYclGTly5BXP8a+//lIdrVI+/fRTLSgoSLOzs9NatWplUbe2/vXXX1f8txo5cqTqaFXiar9vc+fOVR2tyjzwwAPln08vLy/t1ltv1VavXq06VrWxtNuThw4dqvn6+mq2traan5+fdtddd2nx8fHK8ug0zYR7hAohhBCiVjOPzghCCCGEqJWkUBFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEEEIIYbKkUBFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEsEBdu3Zl4sSJqmNcUXp6Ot7e3pw8eRKA9evXo9PpuHTpUrUet7LH+eabb3B3d7+hbdq0acOvv/56Q9sIIa5MChUhxH86e/Ys9957L40bN8bKyuqqRdCiRYuIiIhAr9cTERHB4sWLL1tnypQpDBgwwOJmtv6nl19+meeffx6j0ag6ihBmTwoVIcR/MhgMeHl58eKLL9K8efMrrrNt2zaGDh3KiBEj2L9/PyNGjGDIkCHs2LGjfJ2CggK++uorHnrooZqKrkT//v3Jyspi1apVqqMIYfakUBHCwmVmZnL//fdTp04dHB0d6du3L4mJiRXWmTNnDgEBATg6OnLnnXcybdq0Cpc7goOD+eijj7j//vtxc3O74nFmzJhBz549mTx5MuHh4UyePJlbb72VGTNmlK+zYsUKbGxsrjnzeHp6Ovfccw/+/v44OjoSGRnJjz/+WGGdrl27Mn78eCZOnEidOnXw8fFh9uzZ5OXlMXr0aFxcXAgNDWXFihWX7X/Lli00b94ce3t72rVrR1xcXIXXv/nmGwIDA8vfi/T09AqvHz9+nIEDB+Lj44OzszNt2rRh7dq1FdaxtramX79+l+UWQtw4KVSEsHCjRo1i9+7dLFmyhG3btqFpGv369aO4uBgo++IeM2YMEyZMIDY2lp49e/L222/f8HG2bdtGr169Kizr3bs3W7duLX++ceNGWrdufc39FBYWEh0dzdKlSzl48CCPPPIII0aMqNAyA/Dtt99St25ddu7cyfjx43nssccYPHgwHTp0YO/evfTu3ZsRI0aQn59fYbtJkybxwQcfsGvXLry9vbn99tvL34sdO3bwwAMP8PjjjxMbG0u3bt146623Kmyfm5tLv379WLt2Lfv27aN3794MGDCA5OTkCuu1bduWTZs2Xd+bJ4S4OmXzNgshqs3f084fPXpUA7QtW7aUv3bx4kXNwcFB+/nnnzVNK5vSvX///hW2Hz58uObm5nbNff+bra2tNn/+/ArL5s+fr9nZ2ZU/HzhwoPbAAw9UWOevv/7SAC0zM/Oq59OvXz/t6aefrpChU6dO5c9LSko0JycnbcSIEeXLzp49qwHatm3bKhxnwYIF5eukp6drDg4O2k8//aRpmqbdc889Wp8+fSoce+jQoVd9L/4WERGhffLJJxWW/f7775qVlZVWWlp6zW2FENcmLSpCWLBDhw5hY2NDu3btypd5enrSuHFjDh06BMCRI0do27Zthe3+/fx66XS6Cs81TauwrKCgAHt7+2vuo7S0lLfffpuoqCg8PT1xdnZm9erVl7VYREVFlf9sbW2Np6cnkZGR5ct8fHwASEtLq7DdPy87eXh4VHgvDh06dNllqX8/z8vL49lnnyUiIgJ3d3ecnZ05fPjwZfkcHBwwGo0YDIZrnq8Q4tpsV
AcQQlQfTdOuuvzvAuLfxcS1truWevXqce7cuQrL0tLSygsGgLp165KZmXnN/Xz44YdMnz6dGTNmEBkZiZOTExMnTqSoqKjCera2thWe63S6Csv+PqfrufPmn+/Ff5k0aRKrVq3igw8+ICwsDAcHBwYNGnRZvoyMDBwdHXFwcPjPfQohrk5aVISwYBEREZSUlFTo35Gens7Ro0dp0qQJAOHh4ezcubPCdrt3777hY8XExLBmzZoKy1avXk2HDh3Kn7ds2ZKEhIRr7mfTpk0MHDiQ++67j+bNm9OgQYPLOv/ejO3bt5f/nJmZydGjRwkPDwfK3q9/vv7v9f/ON2rUKO68804iIyOpV69e+Zgw/3Tw4EFatWpVZbmFqK2kUBHCgjVs2JCBAwfy8MMPs3nzZvbv3899991H/fr1GThwIADjx49n+fLlTJs2jcTERGbNmsWKFSsua2WJjY0lNjaW3NxcLly4QGxsbIWiY8KECaxevZr33nuPw4cP895777F27doKY6707t2b+Pj4a7aqhIWFsWbNGrZu3cqhQ4d49NFHL2upuRlvvPEG69at4+DBg4waNYq6detyxx13APDEE0+wcuVKpk6dytGjR5k5cyYrV668LN+vv/5KbGws+/fv5957771iq82mTZsu61wshLhxUqgIYeHmzp1LdHQ0t912GzExMWiaxvLly8svk3Ts2JEvvviCadOm0bx5c1auXMmTTz55WV+Sli1b0rJlS/bs2cMPP/xAy5Yt6devX/nrHTp0YMGCBcydO5eoqCi++eYbfvrppwr9YyIjI2ndujU///zzVfO+/PLLtGrVit69e9O1a1fq1atXXkhUhXfffZcJEyYQHR3N2bNnWbJkCXZ2dgC0b9+eL7/8kk8++YQWLVqwevVqXnrppQrbT58+nTp16tChQwcGDBhA7969L2s5OXPmDFu3bmX06NFVlluI2kqnVeZitBDCoj388MMcPny4Wm6vXb58Oc888wwHDx7Eysoy/1aaNGkSWVlZzJ49W3UUIcyedKYVQvDBBx/Qs2dPnJycWLFiBd9++y2fffZZtRyrX79+JCYmcubMGQICAqrlGKp5e3vzzDPPqI4hhEWQFhUhBEOGDGH9+vXk5OTQoEEDxo8fz5gxY1THEkIIKVSEEEIIYbos8wKxEEIIISyCFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiT9X+fZkvEqg05YAAAAABJRU5ErkJggg==", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "id": "ac654a70", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", @@ -2887,34 +2058,228 @@ "plt.show()" ] }, + { + "cell_type": "markdown", + "id": "84ccde87", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "631a50c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "45c7bf7f", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "5d58c073", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, { "cell_type": "code", - "execution_count": null, - "id": "884e5e5a", - "metadata": {}, + "execution_count": 8, + "id": "6e8fb6ba", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], - "source": [] + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + 
"plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "0c13445c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] } ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.18" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/pub/week39/html/._week39-bs000.html b/doc/pub/week39/html/._week39-bs000.html new file mode 100644 index 000000000..85a342f93 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs000.html @@ -0,0 +1,313 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +

+Week 39: Resampling methods and logistic regression
+Morten Hjorth-Jensen
+Department of Physics, University of Oslo
+Week 39
+© 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    + + + diff --git a/doc/pub/week39/html/._week39-bs001.html b/doc/pub/week39/html/._week39-bs001.html new file mode 100644 index 000000000..02a5393e5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs001.html @@ -0,0 +1,304 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Plan for week 39, September 22-26, 2025

    + +
    +
    + +
      +
1. Resampling techniques: the bootstrap, cross-validation and the bias-variance tradeoff
    2. +
    3. Logistic regression, our first classification encounter and a stepping stone towards neural networks
    4. +
    5. Video of lecture
    6. +
    7. Whiteboard notes
    8. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs002.html b/doc/pub/week39/html/._week39-bs002.html new file mode 100644 index 000000000..fc1a16a34 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs002.html @@ -0,0 +1,305 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +

    +
    +

     

     

     

    + + +

    Readings and Videos, resampling methods

    +
    +
    + +
      +
    1. Raschka et al, pages 175-192
    2. +
    3. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.
    4. +
    5. Video on bias-variance tradeoff
    6. +
    7. Video on Bootstrapping
    8. +
    9. Video on cross validation
    10. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs003.html b/doc/pub/week39/html/._week39-bs003.html new file mode 100644 index 000000000..c849cd54d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs003.html @@ -0,0 +1,305 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Readings and Videos, logistic regression

    +
    +
    + +
      +
    1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression
    2. +
    3. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
    4. +
    5. Video on Logistic regression
    6. +
    7. Yet another video on logistic regression
    8. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs004.html b/doc/pub/week39/html/._week39-bs004.html new file mode 100644 index 000000000..3b5cafe9a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs004.html @@ -0,0 +1,308 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Lab sessions week 39

    + +
    +
    + +
      +
    1. Discussions on how to structure your report for the first project
    2. +
    3. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.
    4. +
    5. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.
    6. +
    7. Video on how to write scientific reports recorded during one of the lab sessions
    8. +
    9. A general guideline can be found at https://github.com/CompPhysics/MachineLearning/blob/master/doc/Projects/EvaluationGrading/EvaluationForm.md.
    10. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs005.html b/doc/pub/week39/html/._week39-bs005.html new file mode 100644 index 000000000..758e7f551 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs005.html @@ -0,0 +1,295 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Lecture material

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs006.html b/doc/pub/week39/html/._week39-bs006.html new file mode 100644 index 000000000..ba36aeeaf --- /dev/null +++ b/doc/pub/week39/html/._week39-bs006.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Resampling methods

    +
    +
    + +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample. +

    + +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
    1. The bootstrap method
    2. +
    3. and Cross-Validation
    4. +
    +

In addition there are several other methods such as the Jackknife and the Blocking methods. This week we will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs007.html b/doc/pub/week39/html/._week39-bs007.html new file mode 100644 index 000000000..b69069758 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs007.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Resampling approaches can be computationally expensive

    +
    +
    + + +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used. +

    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs008.html b/doc/pub/week39/html/._week39-bs008.html new file mode 100644 index 000000000..3cd11c9d3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs008.html @@ -0,0 +1,310 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Why resampling methods ?

    +
    +
    + + +
      +
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • +
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • +
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs009.html b/doc/pub/week39/html/._week39-bs009.html new file mode 100644 index 000000000..39760f974 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs009.html @@ -0,0 +1,315 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Statistical analysis

    +
    +
    + + +
      +
    • As in other experiments, many numerical experiments have two classes of errors:
    • +
        +
      • Statistical errors
      • +
• Systematic errors
      • +
      +
    • Statistical errors can be estimated using standard tools from statistics
    • +
• Systematic errors are method specific and must be treated differently from case to case.
    • +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs010.html b/doc/pub/week39/html/._week39-bs010.html new file mode 100644 index 000000000..d06eaf79a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs010.html @@ -0,0 +1,323 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Resampling methods

    + +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods. +

    + +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error. +

    + +

In the discussion below we will focus on the MSE. In particular, since we will split the data into test and training data, we discuss the

    +
      +
    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. +
    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +
    +

As our model becomes more and more complex, more of the training data tends to be used. The model may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs011.html b/doc/pub/week39/html/._week39-bs011.html new file mode 100644 index 000000000..c3b9250f3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs011.html @@ -0,0 +1,319 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Resampling methods: Bootstrap

    +
    +
    + +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages: +

    +
      +
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. +
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. +
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. +
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +
    +
    +
    + + +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.
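As a complement to the discussion above, the following lines give a minimal sketch (an addition, not part of the original notes) of the simplest non-parametric bootstrap: resampling a data set with replacement in order to estimate the uncertainty of a statistic. The synthetic Gaussian data and the number of bootstrap samples are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(2025)
data = rng.normal(loc=2.0, scale=1.0, size=200)   # synthetic sample, for illustration only

n_boot = 1000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # draw a resample of the same size as the data, with replacement
    sample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = sample.mean()

# bootstrap estimate of the standard error and a 95% percentile interval for the mean
print("sample mean:", data.mean())
print("bootstrap standard error:", boot_means.std())
print("95% percentile interval:", np.percentile(boot_means, [2.5, 97.5]))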

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs012.html b/doc/pub/week39/html/._week39-bs012.html new file mode 100644 index 000000000..4b431ef09 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs012.html @@ -0,0 +1,359 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The bias-variance tradeoff

    + +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \( \mathcal{D} \) consisting of the data +\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    Let us assume that the true data is generated from a noisy model

    + +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +$$ + +

where \( \epsilon \) is normally distributed with mean zero and variance \( \sigma^2 \).

    + +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\theta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \). +

    + +

Thereafter we found the parameters \( \boldsymbol{\theta} \) by optimizing the mean squared error via the so-called cost function

    +$$ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ + +

    We can rewrite this as

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +$$ + +

The first term represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \( \boldsymbol{\epsilon} \).

    + +

To derive this equation, we need to recall that the variances of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable. We use a more compact notation in terms of the expectation value

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +$$ + +

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +$$ + +

    which, using the abovementioned expectation values can be rewritten as

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +$$ + +

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    + +Note that in order to derive these equations we have assumed we can replace the unknown function \( \boldsymbol{f} \) with the target/output data \( \boldsymbol{y} \). + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs013.html b/doc/pub/week39/html/._week39-bs013.html new file mode 100644 index 000000000..397bbd422 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs013.html @@ -0,0 +1,306 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    A way to Read the Bias-Variance Tradeoff

    + +

    +
    +

    +
    +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs014.html b/doc/pub/week39/html/._week39-bs014.html new file mode 100644 index 000000000..fb75736a4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs014.html @@ -0,0 +1,368 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Understanding what happens

    + + +
    +
    +
    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs015.html b/doc/pub/week39/html/._week39-bs015.html new file mode 100644 index 000000000..bf900503b --- /dev/null +++ b/doc/pub/week39/html/._week39-bs015.html @@ -0,0 +1,331 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Summing up

    + +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance). +

    + +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \( Var(\epsilon) \), the irreducible error. +

    + +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance. +

    + +

    You may also find this recent article of interest.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs016.html b/doc/pub/week39/html/._week39-bs016.html new file mode 100644 index 000000000..0830eaab2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs016.html @@ -0,0 +1,389 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Another Example from Scikit-Learn's Repository

    + +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely it is that the model generalizes correctly from the training data.

    + + + +
    +
    +
    +
    +
    +
    #print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs017.html b/doc/pub/week39/html/._week39-bs017.html new file mode 100644 index 000000000..82e5eaa7a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs017.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Various steps in cross-validation

    + +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).
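As a small illustration of the last point, the sketch below (an addition, assuming Scikit-Learn is available) estimates the MSE with LOOCV by passing LeaveOneOut() as the cv argument to cross_val_score; the synthetic data and the polynomial degree are arbitrary choices.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(3155)
n = 30
x = np.random.randn(n)
y = 3*x**2 + np.random.randn(n)

# polynomial design matrix, degree chosen arbitrarily for illustration
X = PolynomialFeatures(degree=2).fit_transform(x[:, np.newaxis])

# LeaveOneOut() is equivalent to KFold(n_splits=n): each sample is the test set exactly once
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=LeaveOneOut())
print("LOOCV estimate of the MSE:", np.mean(-scores))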

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs018.html b/doc/pub/week39/html/._week39-bs018.html new file mode 100644 index 000000000..a4fd4bad7 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs018.html @@ -0,0 +1,314 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Cross-validation in brief

    + +

    For the various values of \( k \)

    + +
      +
    1. shuffle the dataset randomly.
    2. +
    3. Split the dataset into \( k \) groups.
    4. +
    5. For each unique group: +
        +
      1. Decide which group to use as set for test data
      2. +
      3. Take the remaining groups as a training data set
      4. +
      5. Fit a model on the training set and evaluate it on the test set
      6. +
      7. Retain the evaluation score and discard the model
      8. +
      +
    6. Summarize the model using the sample of model evaluation scores
    7. +
    +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs019.html b/doc/pub/week39/html/._week39-bs019.html new file mode 100644 index 000000000..bc6713f40 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs019.html @@ -0,0 +1,413 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    + +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We have to take the mean of these values in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs020.html b/doc/pub/week39/html/._week39-bs020.html new file mode 100644 index 000000000..6ca08c43a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs020.html @@ -0,0 +1,402 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More examples on bootstrap and cross-validation and errors

    + + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs021.html b/doc/pub/week39/html/._week39-bs021.html new file mode 100644 index 000000000..6960fea7a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs021.html @@ -0,0 +1,391 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs022.html b/doc/pub/week39/html/._week39-bs022.html new file mode 100644 index 000000000..273d94a9d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs022.html @@ -0,0 +1,313 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Logistic Regression

    + +

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \( y_i \) is based on some independent variables \( \boldsymbol{x}_i \). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance, and thereby the confidence intervals, of the parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert the product of the design matrices, linear regression then gives a simple recipe for fitting our data.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs023.html b/doc/pub/week39/html/._week39-bs023.html new file mode 100644 index 000000000..17e09275e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs023.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Classification problems

    + +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients' brains, figure out if there is a tumor or +not; or given a specific physical system, we'd like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations. +

    + +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs024.html b/doc/pub/week39/html/._week39-bs024.html new file mode 100644 index 000000000..8f8552d70 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs024.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Optimization and Deep learning

    + +

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \( \boldsymbol{\theta} \). The optimization of the problem therefore calls for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

    + +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs025.html b/doc/pub/week39/html/._week39-bs025.html new file mode 100644 index 000000000..1abef133f --- /dev/null +++ b/doc/pub/week39/html/._week39-bs025.html @@ -0,0 +1,323 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Basics

    + +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \( y_i \) are discrete and only take values +from \( k=0,\dots,K-1 \) (i.e. \( K \) classes). +

    + +

    The goal is to predict the +output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) +made of \( n \) samples, each of which carries \( p \) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong. +

    + +

    Let us specialize to the case of two classes only, with outputs +\( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is +

$$
y_i = \begin{cases} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{cases}.
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs026.html b/doc/pub/week39/html/._week39-bs026.html new file mode 100644 index 000000000..9e1c87421 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs026.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Linear classifier

    + +

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could for example fit a linear model and assign an observation to the default case if the predicted output is larger than \( 0.5 \) and to the no-default case if it is smaller than or equal to \( 0.5 \).

    + +

    We would then have our +weighted linear combination, namely +

$$
\begin{equation}
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon},
\tag{1}
\end{equation}
$$

    where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our +\( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs027.html b/doc/pub/week39/html/._week39-bs027.html new file mode 100644 index 000000000..6622d37d3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs027.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Some selected properties

    + +

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we can set the state of defaulting on the debt to \( y_i=1 \) and not defaulting to \( y_i=0 \) for one of the persons in the data set (see the full example below).

    + +

One simple way to get a discrete output is to use a sign (step) function that maps the output of a linear regressor to the values \( \{0,1\} \), that is \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and 0 otherwise. We will encounter this model in our first demonstration of neural networks.

    + +

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a "soft" classifier that outputs the probability of a given category. This leads us to the logistic function.
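To make the hard classifier concrete, here is a minimal sketch (an addition, not part of the original text) of the sign/step classifier applied to a linear score; the parameters theta and the inputs are hypothetical and chosen only for illustration.

import numpy as np

# hypothetical parameters (intercept and slope) and inputs, for illustration only
theta = np.array([-0.5, 1.0])
x = np.array([-2.0, -0.3, 0.1, 0.8, 2.5])
X = np.column_stack((np.ones_like(x), x))     # design matrix with a column of ones

s = X @ theta                                 # linear score s_i for each input
yhat = np.where(s >= 0, 1, 0)                 # hard classification into {0, 1}
print(yhat)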

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs028.html b/doc/pub/week39/html/._week39-bs028.html new file mode 100644 index 000000000..8b3ec2aca --- /dev/null +++ b/doc/pub/week39/html/._week39-bs028.html @@ -0,0 +1,378 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Simple example

    + +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    + + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs029.html b/doc/pub/week39/html/._week39-bs029.html new file mode 100644 index 000000000..4eb185a8a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs029.html @@ -0,0 +1,351 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    + + + +
    +
    +
    +
    +
    +
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). +In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model +

    +$$ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +$$ + +

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we instead let \( f(y_i\vert x_i) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability of finding a value of \( y_i \) with a given \( x_i \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs030.html b/doc/pub/week39/html/._week39-bs030.html new file mode 100644 index 000000000..5bafd9ea0 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs030.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The logistic function

    + +

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood of a given event,

$$
p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}.
$$

    Note that \( 1-p(t)= p(-t) \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs031.html b/doc/pub/week39/html/._week39-bs031.html new file mode 100644 index 000000000..2406ff009 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs031.html @@ -0,0 +1,379 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

Examples of likelihood functions used in logistic regression and neural networks

    + +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    + + + +
    +
    +
    +
    +
    +
    """The sigmoid function (or the logistic curve) is a
+function that takes any real number, z, and outputs a number in (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs032.html b/doc/pub/week39/html/._week39-bs032.html new file mode 100644 index 000000000..eb3593307 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs032.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Two parameters

    + +

    We assume now that we have two classes with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

    +$$ +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +$$ + +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    Note that we used

    +$$ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs033.html b/doc/pub/week39/html/._week39-bs033.html new file mode 100644 index 000000000..3e0a2070c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs033.html @@ -0,0 +1,319 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Maximum likelihood

    + +

    In order to define the total likelihood for all possible outcomes from a +dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels +\( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is +

$$
\begin{align*}
P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\
\end{align*}
$$

    from which we obtain the log-likelihood and our cost/loss function

$$
\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right).
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs034.html b/doc/pub/week39/html/._week39-bs034.html new file mode 100644 index 000000000..bf62b86d5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs034.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The cost function rewritten

    + +

Reordering the logarithms, we can rewrite the log-likelihood as

$$
\log P(\mathcal{D}|\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$

The maximum likelihood estimator is defined as the set of parameters \( \boldsymbol{\theta} \) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +$$ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +$$ + +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization as we did for Ridge and Lasso regression. +
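As a minimal sketch, assuming NumPy and a small synthetic data set, the cross-entropy cost above can be evaluated for given parameter values:

import numpy as np

def cross_entropy(theta0, theta1, x, y):
    # Negative log-likelihood for the two-parameter logistic model
    z = theta0 + theta1 * x
    return -np.sum(y * z - np.log1p(np.exp(z)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # synthetic inputs
y = np.array([0, 1, 0, 1, 1])               # synthetic binary labels
print(cross_entropy(0.0, 1.0, x, y))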

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs035.html b/doc/pub/week39/html/._week39-bs035.html new file mode 100644 index 000000000..b4c9f4ec3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs035.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Minimizing the cross entropy

    + +

    The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, +therefore, any local minimizer is a global minimizer. +

    + +

    Minimizing this +cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +$$ + +

    and

    +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +$$ + + +
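A small sketch of these two derivatives, using the same kind of synthetic data as above and assuming NumPy, could read

import numpy as np

def gradient(theta0, theta1, x, y):
    # Partial derivatives of the cross entropy with respect to theta0 and theta1
    p = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))   # p(y=1|x,theta)
    return -np.sum(y - p), -np.sum(x * (y - p))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0, 1, 0, 1, 1])
print(gradient(0.0, 1.0, x, y))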

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs036.html b/doc/pub/week39/html/._week39-bs036.html new file mode 100644 index 000000000..82971218b --- /dev/null +++ b/doc/pub/week39/html/._week39-bs036.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    A more compact expression

    + +

    Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an +\( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a +vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can rewrite in a more compact form the first +derivative of the cost function as +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +$$ + +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    + +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ + + +
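These compact expressions translate directly into NumPy; a sketch with a made-up design matrix \( \boldsymbol{X}=[1,x] \) is

import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
X = np.c_[np.ones_like(x), x]             # n x p design matrix (p = 2 here)
theta = np.zeros(X.shape[1])

p = 1.0 / (1.0 + np.exp(-X @ theta))      # fitted probabilities p(y=1|x,theta)
gradient = -X.T @ (y - p)                 # -X^T (y - p)
W = np.diag(p * (1.0 - p))                # diagonal weight matrix
hessian = X.T @ W @ X                     # X^T W X
print(gradient)
print(hessian)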

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs037.html b/doc/pub/week39/html/._week39-bs037.html new file mode 100644 index 000000000..98aa82919 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs037.html @@ -0,0 +1,307 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Extending to more predictors

    + +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The log-odds (the logarithm of the ratio between the two probabilities) is then, with \( p \) predictors,

    +$$ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +$$ + +

    Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \) leading to

    +$$ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs038.html b/doc/pub/week39/html/._week39-bs038.html new file mode 100644 index 000000000..206eda2de --- /dev/null +++ b/doc/pub/week39/html/._week39-bs038.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Including more classes

    + +

Until now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

$$
\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1,
$$

and

$$
\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1,
$$

and so on up to class \( C=K-1 \),

$$
\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1,
$$

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs039.html b/doc/pub/week39/html/._week39-bs039.html new file mode 100644 index 000000000..88282c44d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs039.html @@ -0,0 +1,329 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More classes

    + +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function. +

    + +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \( K \) distinct linear functions, +and the predicted probability for the \( k \)-th class given a sample +vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two +predictors): +

    + +$$ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +$$ + +

    It is easy to extend to more predictors. The final class is

    +$$ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +$$ + +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations. +
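A minimal sketch of these \( K \) probabilities with one predictor \( x_1 \), the last class acting as reference and illustrative (made-up) parameter values, is

import numpy as np

def class_probabilities(x1, Theta):
    # Theta has shape (K-1, 2): row k holds (theta_k0, theta_k1)
    scores = Theta[:, 0] + Theta[:, 1] * x1        # the K-1 linear functions
    denom = 1.0 + np.sum(np.exp(scores))
    return np.append(np.exp(scores), 1.0) / denom  # last entry is class K

Theta = np.array([[0.5, 1.0],
                  [-0.2, 2.0]])                    # K = 3, illustrative values
p = class_probabilities(x1=0.3, Theta=Theta)
print(p, p.sum())                                  # the probabilities sum to one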

    + +

    To find the optimal parameters we would typically use a gradient +descent method. Newton's method and gradient descent methods are +discussed in the material on optimization +methods. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs040.html b/doc/pub/week39/html/._week39-bs040.html new file mode 100644 index 000000000..26d57c047 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs040.html @@ -0,0 +1,303 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

Optimization, the central part of any Machine Learning algorithm

    + +

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally, we would be able to solve for \( \theta \) analytically; however, this is not possible in general, and we must use an approximate numerical method to compute the minimum.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs041.html b/doc/pub/week39/html/._week39-bs041.html new file mode 100644 index 000000000..677037092 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs041.html @@ -0,0 +1,309 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Revisiting our Logistic Regression case

    + +

In our discussion of logistic regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore, we assumed that our fit involves only two parameters \( \theta \), that is, we defined the probabilities

    + +$$ +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +$$ + +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs042.html b/doc/pub/week39/html/._week39-bs042.html new file mode 100644 index 000000000..8c77fb850 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs042.html @@ -0,0 +1,312 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The equations to solve

    + +

    Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) +elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the +\( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities +\( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote in a more compact form +the first derivative of the cost function as +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +$$ + +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    + +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ + +

    This defines what is called the Hessian matrix.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs043.html b/doc/pub/week39/html/._week39-bs043.html new file mode 100644 index 000000000..6724eb072 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs043.html @@ -0,0 +1,308 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Solving using Newton-Raphson's method

    + +

If we can set up these equations, the Newton-Raphson iterative method is normally the method of choice. It requires, however, that we can compute efficiently the matrices that define the first and second derivatives.

    + +

    Our iterative scheme is then given by

    + +$$ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +$$ + +

    or in matrix form as

    + +$$ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +$$ + +

    The right-hand side is computed with the old values of \( \theta \).

    + +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
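A minimal sketch of this iteration for the two-parameter model, assuming NumPy and a small synthetic (non-separable) data set, is

import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
X = np.c_[np.ones_like(x), x]
theta = np.zeros(X.shape[1])

for iteration in range(10):
    p = 1.0 / (1.0 + np.exp(-X @ theta))                 # current probabilities
    gradient = -X.T @ (y - p)
    W = np.diag(p * (1.0 - p))
    hessian = X.T @ W @ X
    theta = theta - np.linalg.solve(hessian, gradient)   # Newton-Raphson step

print(theta)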

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs044.html b/doc/pub/week39/html/._week39-bs044.html new file mode 100644 index 000000000..4a9e1f5ea --- /dev/null +++ b/doc/pub/week39/html/._week39-bs044.html @@ -0,0 +1,613 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Example code for Logistic Regression

    + +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    + + +
    +
    +
    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True. +

    + + + +
    +
    +
    +
    +
    +
    # Evaluation Metrics
+#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Synthetic data generation

    + +

• Binary classification data: create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2].
• Multiclass data: create several Gaussian clusters (one per class) spread out in feature space.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs045.html b/doc/pub/week39/html/._week39-bs045.html new file mode 100644 index 000000000..12fd481df --- /dev/null +++ b/doc/pub/week39/html/._week39-bs045.html @@ -0,0 +1,467 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using gradient descent methods, limitations

    + +
      +
• Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter: depending on where one starts, one ends up at a different local minimum. It is therefore very important to think about how the training process is initialized. This is true for GD as well as for more complicated variants of GD.
• Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the squared error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points, and doing this at every GD step becomes extremely computationally expensive. An ingenious solution is to calculate the gradients using small subsets of the data called "mini-batches". This has the added benefit of introducing stochasticity into our algorithm.
• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time; for larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
• GD treats all directions in parameter space uniformly. Another major drawback of GD is that, unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction, and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track not only of the gradient but also of second derivatives. The ideal scenario would be to calculate the Hessian, but this proves to be too computationally expensive.
• GD can take exponential time to escape saddle points, even with random initialization. As mentioned, GD is extremely sensitive to the initial condition, since it determines the particular local minimum GD will eventually reach. However, even with a good initialization scheme that introduces randomness, GD can still take exponential time to escape saddle points.
    +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs046.html b/doc/pub/week39/html/._week39-bs046.html new file mode 100644 index 000000000..b46cb20b5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs046.html @@ -0,0 +1,539 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Improving gradient descent with momentum

    + +

We discuss here some simple examples where we introduce what is called 'memory' of previous steps, normally referred to as momentum gradient descent. The mathematics is explained below in connection with stochastic gradient descent.

    + + + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs047.html b/doc/pub/week39/html/._week39-bs047.html new file mode 100644 index 000000000..3bdae2775 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs047.html @@ -0,0 +1,545 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs048.html b/doc/pub/week39/html/._week39-bs048.html new file mode 100644 index 000000000..39c74d5ae --- /dev/null +++ b/doc/pub/week39/html/._week39-bs048.html @@ -0,0 +1,461 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Overview video on Stochastic Gradient Descent

    + +What is Stochastic Gradient Descent + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs049.html b/doc/pub/week39/html/._week39-bs049.html new file mode 100644 index 000000000..40ef9997c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs049.html @@ -0,0 +1,471 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Batches and mini-batches

    + +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    + +

In large-scale applications such as the ILSVRC challenge, the training data can have on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from an entire training set of several million. This batch is then used to perform a parameter update.
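A small sketch of how the data indices can be split into such batches (assuming NumPy, with n and M chosen purely for illustration):

import numpy as np

n, M = 20, 5                              # illustrative values: n data points, batch size M
rng = np.random.default_rng(2024)
indices = rng.permutation(n)              # shuffle the data indices
batches = indices.reshape(n // M, M)      # n/M mini-batches of size M (assumes M divides n)
for k, batch in enumerate(batches):
    print(f"Batch {k}: data points {batch}")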

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs050.html b/doc/pub/week39/html/._week39-bs050.html new file mode 100644 index 000000000..67bd324f5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs050.html @@ -0,0 +1,483 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Stochastic Gradient Descent (SGD)

    + +

In stochastic gradient descent, the extreme case is the one where each mini-batch contains only a single data point.

    + +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    + +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs051.html b/doc/pub/week39/html/._week39-bs051.html new file mode 100644 index 000000000..f5a76ca74 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs051.html @@ -0,0 +1,473 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Stochastic Gradient Descent

    + +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above. +

    + +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

    +$$ +C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\beta}). +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs052.html b/doc/pub/week39/html/._week39-bs052.html new file mode 100644 index 000000000..e34ccd33c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs052.html @@ -0,0 +1,474 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Computation of gradients

    + +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

    +$$ +\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, +\mathbf{\beta}). +$$ + +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs053.html b/doc/pub/week39/html/._week39-bs053.html new file mode 100644 index 000000000..81c971ba2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs053.html @@ -0,0 +1,480 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    SGD example

    +

As an example, suppose we have \( n=10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \); then we have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and at the other extreme you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    + +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step,

$$
\nabla_{\beta} C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}) \rightarrow \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}).
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs054.html b/doc/pub/week39/html/._week39-bs054.html new file mode 100644 index 000000000..81260a212 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs054.html @@ -0,0 +1,472 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The gradient step

    + +

    Thus a gradient descent step now looks like

    +$$ +\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i, +\mathbf{\beta}) +$$ + +

    where \( k \) is picked at random with equal +probability from \( [1,n/M] \). An iteration over the number of +minibathces (n/M) is commonly referred to as an epoch. Thus it is +typical to choose a number of epochs and for each epoch iterate over +the number of minibatches, as exemplified in the code below. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs055.html b/doc/pub/week39/html/._week39-bs055.html new file mode 100644 index 000000000..aaf538a29 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs055.html @@ -0,0 +1,504 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Simple example code

    + + + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for beta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper since we sum over the datapoints in the \( k \)-th minibatch and not over all \( n \) datapoints.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs056.html b/doc/pub/week39/html/._week39-bs056.html new file mode 100644 index 000000000..94b8ae245 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs056.html @@ -0,0 +1,471 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    When do we stop?

    + +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs, check if its norm is smaller than some threshold, and stop if it is. However, a vanishing gradient is also the condition for a local minimum, so this would only tell us that we are close to a local/global minimum. We could therefore also evaluate the cost function at this point, store the result and continue the search; if the test kicks in at a later stage, we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
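A sketch of such a check, here on a simple linear-regression cost with plain gradient descent standing in for the stochastic scheme (the tolerance and learning rate are illustrative), could look like this:

import numpy as np

rng = np.random.default_rng(42)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def cost(beta):
    return np.mean((X @ beta - y) ** 2)

def full_gradient(beta):
    return (2.0 / n) * X.T @ (X @ beta - y)

beta = rng.standard_normal((2, 1))
best_beta, best_cost = beta.copy(), cost(beta)
eta, tolerance = 0.1, 1e-6

for epoch in range(1000):
    beta -= eta * full_gradient(beta)
    if cost(beta) < best_cost:                          # keep the best beta seen so far
        best_beta, best_cost = beta.copy(), cost(beta)
    if np.linalg.norm(full_gradient(beta)) < tolerance:
        break                                           # full gradient is (almost) zero

print(best_cost, best_beta.ravel())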

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs057.html b/doc/pub/week39/html/._week39-bs057.html new file mode 100644 index 000000000..fa86cec0c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs057.html @@ -0,0 +1,470 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Slightly different approach

    + +

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it eventually becomes very small, so that we barely move at all. Such approaches are also called scaling or learning-rate schedules. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs058.html b/doc/pub/week39/html/._week39-bs058.html new file mode 100644 index 000000000..c6cf71ee5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs058.html @@ -0,0 +1,515 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Time decay rate

    + +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    + +

    In this way we can fix the number of epochs, compute \( \beta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \beta \) that gives the lowest value of the cost +function. +

    + + + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +gamma_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for beta
    +        t = epoch*m+i
    +        gamma_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs059.html b/doc/pub/week39/html/._week39-bs059.html new file mode 100644 index 000000000..eb2c5b76e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs059.html @@ -0,0 +1,549 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Code with a Number of Minibatches which varies

    + +

In the code here the mini-batches are picked at random at each step, and the learning rate follows a decaying schedule.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
+print("theta from own sgd")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs060.html b/doc/pub/week39/html/._week39-bs060.html new file mode 100644 index 000000000..4d77d1f1d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs060.html @@ -0,0 +1,465 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Replace or not

    + +

In the above code, we have used sampling with replacement when setting up the mini-batches. An alternative is to shuffle the data at the start of each epoch and loop over the resulting disjoint mini-batches, that is, sampling without replacement.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs061.html b/doc/pub/week39/html/._week39-bs061.html new file mode 100644 index 000000000..3838033ab --- /dev/null +++ b/doc/pub/week39/html/._week39-bs061.html @@ -0,0 +1,491 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Momentum based GD

    + +

Stochastic gradient descent (SGD) is almost always used with a momentum or inertia term that serves as a memory of the direction we are moving in parameter space. This is typically implemented as follows

    + +$$ +\begin{align} +\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ +\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, +\tag{2} +\end{align} +$$ + +

    where we have introduced a momentum parameter \( \gamma \), with +\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to +indicate the gradient is to be taken over a different mini-batch at +each step. We call this algorithm gradient descent with momentum +(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a +running average of recently encountered gradients and +\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory +used in the averaging procedure. Consistent with this, when +\( \gamma=0 \), this just reduces down to ordinary SGD as discussed +earlier. An equivalent way of writing the updates is +

    + +$$ +\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), +$$ + +

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
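A minimal sketch of the update (2) on the simple cost \( E(\theta)=\theta^2 \), with illustrative values for \( \eta \) and \( \gamma \), is

import numpy as np

eta, gamma = 0.1, 0.9            # illustrative learning rate and momentum parameter
theta = np.array([1.0])
v = np.zeros_like(theta)

for t in range(50):
    grad = 2.0 * theta           # gradient of E(theta) = theta^2
    v = gamma * v + eta * grad   # running average of recent gradients
    theta = theta - v            # momentum update of the parameters
print(theta)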

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs062.html b/doc/pub/week39/html/._week39-bs062.html new file mode 100644 index 000000000..c42315d60 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs062.html @@ -0,0 +1,483 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More on momentum based approaches

    + +

    Let us try to get more intuition from these equations. It is helpful +to consider a simple physical analogy with a particle of mass \( m \) +moving in a viscous medium with drag coefficient \( \mu \) and potential +\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), +then its motion is described by +

    + +$$ +m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). +$$ + +

    We can discretize this equation in the usual way to get

    + +$$ +m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). +$$ + +

    Rearranging this equation, we can rewrite this as

    + +$$ +\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs063.html b/doc/pub/week39/html/._week39-bs063.html new file mode 100644 index 000000000..c60aade85 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs063.html @@ -0,0 +1,509 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Momentum parameter

    + +

    Notice that this equation is identical to previous one if we identify +the position of the particle, \( \mathbf{w} \), with the parameters +\( \boldsymbol{\theta} \). This allows us to identify the momentum +parameter and learning rate with the mass of the particle and the +viscous drag as: +

    + +$$ +\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. +$$ + +

    Thus, as the name suggests, the momentum parameter is proportional to +the mass of the particle and effectively provides inertia. +Furthermore, in the large viscosity/small learning rate limit, our +memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). +

    + +

Why is momentum useful? SGD momentum helps the gradient descent algorithm gain speed in directions with persistent but small gradients even in the presence of stochasticity, while suppressing oscillations in high-curvature directions. This becomes especially important in situations where the landscape is shallow and flat in some directions and narrow and steep in others. It has been argued that first-order methods (with appropriate initial conditions) can perform comparably to more expensive second-order methods, especially in the context of complex deep learning models.

    + +

    These beneficial properties of momentum can sometimes become even more +pronounced by using a slight modification of the classical momentum +algorithm called Nesterov Accelerated Gradient (NAG). +

    + +

    In the NAG algorithm, rather than calculating the gradient at the +current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one +calculates the gradient at the expected value of the parameters given +our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma +\mathbf{v}_{t-1}) \). This yields the NAG update rule +

    + +$$ +\begin{align} +\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ +\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. +\tag{3} +\end{align} +$$ + +

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).
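A corresponding sketch of the Nesterov-type update on \( E(\theta)=\theta^2 \); in the sketch below the look-ahead point is taken as \( \boldsymbol{\theta}_t-\gamma\mathbf{v}_{t-1} \), i.e. the parameters expected after the momentum step alone given the update convention \( \boldsymbol{\theta}_{t+1}=\boldsymbol{\theta}_t-\mathbf{v}_t \):

import numpy as np

eta, gamma = 0.1, 0.9
theta = np.array([1.0])
v = np.zeros_like(theta)

for t in range(50):
    lookahead = theta - gamma * v    # parameters after the momentum step alone
    grad = 2.0 * lookahead           # gradient of E(theta) = theta^2 at the look-ahead point
    v = gamma * v + eta * grad
    theta = theta - v
print(theta)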

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs064.html b/doc/pub/week39/html/._week39-bs064.html new file mode 100644 index 000000000..3b806b349 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs064.html @@ -0,0 +1,482 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    Recently, a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs065.html b/doc/pub/week39/html/._week39-bs065.html new file mode 100644 index 000000000..04a243b3c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs065.html @@ -0,0 +1,485 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    RMS prop

    + +

    In RMS prop, in addition to keeping a running average of the first +moment of the gradient, we also keep track of the second moment +denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule +for RMS prop is given by +

+ +$$ +\begin{align} +\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) +\tag{4}\\ +\mathbf{s}_t &=\beta \mathbf{s}_{t-1} +(1-\beta)\mathbf{g}_t^2 \nonumber \\ +\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \mathbf{g}_t \over \sqrt{\mathbf{s}_t +\epsilon}}, \nonumber +\end{align} +$$ + +

    where \( \beta \) controls the averaging time of the second moment and is +typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate +typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a +small regularization constant to prevent divergences. Multiplication +and division by vectors is understood as an element-wise operation. It +is clear from this formula that the learning rate is reduced in +directions where the norm of the gradient is consistently large. This +greatly speeds up the convergence by allowing us to use a larger +learning rate for flat directions. +
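As a minimal numerical illustration of the update (4) (illustrative values only; a complete autograd-based RMSprop implementation appears later in these notes), consider a quadratic cost with one steep and one flat direction:

import numpy as np

# Quadratic cost with a steep (first) and a flat (second) direction
grad_E = lambda theta: np.array([100.0*theta[0], 0.1*theta[1]])

eta, beta, eps = 1e-3, 0.9, 1e-8
theta = np.array([1.0, 1.0])
s = np.zeros(2)

for t in range(2000):
    g = grad_E(theta)
    s = beta*s + (1 - beta)*g**2          # running second moment of the gradient
    theta = theta - eta*g/np.sqrt(s + eps)

print(theta)  # both components approach zero at a comparable rate despite very different curvatures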

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs066.html b/doc/pub/week39/html/._week39-bs066.html new file mode 100644 index 000000000..7c9e66ade --- /dev/null +++ b/doc/pub/week39/html/._week39-bs066.html @@ -0,0 +1,511 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    ADAM optimizer

    + +

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    + +

    In addition to keeping a running average of the first and +second moments of the gradient +(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and +\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM +performs an additional bias correction to account for the fact that we +are estimating the first two moments of the gradient using a running +average (denoted by the hats in the update rule below). The update +rule for ADAM is given by (where multiplication and division are once +again understood to be element-wise operations below) +

+ +$$ +\begin{align} +\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) +\tag{5}\\ +\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\ +\mathbf{s}_t &=\beta_2 \mathbf{s}_{t-1} +(1-\beta_2)\mathbf{g}_t^2 \nonumber \\ +\hat{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\beta_1^t} \nonumber \\ +\hat{\mathbf{s}}_t &={\mathbf{s}_t \over 1-\beta_2^t} \nonumber \\ +\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \hat{\mathbf{m}}_t \over \sqrt{\hat{\mathbf{s}}_t} +\epsilon}, +\tag{6} +\end{align} +$$ + +

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and +second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) +respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. +

    + +

Like in RMSprop, the effective step size of a parameter depends on the magnitude of its gradient squared. To understand this better, let us rewrite this expression in terms of the variance \( \boldsymbol{\sigma}_t^2 = \hat{\mathbf{s}}_t - (\hat{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The update rule for this parameter is given by

+ +$$ +\Delta \theta_{t+1}= -\eta_t { \hat{m}_t \over \sqrt{\sigma_t^2 + \hat{m}_t^2 }+\epsilon}. +$$ + +
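A compact numerical sketch of the update rules (5)-(6) for a single parameter, with an analytically known gradient and illustrative parameter values (the full autograd-based ADAM code appears later in these notes):

import numpy as np

# One-dimensional toy cost E(theta) = (theta - 3)^2 with gradient 2*(theta - 3)
grad_E = lambda theta: 2.0*(theta - 3.0)

eta, beta1, beta2, eps = 0.001, 0.9, 0.99, 1e-8
theta, m, s = 0.0, 0.0, 0.0

for t in range(1, 5001):
    g = grad_E(theta)
    m = beta1*m + (1 - beta1)*g          # first moment
    s = beta2*s + (1 - beta2)*g**2       # second moment
    m_hat = m/(1 - beta1**t)             # bias corrections
    s_hat = s/(1 - beta2**t)
    theta -= eta*m_hat/(np.sqrt(s_hat) + eps)

print(theta)   # approaches the minimum at theta = 3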

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs067.html b/doc/pub/week39/html/._week39-bs067.html new file mode 100644 index 000000000..91731ecee --- /dev/null +++ b/doc/pub/week39/html/._week39-bs067.html @@ -0,0 +1,463 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Algorithms and codes for Adagrad, RMSprop and Adam

    + +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    + +

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs068.html b/doc/pub/week39/html/._week39-bs068.html new file mode 100644 index 000000000..253a5b779 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs068.html @@ -0,0 +1,467 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Practical tips

    + +
      +
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +
• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for mitigating this is to standardize the data by subtracting the mean and normalizing the variance of the input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case. A small standardization sketch is given right after this list.
    • +
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, the model is beginning to overfit and you should terminate the learning process. This early stopping significantly improves performance in many settings.
    • +
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSprop, and AdaGrad tend to have poorer generalization than SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not yet fully understood why adaptive methods generalize more poorly in this regime, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • +
    +
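As a small illustration of the input-transformation tip above (a sketch only; it assumes scikit-learn is available, as elsewhere in the course material):

import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(3155)
# Toy design matrix where the two features have very different scales
X = np.c_[np.random.randn(100), 100.0 + 25.0*np.random.randn(100)]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtract the mean and divide by the standard deviation

print("Before:", X.mean(axis=0), X.std(axis=0))
print("After: ", X_scaled.mean(axis=0), X_scaled.std(axis=0))  # approximately 0 and 1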

    Geron's text, see chapter 11, has several interesting discussions.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs069.html b/doc/pub/week39/html/._week39-bs069.html new file mode 100644 index 000000000..34120dda8 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs069.html @@ -0,0 +1,557 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Automatic differentiation

    + +

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    + +

    Automatic differentiation is neither:

    + +
      +
    • Symbolic differentiation, nor
    • +
    • Numerical differentiation (the method of finite differences).
    • +
    +

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off and cancellation errors in the discretization process.
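To see how the chain rule can be applied mechanically to elementary operations, here is a toy forward-mode sketch based on dual numbers. It is only meant to illustrate the principle; autograd itself relies on reverse-mode AD and far more complete machinery. The function differentiated is the same \( f(x)=\sin(2\pi x + x^2) \) used in the autograd example below.

import math

class Dual:
    # A toy dual number: a value together with the derivative w.r.t. one chosen input
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.val*other.val, self.der*other.val + self.val*other.der)
    __rmul__ = __mul__

def sin(x):
    # chain rule for an elementary function
    return Dual(math.sin(x.val), math.cos(x.val)*x.der)

def f(x):
    return sin(2*math.pi*x + x*x)

x0 = 0.3
result = f(Dual(x0, 1.0))   # seed the derivative dx/dx = 1
print("value     :", result.val)
print("derivative:", result.der)
print("analytic  :", math.cos(2*math.pi*x0 + x0**2)*(2*math.pi + 2*x0))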

    + +

    Python has tools for so-called automatic differentiation. +Consider the following example +

    +$$ +f(x) = \sin\left(2\pi x + x^2\right) +$$ + +

    which has the following derivative

    +$$ +f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) +$$ + +

    Using autograd we have

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +
    +# To do elementwise differentiation:
    +from autograd import elementwise_grad as egrad 
    +
    +# To plot:
    +import matplotlib.pyplot as plt 
    +
    +
    +def f(x):
    +    return np.sin(2*np.pi*x + x**2)
    +
    +def f_grad_analytic(x):
    +    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    +
    +# Do the comparison:
    +x = np.linspace(0,1,1000)
    +
    +f_grad = egrad(f)
    +
    +computed = f_grad(x)
    +analytic = f_grad_analytic(x)
    +
    +plt.title('Derivative computed from Autograd compared with the analytical derivative')
    +plt.plot(x,computed,label='autograd')
    +plt.plot(x,analytic,label='analytic')
    +
    +plt.xlabel('x')
    +plt.ylabel('y')
    +plt.legend()
    +
    +plt.show()
    +
    +print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs070.html b/doc/pub/week39/html/._week39-bs070.html new file mode 100644 index 000000000..575da0e98 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs070.html @@ -0,0 +1,506 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using autograd

    + +

    Here we +experiment with what kind of functions Autograd is capable +of finding the gradient of. The following Python functions are just +meant to illustrate what Autograd can do, but please feel free to +experiment with other, possibly more complicated, functions as well. +

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def f1(x):
    +    return x**3 + 1
    +
    +f1_grad = grad(f1)
    +
    +# Remember to send in float as argument to the computed gradient from Autograd!
    +a = 1.0
    +
    +# See the evaluated gradient at a using autograd:
    +print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    +
    +# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    +grad_analytical = 3*a**2
    +print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs071.html b/doc/pub/week39/html/._week39-bs071.html new file mode 100644 index 000000000..02ace03c4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs071.html @@ -0,0 +1,521 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Autograd with more complicated functions

    + +

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know which variable the function is being differentiated with respect to.

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f2(x1,x2):
    +    return 3*x1**3 + x2*(x1 - 5) + 1
    +
    +# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    +f2_grad_x1 = grad(f2,0)
    +
    +# ... and differentiate w.r.t x2 by sending 1 as an additional arugment to grad
    +f2_grad_x2 = grad(f2,1)
    +
    +x1 = 1.0
    +x2 = 3.0 
    +
    +print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    +print("-"*30)
    +
    +# Compare with the analytical derivatives:
    +
    +# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    +f2_grad_x1_analytical = 9*x1**2 + x2
    +
    +# Derivative of f2 w.r.t x2 is: x1 - 5:
    +f2_grad_x2_analytical = x1 - 5
    +
    +# See the evaluated derivations:
    +print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
+print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1_analytical ))
    +
    +print()
    +
    +print("The derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
+print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2_analytical ))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that, used this way, the grad function does not produce the full gradient of the function. The full gradient of a function of two or more variables is a vector in which each element is the derivative of the function with respect to one of the variables.
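One simple way to obtain the full gradient here is to evaluate grad for each argument and stack the results; a small self-contained sketch (reusing the same f2 as above):

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Partial derivatives w.r.t. each argument
f2_grad_x1 = grad(f2, 0)
f2_grad_x2 = grad(f2, 1)

def f2_full_gradient(x1, x2):
    # Collect both partial derivatives into one gradient vector
    return np.array([f2_grad_x1(x1, x2), f2_grad_x2(x1, x2)])

x1, x2 = 1.0, 3.0
print("Full gradient of f2:", f2_full_gradient(x1, x2))
# Analytical gradient: (9*x1**2 + x2, x1 - 5)
print("Analytical gradient:", np.array([9*x1**2 + x2, x1 - 5]))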

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs072.html b/doc/pub/week39/html/._week39-bs072.html new file mode 100644 index 000000000..4e439771a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs072.html @@ -0,0 +1,506 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More complicated functions using the elements of their arguments directly

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f3(x): # Assumes x is an array of length 5 or higher
    +    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    +
    +f3_grad = grad(f3)
    +
    +x = np.linspace(0,4,5)
    +
    +# Print the computed gradient:
    +print("The computed gradient of f3 is: ", f3_grad(x))
    +
    +# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    +f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    +
    +# Print the analytical gradient:
    +print("The analytical gradient of f3 is: ", f3_grad_analytical)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the single partial derivatives in the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as it is closer to what one would expect from a gradient-evaluating function.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs073.html b/doc/pub/week39/html/._week39-bs073.html new file mode 100644 index 000000000..210e64a4a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs073.html @@ -0,0 +1,499 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Functions using mathematical functions from Numpy

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f4(x):
    +    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    +
    +f4_grad = grad(f4)
    +
    +x = 2.7
    +
    +# Print the computed derivative:
    +print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    +
    +# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    +f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    +
    +# Print the analytical gradient:
    +print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs074.html b/doc/pub/week39/html/._week39-bs074.html new file mode 100644 index 000000000..9979274aa --- /dev/null +++ b/doc/pub/week39/html/._week39-bs074.html @@ -0,0 +1,496 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More autograd

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f5(x):
    +    if x >= 0:
    +        return x**2
    +    else:
    +        return -3*x + 1
    +
    +f5_grad = grad(f5)
    +
    +x = 2.7
    +
    +# Print the computed derivative:
    +print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs075.html b/doc/pub/week39/html/._week39-bs075.html new file mode 100644 index 000000000..9378dc5f4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs075.html @@ -0,0 +1,537 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And with loops

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f6_for(x):
    +    val = 0
    +    for i in range(10):
    +        val = val + x**i
    +    return val
    +
    +def f6_while(x):
    +    val = 0
    +    i = 0
    +    while i < 10:
    +        val = val + x**i
    +        i = i + 1
    +    return val
    +
    +f6_for_grad = grad(f6_for)
    +f6_while_grad = grad(f6_while)
    +
    +x = 0.5
    +
    +# Print the computed derivaties of f6_for and f6_while
    +print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    +print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
+# Both of the functions are implementations of the sum: sum(x**i) for i = 0, ..., 9
    +# The analytical derivative is: sum(i*x**(i-1)) 
    +f6_grad_analytical = 0
    +for i in range(10):
    +    f6_grad_analytical += i*x**(i-1)
    +
    +print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs076.html b/doc/pub/week39/html/._week39-bs076.html new file mode 100644 index 000000000..53b23f1db --- /dev/null +++ b/doc/pub/week39/html/._week39-bs076.html @@ -0,0 +1,509 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using recursion

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def f7(n): # Assume that n is an integer
    +    if n == 1 or n == 0:
    +        return 1
    +    else:
    +        return n*f7(n-1)
    +
    +f7_grad = grad(f7)
    +
    +n = 2.0
    +
    +print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    +
    +# The function f7 is an implementation of the factorial of n.
    +# By using the product rule, one can find that the derivative is:
    +
    +f7_grad_analytical = 0
    +for i in range(int(n)-1):
    +    tmp = 1
    +    for k in range(int(n)-1):
    +        if k != i:
    +            tmp *= (n - k)
    +    f7_grad_analytical += tmp
    +
    +print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs077.html b/doc/pub/week39/html/._week39-bs077.html new file mode 100644 index 000000000..7792fbfe1 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs077.html @@ -0,0 +1,496 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Unsupported functions

    +

Autograd supports many features. However, there are some operations that are not (yet) supported by Autograd.

    + +

    Assigning a value to the variable being differentiated with respect to

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f8(x): # Assume x is an array
    +    x[2] = 3
    +    return x*2
    +
    +#f8_grad = grad(f8)
    +
    +#x = 8.4
    +
    +#print("The derivative of f8 is:",f8_grad(x))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. Autograd's implementation of the derivative computation does not allow such in-place assignments.
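A common workaround (our own illustrative sketch, not an official autograd recipe) is to build a new array instead of assigning into the input, for instance with np.where:

import autograd.numpy as np
from autograd import elementwise_grad as egrad

def f8_alternative(x):  # Assume x is an array
    # Replace element number 2 by the constant 3 without item assignment
    mask = np.arange(x.shape[0]) == 2
    x_new = np.where(mask, 3.0, x)
    return x_new*2

f8_grad = egrad(f8_alternative)

x = np.array([1.0, 2.0, 3.0, 4.0])
print("The derivative of f8_alternative is:", f8_grad(x))
# Element 2 no longer depends on the input, so its derivative is 0; the others are 2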

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs078.html b/doc/pub/week39/html/._week39-bs078.html new file mode 100644 index 000000000..7b4e40c3e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs078.html @@ -0,0 +1,533 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The syntax a.dot(b) when finding the dot product

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f9(a): # Assume a is an array with 2 elements
    +    b = np.array([1.0,2.0])
    +    return a.dot(b)
    +
    +#f9_grad = grad(f9)
    +
    +#x = np.array([1.0,0.0])
    +
    +#print("The derivative of f9 is:",f9_grad(x))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f9_alternative(x): # Assume a is an array with 2 elements
    +    b = np.array([1.0,2.0])
    +    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    +
    +f9_alternative_grad = grad(f9_alternative)
    +
    +x = np.array([3.0,0.0])
    +
    +print("The gradient of f9 is:",f9_alternative_grad(x))
    +
    +# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    +# w.r.t x is (b_1, b_2).
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs079.html b/doc/pub/week39/html/._week39-bs079.html new file mode 100644 index 000000000..22540ce8a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs079.html @@ -0,0 +1,534 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using Autograd with OLS

    + +

We conclude the part on optimization by showing how we can write code for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    + + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs080.html b/doc/pub/week39/html/._week39-bs080.html new file mode 100644 index 000000000..184294432 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs080.html @@ -0,0 +1,531 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs081.html b/doc/pub/week39/html/._week39-bs081.html new file mode 100644 index 000000000..90ee108a7 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs081.html @@ -0,0 +1,516 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    But none of these can compete with Newton's method

    + + + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(beta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters beta
    +invH = np.linalg.pinv(H)
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +beta = np.random.randn(2,1)
    +Niterations = 5
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(beta)
    +    beta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("beta from own Newton code")
    +print(beta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs082.html b/doc/pub/week39/html/._week39-bs082.html new file mode 100644 index 000000000..10cf81083 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs082.html @@ -0,0 +1,551 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Including Stochastic Gradient Descent with Autograd

    +

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    + + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs083.html b/doc/pub/week39/html/._week39-bs083.html new file mode 100644 index 000000000..fefcbc005 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs083.html @@ -0,0 +1,542 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs084.html b/doc/pub/week39/html/._week39-bs084.html new file mode 100644 index 000000000..a4e36dd59 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs084.html @@ -0,0 +1,523 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Similar (second order function now) problem but now with AdaGrad

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs085.html b/doc/pub/week39/html/._week39-bs085.html new file mode 100644 index 000000000..f908adc27 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs085.html @@ -0,0 +1,527 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +	# Accumulated gradient
    +	# Scaling with rho the new and the previous results
+        Giter = (rho*Giter+(1-rho)*gradients*gradients)
+        # Element-wise (Hadamard) operations: scale the learning rate by the root of the running second moment
+        update = gradients*eta/(delta+np.sqrt(Giter))
+        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs086.html b/doc/pub/week39/html/._week39-bs086.html new file mode 100644 index 000000000..98e185b59 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs086.html @@ -0,0 +1,532 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And finally ADAM

    + + + +
    +
    +
    +
    +
    +
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    +beta1 = 0.9
    +beta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = beta1*first_moment + (1-beta1)*gradients
    +        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    +        first_term = first_moment/(1.0-beta1**iter)
    +        second_term = second_moment/(1.0-beta2**iter)
    +	# Scaling with rho the new and the previous results
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs087.html b/doc/pub/week39/html/._week39-bs087.html new file mode 100644 index 000000000..11f5da1cf --- /dev/null +++ b/doc/pub/week39/html/._week39-bs087.html @@ -0,0 +1,505 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And Logistic Regression

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def sigmoid(x):
    +    return 0.5 * (np.tanh(x / 2.) + 1)
    +
    +def logistic_predictions(weights, inputs):
    +    # Outputs probability of a label being true according to logistic model.
    +    return sigmoid(np.dot(inputs, weights))
    +
    +def training_loss(weights):
    +    # Training loss is the negative log-likelihood of the training labels.
    +    preds = logistic_predictions(weights, inputs)
    +    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    +    return -np.sum(np.log(label_probabilities))
    +
    +# Build a toy dataset.
    +inputs = np.array([[0.52, 1.12,  0.77],
    +                   [0.88, -1.08, 0.15],
    +                   [0.52, 0.06, -1.30],
    +                   [0.74, -2.49, 1.39]])
    +targets = np.array([True, True, False, True])
    +
    +# Define a function that returns gradients of training loss using Autograd.
    +training_gradient_fun = grad(training_loss)
    +
    +# Optimize weights using gradient descent.
    +weights = np.array([0.0, 0.0, 0.0])
    +print("Initial loss:", training_loss(weights))
    +for i in range(100):
    +    weights -= training_gradient_fun(weights) * 0.01
    +
    +print("Trained loss:", training_loss(weights))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +
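For comparison (a sketch only, assuming scikit-learn is available as elsewhere in the course material), the same toy dataset can be fitted with scikit-learn's LogisticRegression; with weak regularization the weights should behave similarly to the maximum-likelihood fit above.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy dataset as above
inputs = np.array([[0.52, 1.12,  0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])

# fit_intercept=False to mimic the model above (no bias term); large C means weak regularization
clf = LogisticRegression(fit_intercept=False, C=1e5)
clf.fit(inputs, targets)

print("scikit-learn weights:", clf.coef_)
print("Predicted probabilities:", clf.predict_proba(inputs)[:, 1])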

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs088.html b/doc/pub/week39/html/._week39-bs088.html new file mode 100644 index 000000000..d532f9ec2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs088.html @@ -0,0 +1,488 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Introducing JAX

    + +

    Presently, instead of using autograd, we recommend using JAX

    + +

JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    + +

Here's a simple example of how you can use JAX to compute the derivative of the logistic function.

    + + + +
    +
    +
    +
    +
    +
    import jax.numpy as jnp
    +from jax import grad, jit, vmap
    +
    +def sum_logistic(x):
    +  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    +
    +x_small = jnp.arange(3.)
    +derivative_fn = grad(sum_logistic)
    +print(derivative_fn(x_small))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +
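The example above only uses grad. A brief sketch (illustrative values only) of the other transformations mentioned, jit and vmap, could look like this:

import jax.numpy as jnp
from jax import grad, jit, vmap

def sum_logistic(x):
  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))

# Just-in-time compile the gradient function for faster repeated evaluation
fast_grad = jit(grad(sum_logistic))
x_small = jnp.arange(3.)
print(fast_grad(x_small))

# Vectorize a scalar function over a batch of inputs with vmap
def logistic(x):
  return 1.0 / (1.0 + jnp.exp(-x))

batched_logistic = vmap(logistic)
print(batched_logistic(jnp.linspace(-1.0, 1.0, 5)))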

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs089.html b/doc/pub/week39/html/._week39-bs089.html new file mode 100644 index 000000000..f49f192c2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs089.html @@ -0,0 +1,490 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/week39-bs.html b/doc/pub/week39/html/week39-bs.html index e57267e6f..85a342f93 100644 --- a/doc/pub/week39/html/week39-bs.html +++ b/doc/pub/week39/html/week39-bs.html @@ -8,8 +8,8 @@ - -Week 39: Optimization and Gradient Methods + +Week 39: Resampling methods and logistic regression @@ -36,248 +36,132 @@ @@ -305,101 +189,58 @@ - Week 39: Optimization and Gradient Methods + Week 39: Resampling methods and logistic regression +
    +

    Cross-validation in brief

    +

For the various values of \( k \) we perform the following steps:

    -
    -Second order condition +
      +

    1. shuffle the dataset randomly.
    2. +

    3. Split the dataset into \( k \) groups.
    4. +

    5. For each unique group: +
        +

      1. Decide which group to use as set for test data
      2. +

      3. Take the remaining groups as a training data set
      4. +

      5. Fit a model on the training set and evaluate it on the test set
      6. +

      7. Retain the evaluation score and discard the model
      8. +

      -

      Assume that \( f \) is twice -differentiable, i.e the Hessian matrix exists at each point in -\( D_f \). Then \( f \) is convex if and only if \( D_f \) is a convex set and its -Hessian is positive semi-definite for all \( x\in D_f \). For a -single-variable function this reduces to \( f''(x) \geq 0 \). Geometrically this means that \( f \) has nonnegative curvature -everywhere. -

      -
    - -

This condition is particularly useful since it gives us a procedure for determining if the function under consideration is convex, apart from using the definition.

    +

  • Summarize the model using the sample of model evaluation scores
  • +
    -

    More on convex functions

    +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    -

    The next result is of great importance to us and the reason why we are -going on about convex functions. In machine learning we frequently -have to minimize a loss/cost function in order to find the best -parameters for the model we are considering. -

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    -

    Ideally we want the -global minimum (for high-dimensional models it is hard to know -if we have local or global minimum). However, if the cost/loss function -is convex the following result provides invaluable information: -

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
     
    -
    -Any minimum is global for convex functions -

    -

    Consider the problem of finding \( x \in \mathbb{R}^n \) such that \( f(x) \) -is minimal, where \( f \) is convex and differentiable. Then, any point -\( x^* \) that satisfies \( \nabla f(x^*) = 0 \) is a global minimum. -

    -
    +# A seed just to ensure that the random numbers are the same for every run. +# Useful for eventual debugging. +np.random.seed(3155) -

    This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum.

    -
    +# Generate the data. +nsamples = 100 +x = np.random.randn(nsamples) +y = 3*x**2 + np.random.randn(nsamples) -
    -

    Some simple problems

    +## Cross-validation on Ridge regression using KFold only -
      -

    1. Show that \( f(x)=x^2 \) is convex for \( x \in \mathbb{R} \) using the definition of convexity. Hint: If you re-write the definition, \( f \) is convex if the following holds for all \( x,y \in D_f \) and any \( \lambda \in [0,1] \) $\lambda f(x)+(1-\lambda)f(y)-f(\lambda x + (1-\lambda) y ) \geq 0$.
    2. -

    3. Using the second order condition show that the following functions are convex on the specified domain.
    4. -
        -

      • \( f(x) = e^x \) is convex for \( x \in \mathbb{R} \).
      • -

      • \( g(x) = -\ln(x) \) is convex for \( x \in (0,\infty) \).
      • -
      -

      -

    5. Let \( f(x) = x^2 \) and \( g(x) = e^x \). Show that \( f(g(x)) \) and \( g(f(x)) \) is convex for \( x \in \mathbb{R} \). Also show that if \( f(x) \) is any convex function than \( h(x) = e^{f(x)} \) is convex.
    6. -

    7. A norm is any function that satisfy the following properties
    8. -
        -

      • \( f(\alpha x) = |\alpha| f(x) \) for all \( \alpha \in \mathbb{R} \).
      • -

      • \( f(x+y) \leq f(x) + f(y) \)
      • -

• \( f(x) \geq 0 \) for all \( x \in \mathbb{R}^n \) with equality if and only if \( x = 0 \)
      • -
      -

      -

    -

    -

    Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this).

    -
# Decide the degree of the polynomial to fit
poly = PolynomialFeatures(degree = 6)

# Decide which values of lambda to use
nlambdas = 500
lambdas = np.logspace(-3, 5, nlambdas)

# Initialize a KFold instance
k = 5
kfold = KFold(n_splits = k)

# Perform the cross-validation to estimate the MSE
scores_KFold = np.zeros((nlambdas, k))

i = 0
for lmb in lambdas:
    ridge = Ridge(alpha = lmb)
    j = 0
    for train_inds, test_inds in kfold.split(x):
        xtrain = x[train_inds]
        ytrain = y[train_inds]

        xtest = x[test_inds]
        ytest = y[test_inds]

        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
        ridge.fit(Xtrain, ytrain[:, np.newaxis])

        Xtest = poly.fit_transform(xtest[:, np.newaxis])
        ypred = ridge.predict(Xtest)

        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)

        j += 1
    i += 1

estimated_mse_KFold = np.mean(scores_KFold, axis = 1)

## Cross-validation using cross_val_score from sklearn along with KFold

# kfold is an instance initialized above as:
# kfold = KFold(n_splits = k)

estimated_mse_sklearn = np.zeros(nlambdas)
i = 0
for lmb in lambdas:
    ridge = Ridge(alpha = lmb)

    X = poly.fit_transform(x[:, np.newaxis])
    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)

    # cross_val_score returns an array containing the estimated negative mse for every fold.
    # We take the mean of that array in order to get an estimate of the mse of the model.
    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)

    i += 1

## Plot and compare the slightly different ways to perform cross-validation

plt.figure()
plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
plt.xlabel('log10(lambda)')
plt.ylabel('mse')
plt.legend()
plt.show()

Standard steepest descent

Before we proceed, we would like to discuss the approach called the standard steepest descent (different from the steepest descent discussion above), which again requires us to be able to compute a matrix. It belongs to the class of Conjugate Gradient (CG) methods.

The success of the CG method for finding solutions of non-linear problems is based on the theory of conjugate gradients for linear systems of equations. It belongs to the class of iterative methods for solving problems from linear algebra of the type

$$ \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}. $$

In the iterative process we end up with a problem like

$$ \boldsymbol{r} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}, $$

where \( \boldsymbol{r} \) is the so-called residual or error in the iterative process.

When we have found the exact solution, \( \boldsymbol{r}=0 \).

Gradient method

The residual is zero when we reach the minimum of the quadratic function

$$ P(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T\boldsymbol{b}, $$

with the constraint that the matrix \( \boldsymbol{A} \) is positive definite and symmetric. This also defines the Hessian, which we want to be positive definite.

Steepest descent method

We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). We can assume without loss of generality that

$$ \boldsymbol{x}_0=0, $$

or consider the system

$$ \boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, $$

instead.

Steepest descent method

One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

$$ f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{b}, \quad \boldsymbol{x}\in\mathbb{R}^n. $$

This suggests taking the first basis vector \( \boldsymbol{r}_1 \) (see below for its definition) to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), which equals

$$ \boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, $$

and for \( \boldsymbol{x}_0=0 \) it equals \( -\boldsymbol{b} \).

Final expressions

We can compute the residual iteratively as

$$ \boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, $$

which equals

$$ \boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_k), $$

or

$$ (\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{r}_k, $$

which gives

$$ \alpha_k = \frac{\boldsymbol{r}_k^T\boldsymbol{r}_k}{\boldsymbol{r}_k^T\boldsymbol{A}\boldsymbol{r}_k}, $$

leading to the iterative scheme

$$ \boldsymbol{x}_{k+1}=\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_{k}. $$
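As an illustration of the scheme above, here is a minimal sketch of steepest descent for a symmetric positive-definite system \( \boldsymbol{A}\boldsymbol{x}=\boldsymbol{b} \), assuming only NumPy (the small matrix and right-hand side are made up for the example):

import numpy as np

def steepest_descent(A, b, x0=None, tol=1e-10, max_iter=1000):
    # Solves Ax = b for symmetric positive-definite A using
    # x_{k+1} = x_k + alpha_k r_k with alpha_k = r^T r / (r^T A r).
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                      # residual
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (r @ (A @ r))         # exact line search along r
        x = x + alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([1.0, 2.0])
x = steepest_descent(A, b)
print(x, np.linalg.solve(A, b))            # the two should agree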

    Steepest descent example

    +


    Steepest descent example

    -
    import numpy as np
    -import numpy.linalg as la
    -
    -import scipy.optimize as sopt
    -
    -import matplotlib.pyplot as pt
    -from mpl_toolkits.mplot3d import axes3d
    -
    -def f(x):
    -    return x[0]**2 + 3.0*x[1]**2
    -
    -def df(x):
    -    return np.array([2*x[0], 6*x[1]])
    -
    -fig = pt.figure()
    -ax = fig.add_subplot(projection = '3d')
    -
    -xmesh, ymesh = np.mgrid[-3:3:50j,-3:3:50j]
    -fmesh = f(np.array([xmesh, ymesh]))
    -ax.plot_surface(xmesh, ymesh, fmesh)
    +  
More examples on bootstrap and cross-validation and errors

# Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -908,7 +935,13 @@

    Steepest descent example

    -

And then as contour plot

    +

Note that we kept the intercept column in the fitting here. This means that we need to set fit_intercept to False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    +
    + +
    +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -916,9 +949,73 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh)
    -guesses = [np.array([2, 2./5])]
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
Logistic Regression

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \( y_i \) is based on some independent variables \( \boldsymbol{x}_i \). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance and thereby the confidence intervals of the parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert the product of the design matrices, linear regression gives us a simple recipe for fitting our data.

Classification problems

Classification problems, however, are concerned with outcomes taking the form of discrete variables (i.e. categories). We may for example, on the basis of DNA sequencing for a number of patients, like to find out which mutations are important for a certain disease; or based on scans of various patients' brains, figure out if there is a tumor or not; or given a specific physical system, we would like to identify its state, say whether it is an ordered or disordered system (a typical situation in solid state physics); or classify the status of a patient, whether she/he has a stroke or not, and many other similar situations.

The most common situation we encounter when we apply logistic regression is that of two possible outcomes, normally denoted as a binary outcome: true or false, positive or negative, success or failure, etc.

Optimization and Deep learning

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \( \boldsymbol{\theta} \). The optimization of the problem calls therefore for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

We note also that many of the topics discussed here on logistic regression are also commonly used in modern supervised Deep Learning models, as we will see later.

Basics

We consider the case where the outputs/targets, also called the responses or the outcomes, \( y_i \), are discrete and only take values from \( k=0,\dots,K-1 \) (i.e. \( K \) classes).

The goal is to predict the output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) made of \( n \) samples, each of which carries \( p \) features or predictors. The primary goal is to identify the classes to which new unseen samples belong.

Let us specialize to the case of two classes only, with outputs \( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a credit card user who could default or not on her/his credit card debt. That is

$$ y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. $$

Linear classifier

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could for example fit a linear model to the default case if \( y_i > 0.5 \) and the no default case if \( y_i \leq 0.5 \).

We would then have our weighted linear combination, namely

$$
\begin{equation}
\boldsymbol{y} = \boldsymbol{X}^T\boldsymbol{\theta} + \boldsymbol{\epsilon},
\tag{1}
\end{equation}
$$

where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our \( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors.

Some selected properties

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we can set the state of defaulting on the debt to \( y_i=1 \) and not defaulting to \( y_i=0 \) for one of the persons in the data set (see the full example below).

One simple way to get a discrete output is to have sign functions that map the output of a linear regressor to values \( \{0,1\} \), \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and 0 otherwise. We will encounter this model in our first demonstration of neural networks.

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a "soft" classifier that outputs the probability of a given category. This leads us to the logistic function.

Simple example

The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    -

    Find guesses

    @@ -942,8 +1181,59 @@

    Steepest descent example

    -
    x = guesses[-1]
    -s = -df(x)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
     
    @@ -958,8 +1248,13 @@

    Steepest descent example

    +
    + +
    +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    -

    Run it!

    @@ -967,13 +1262,14 @@

    Steepest descent example

    -
    def f1d(alpha):
    -    return f(x + alpha*s)
    -
    -alpha_opt = sopt.golden(f1d)
    -next_guess = x + alpha_opt * s
    -guesses.append(next_guess)
    -print(next_guess)
    +  
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
     
    @@ -989,7 +1285,57 @@

    Steepest descent example

    -

    What happened?

    +

We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model as

$$ f(y_i\vert x_i)=\theta_0+\theta_1 x_i. $$

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we however let \( f(y\vert x) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability for finding a value of \( y_i \) with a given \( x_i \).

The logistic function

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood for a given event,

$$ p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}. $$

Note that \( 1-p(t)= p(-t) \).
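As a small numerical check of these relations (a sketch, assuming NumPy):

import numpy as np

def p(t):
    # logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-6, 6, 13)
print(np.allclose(1.0 - p(t), p(-t)))                      # 1 - p(t) = p(-t)
print(np.allclose(p(t), np.exp(t) / (1.0 + np.exp(t))))    # the two forms agree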

    +
    + +
    +

Examples of likelihood functions used in logistic regression and neural networks

    + +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    +
    @@ -997,10 +1343,60 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh, 50)
    -it_array = np.array(guesses)
    -pt.plot(it_array.T[0], it_array.T[1], "x-")
    +  
    """The sigmoid function (or the logistic curve) is a
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
     
Note that we did only one iteration here. We can easily add more iterations using our previous guesses.

Conjugate gradient method

In the CG method we define so-called conjugate directions, and two vectors \( \boldsymbol{s} \) and \( \boldsymbol{t} \) are said to be conjugate if

$$ \boldsymbol{s}^T\boldsymbol{A}\boldsymbol{t}= 0. $$

The philosophy of the CG method is to perform searches in various conjugate directions of our vectors \( \boldsymbol{x}_i \) obeying the above criterion, namely

$$ \boldsymbol{x}_i^T\boldsymbol{A}\boldsymbol{x}_j= 0. $$

Two vectors are conjugate if they are orthogonal with respect to this inner product. Being conjugate is a symmetric relation: if \( \boldsymbol{s} \) is conjugate to \( \boldsymbol{t} \), then \( \boldsymbol{t} \) is conjugate to \( \boldsymbol{s} \).

Conjugate gradient method

An example is given by the eigenvectors of the matrix,

$$ \boldsymbol{v}_i^T\boldsymbol{A}\boldsymbol{v}_j= \lambda\boldsymbol{v}_i^T\boldsymbol{v}_j, $$

which is zero unless \( i=j \).

Conjugate gradient method

Assume now that we have a symmetric positive-definite matrix \( \boldsymbol{A} \) of size \( n\times n \). At each iteration \( i+1 \) we obtain the conjugate direction of a vector

$$ \boldsymbol{x}_{i+1}=\boldsymbol{x}_{i}+\alpha_i\boldsymbol{p}_{i}. $$

We assume that \( \boldsymbol{p}_{i} \) is a sequence of \( n \) mutually conjugate directions. Then the \( \boldsymbol{p}_{i} \) form a basis of \( \mathbb{R}^n \) and we can expand the solution of \( \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b} \) in this basis, namely

$$ \boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{p}_i. $$

Conjugate gradient method

The coefficients are given by

$$ \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{A} \boldsymbol{p}_i = \boldsymbol{b}. $$

Multiplying with \( \boldsymbol{p}_k^T \) from the left gives

$$ \boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{p}_i= \boldsymbol{p}_k^T \boldsymbol{b}, $$

and we can define the coefficients \( \alpha_k \) as

$$ \alpha_k = \frac{\boldsymbol{p}_k^T \boldsymbol{b}}{\boldsymbol{p}_k^T \boldsymbol{A} \boldsymbol{p}_k}. $$

Two parameters

We assume now that we have two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

$$ \begin{align*} p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), \end{align*} $$

where \( \boldsymbol{\theta} \) are the weights we wish to extract from the data, in our case \( \theta_0 \) and \( \theta_1 \).

Note that we used

$$ p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). $$

Maximum likelihood

In order to define the total likelihood for all possible outcomes from a dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels \( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. We aim thus at maximizing the probability of seeing the observed data. We can then approximate the likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is

$$ P(\mathcal{D}|\boldsymbol{\theta}) = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}, $$

from which we obtain the log-likelihood and our cost/loss function

$$ \mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). $$

The cost function rewritten

Reordering the logarithms, we can rewrite the cost/loss function as

$$ \mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). $$

The maximum likelihood estimator is defined as the set of parameters that maximizes the log-likelihood with respect to \( \boldsymbol{\theta} \). Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

$$ \mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). $$

This equation is known in statistics as the cross entropy. Finally, we note that, just as in linear regression, in practice we often supplement the cross entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization, as we did for Ridge and Lasso regression.
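For concreteness, here is a minimal sketch that evaluates this cross-entropy cost for the two-parameter model; the data and the parameter values are assumptions made only for illustration:

import numpy as np

def cross_entropy(theta0, theta1, x, y):
    # C(theta) = -sum_i [ y_i*(theta0+theta1*x_i) - log(1 + exp(theta0+theta1*x_i)) ]
    z = theta0 + theta1*x
    return -np.sum(y*z - np.log(1.0 + np.exp(z)))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(cross_entropy(-2.0, 1.0, x, y))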

    -

Minimizing the cross entropy

The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, therefore, any local minimizer is a global minimizer.

Minimizing this cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), $$

and

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). $$

A more compact expression

Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can then rewrite the first derivative of the cost function in a more compact form as

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). $$

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

$$ \frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. $$

Extending to more predictors

Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then, with \( p \) predictors,

$$ \log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. $$

Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \), leading to

$$ p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. $$

Including more classes

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model,

$$ \log{\frac{p(C=1\vert x)}{p(K\vert x)}} = \theta_{10}+\theta_{11}x_1, $$

and

$$ \log{\frac{p(C=2\vert x)}{p(K\vert x)}} = \theta_{20}+\theta_{21}x_1, $$

and so on till the class \( C=K-1 \),

$$ \log{\frac{p(C=K-1\vert x)}{p(K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, $$

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

Conjugate gradient method and iterations

If we choose the conjugate vectors \( \boldsymbol{p}_k \) carefully, then we may not need all of them to obtain a good approximation to the solution \( \boldsymbol{x} \). We want to regard the conjugate gradient method as an iterative method. This allows us to solve systems where \( n \) is so large that the direct method would take too much time.

We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). We can assume without loss of generality that

$$ \boldsymbol{x}_0=0, $$

or consider the system

$$ \boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, $$

instead.

Conjugate gradient method

One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

$$ f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{b}, \quad \boldsymbol{x}\in\mathbb{R}^n. $$

This suggests taking the first basis vector \( \boldsymbol{p}_1 \) to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), which equals

$$ \boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, $$

and for \( \boldsymbol{x}_0=0 \) it equals \( -\boldsymbol{b} \). The other vectors in the basis will be conjugate to the gradient, hence the name conjugate gradient method.

Conjugate gradient method

Let \( \boldsymbol{r}_k \) be the residual at the \( k \)-th step:

$$ \boldsymbol{r}_k=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k. $$

Note that \( \boldsymbol{r}_k \) is the negative gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_k \), so the gradient descent method would be to move in the direction \( \boldsymbol{r}_k \). Here, we insist that the directions \( \boldsymbol{p}_k \) are conjugate to each other, so we take the direction closest to the gradient \( \boldsymbol{r}_k \) under the conjugacy constraint. This gives the following expression

$$ \boldsymbol{p}_{k+1}=\boldsymbol{r}_k-\frac{\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_k^T\boldsymbol{A}\boldsymbol{p}_k} \boldsymbol{p}_k. $$

Conjugate gradient method

We can also compute the residual iteratively as

$$ \boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, $$

which equals

$$ \boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{p}_k), $$

or

$$ (\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{p}_k, $$

which gives

$$ \boldsymbol{r}_{k+1}=\boldsymbol{r}_k-\alpha_k\boldsymbol{A}\boldsymbol{p}_{k}. $$
    -

Revisiting our first homework

We will use linear regression as a case study for the gradient descent methods. Linear regression is a great test case for the gradient descent methods discussed in the lectures since it has several desirable properties such as:

1. An analytical solution (recall homework set 1).

2. The gradient can be computed analytically.

3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.

We revisit an example similar to what we had in the first homework set. We had a function of the type

m = 100
x = 2*np.random.rand(m,1)
y = 4+3*x+np.random.randn(m,1)

with \( x_i \) chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). The linear regression model is given by

$$ h_\beta(x) = \boldsymbol{y} = \beta_0 + \beta_1 x, $$

such that

$$ \boldsymbol{y}_i = \beta_0 + \beta_1 x_i. $$

Gradient descent example

Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \boldsymbol{y} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \beta = (\beta_0, \beta_1)^T \).

It is convenient to write \( \boldsymbol{y} = X\beta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

$$ X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{100} \\ \end{bmatrix}. $$

The cost/loss/risk function is given by

$$ C(\beta) = \frac{1}{n}||X\beta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\beta_0 + \beta_1 x_i)^2 - 2 y_i (\beta_0 + \beta_1 x_i) + y_i^2\right], $$

and we want to find \( \beta \) such that \( C(\beta) \) is minimized.

More classes

In our discussion of neural networks we will encounter the above again in terms of a slightly modified function, the so-called Softmax function.

The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression), multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of \( K \) distinct linear functions, and the predicted probability for the \( k \)-th class given a sample vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two predictors):

$$ p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. $$

It is easy to extend to more predictors. The final class is

$$ p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, $$

and they sum to one. Our earlier discussions were all specialized to the case with two classes only. It is easy to see from the above that what we derived earlier is compatible with these equations.

To find the optimal parameters we would typically use a gradient descent method. Newton's method and gradient descent methods are discussed in the material on optimization methods.
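A small sketch of these softmax probabilities for \( K \) classes with a single predictor, following the convention above where the last class carries no parameters (the numbers below are made up for illustration):

import numpy as np

def softmax_probabilities(theta, x1):
    # theta has shape (K-1, 2) with rows (theta_l0, theta_l1); class K is the reference class
    scores = theta[:, 0] + theta[:, 1]*x1          # K-1 linear functions
    denom = 1.0 + np.sum(np.exp(scores))
    p = np.append(np.exp(scores), 1.0) / denom     # last entry is p(C=K|x)
    return p

theta = np.array([[0.5, -1.0], [0.2, 0.3]])        # K = 3 classes, two parameter pairs
p = softmax_probabilities(theta, x1=1.5)
print(p, p.sum())                                  # the probabilities sum to one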

    -

Optimization, the central part of any Machine Learning algorithm

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally we would be able to solve for \( \theta \) analytically, however this is not possible in general and we must use some approximative/numerical method to compute the minimum.

Revisiting our Logistic Regression case

In our discussion of Logistic Regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assumed also that we have only two parameters \( \theta \) in our fitting, that is we defined probabilities

$$ \begin{align*} p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), \end{align*} $$

where \( \boldsymbol{\theta} \) are the weights we wish to extract from the data, in our case \( \theta_0 \) and \( \theta_1 \).

The equations to solve

Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote the first derivative of the cost function in a more compact form as

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). $$

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

$$ \frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. $$

The derivative of the cost/loss function

Computing \( \partial C(\beta) / \partial \beta_0 \) and \( \partial C(\beta) / \partial \beta_1 \) we can show that the gradient can be written as

$$ \nabla_{\beta} C(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ \end{bmatrix} = \frac{2}{n}X^T(X\beta - \mathbf{y}), $$

where \( X \) is the design matrix defined above.

The Hessian matrix

The Hessian matrix of \( C(\beta) \) is given by

$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X. $$

This result implies that \( C(\beta) \) is a convex function, since the matrix \( X^T X \) is always positive semi-definite.

Simple program

We can now write a program that minimizes \( C(\beta) \) using the gradient descent method with a constant learning rate \( \gamma \) according to

$$ \beta_{k+1} = \beta_k - \gamma \nabla_\beta C(\beta_k), \quad k=0,1,\cdots $$

We can use the expression we computed for the gradient, let \( \beta_0 \) be chosen randomly and let \( \gamma = 0.001 \). We stop iterating when \( ||\nabla_\beta C(\beta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion.

And finally we can compare our solution for \( \beta \) with the analytic result given by \( \beta= (X^TX)^{-1} X^T \mathbf{y} \).

Gradient Descent Example

Here is our simple example

    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -# Hessian matrix
    -H = (2.0/n)* X.T @ X
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    -print(beta_linreg)
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -for iter in range(Niterations):
    -    gradient = (2.0/n)*X.T @ (X @ beta-y)
    -    beta -= eta*gradient
    -
    -print(beta)
    -xnew = np.array([[0],[2]])
    -xbnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = xbnew.dot(beta)
    -ypredict2 = xbnew.dot(beta_linreg)
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example')
    -plt.show()
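Here is a minimal sketch of the same gradient descent loop with the stopping criterion \( ||\nabla_\beta C(\beta_k)|| \leq \epsilon \) mentioned above included; it is a self-contained illustration that regenerates similar synthetic data rather than the author's exact run:

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

eta = 1.0/np.max(np.linalg.eigvalsh((2.0/n)*X.T @ X))   # step size from the largest Hessian eigenvalue
eps = 1.0e-8
beta = np.random.randn(2,1)
for _ in range(100000):
    gradient = (2.0/n)*X.T @ (X @ beta - y)
    if np.linalg.norm(gradient) <= eps:                 # stop criterion
        break
    beta -= eta*gradient
print(beta)
print(np.linalg.inv(X.T @ X) @ X.T @ y)                 # analytic solution for comparison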
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

And a corresponding example using scikit-learn
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import SGDRegressor
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print(beta_linreg)
    -sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)
    -sgdreg.fit(x,y.ravel())
    -print(sgdreg.intercept_, sgdreg.coef_)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

This defines what is called the Hessian matrix.

Solving using Newton-Raphson's method

If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    -
    -

Our iterative scheme is then given by

$$ \boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, $$

or in matrix form as

$$ \boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. $$

The right-hand side is computed with the old values of \( \theta \).

If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
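Here is a minimal sketch of this Newton-Raphson iteration for the two-parameter logistic model, with the gradient \( -\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \) and the Hessian \( \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \) assembled explicitly (the data below are made up for illustration):

import numpy as np

def newton_logistic(x, y, n_iter=20):
    X = np.c_[np.ones(len(x)), x]                  # design matrix with intercept column
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0/(1.0 + np.exp(-X @ theta))         # fitted probabilities
        gradient = -X.T @ (y - p)
        W = np.diag(p*(1.0 - p))
        hessian = X.T @ W @ X
        theta = theta - np.linalg.solve(hessian, gradient)
    return theta

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 1, 0, 1, 1, 1])
print(newton_logistic(x, y))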

    -

Gradient descent and Ridge

We have also discussed Ridge regression, where the loss function contains a regularized term given by the \( L_2 \) norm of \( \beta \),

$$ C_{\text{ridge}}(\beta) = \frac{1}{n}||X\beta -\mathbf{y}||^2 + \lambda ||\beta||^2, \quad \lambda \geq 0. $$

In order to minimize \( C_{\text{ridge}}(\beta) \) using GD we adjust the gradient as follows

$$ \nabla_\beta C_{\text{ridge}}(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ \end{bmatrix} + 2\lambda\begin{bmatrix} \beta_0 \\ \beta_1\end{bmatrix} = 2 \left(\frac{1}{n}X^T(X\beta - \mathbf{y})+\lambda \beta\right). $$

We can easily extend our program to minimize \( C_{\text{ridge}}(\beta) \) using gradient descent and compare with the analytical solution given by

$$ \beta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

The Hessian matrix for Ridge Regression

The Hessian matrix of Ridge Regression for our simple example is given by

$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$

This implies that the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

Program example for gradient descent with Ridge Regression
    -
    from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -
    -#Ridge parameter lambda
    -lmbda  = 0.001
    -Id = n*lmbda* np.eye(XT_X.shape[0])
    -
    -# Hessian matrix
    -H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -
    -beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    -print(beta_linreg)
    -# Start plain gradient descent
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta
    -    beta -= eta*gradients
    -
    -print(beta)
    -ypredict = X @ beta
    -ypredict2 = X @ beta_linreg
    -plt.plot(x, ypredict, "r-")
    -plt.plot(x, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example for Ridge')
    -plt.show()
Example code for Logistic Regression

Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
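As a quick illustration of how the class above might be used (a sketch with made-up data; the interface is as defined in the class):

import numpy as np

# Binary example
X_bin = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y_bin = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression(lr=0.1, epochs=5000)
clf.fit(X_bin, y_bin)
print(clf.predict(X_bin))

# Multiclass example with three classes
X_multi = np.array([[0.0], [0.5], [2.0], [2.5], [4.0], [4.5]])
y_multi = np.array([0, 0, 1, 1, 2, 2])
clf_multi = LogisticRegression(lr=0.1, epochs=5000)
clf_multi.fit(X_multi, y_multi)
print(clf_multi.predict(X_multi))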
     
The class implements the sigmoid and softmax internally. During fit(), we check the number of classes: if there are more than 2, we set self.multi_class=True and perform multinomial logistic regression, one-hot encoding the target vector and updating a weight matrix with softmax probabilities. Otherwise, we do standard binary logistic regression, converting labels to 0/1 if needed and updating a weight vector. In both cases we use batch gradient descent on the cross-entropy loss (we add a small epsilon 1e-15 to the logarithms for numerical stability). Progress (the loss) can be printed if verbose=True.

    Using gradient descent methods, limitations

    - -
      -

    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • -

    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • -

    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • -

• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • -

    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • -

    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • -

    Improving gradient descent with momentum

    -

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. The mathematics is explained below in connection with Stochastic gradient descent.

    +

The class implements the sigmoid and softmax internally. During fit(), we check the number of classes: if more than 2, we set self.multi_class=True and perform multinomial logistic regression. We one-hot encode the target vector and update a weight matrix with softmax probabilities. Otherwise, we do standard binary logistic regression, converting labels to 0/1 if needed and updating a weight vector. In both cases we use batch gradient descent on the cross-entropy loss (we add a small epsilon 1e-15 to the logs for numerical stability). Progress (the loss) can be printed if verbose=True.
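To make the multiclass branch concrete, here is a minimal, self-contained sketch of a single batch-gradient-descent step with one-hot encoding and softmax probabilities. All names (W, lr, the random data) are illustrative and are not taken from the class above.

import numpy as np

# Illustrative random data: 6 samples, 3 features, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = np.array([0, 2, 1, 2, 0, 1])
n_samples, n_features = X.shape
n_classes = int(y.max()) + 1

# One-hot encode the targets
Y = np.zeros((n_samples, n_classes))
Y[np.arange(n_samples), y] = 1

W = np.zeros((n_features, n_classes))   # weight matrix, one column per class
lr = 0.1                                # learning rate

# Softmax probabilities (shift the scores for numerical stability)
scores = X @ W
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Gradient of the categorical cross-entropy w.r.t. W, and one update
gradient = X.T @ (probs - Y) / n_samples
W -= lr * gradient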

    @@ -1683,61 +1899,38 @@

    Improving gradient descent wit
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# take a step
    -		solution = solution - step_size * gradient
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# perform the gradient descent search
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For the loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
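As a quick sanity check of the helpers above (a minimal illustration with made-up values, assuming the functions just defined have been executed):

import numpy as np

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])
y_prob = np.array([0.9, 0.2, 0.4, 0.8])   # predicted probabilities for class 1

print(accuracy_score(y_true, y_pred))        # 0.75
print(binary_cross_entropy(y_true, y_prob))  # roughly 0.37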
     
    @@ -1752,10 +1945,11 @@

    Improving gradient descent wit

    -

    +

    Synthetic data generation

    -
    -

    Same code but now with momentum gradient descent

    +

Binary classification data: create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2]. Multiclass data: create several Gaussian clusters (one per class) spread out in feature space.
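A minimal sketch of such synthetic data with NumPy; the means, spread and sample counts below are illustrative choices, not values prescribed by the notes.

import numpy as np

rng = np.random.default_rng(2025)
n_per_class = 100

# Binary data: two Gaussian clusters in 2D
X0 = rng.normal(loc=[-2, -2], scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=[2, 2], scale=1.0, size=(n_per_class, 2))
X_bin = np.vstack([X0, X1])
y_bin = np.repeat([0, 1], n_per_class)

# Multiclass data: one Gaussian cluster per class
means = [[-4, 0], [0, 4], [4, 0]]
X_multi = np.vstack([rng.normal(loc=m, scale=1.0, size=(n_per_class, 2)) for m in means])
y_multi = np.repeat(np.arange(len(means)), n_per_class)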

    @@ -1764,2083 +1958,84 @@

    Same code but now with
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# keep track of the change
    -	change = 0.0
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# calculate update
    -		new_change = step_size * gradient + momentum * change
    -		# take a step
    -		solution = solution - new_change
    -		# save the change
    -		change = new_change
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# define momentum
    -momentum = 0.3
    -# perform the gradient descent search with momentum
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()

    Overview video on Stochastic Gradient Descent

    - -What is Stochastic Gradient Descent -
    - -
    -

    Batches and mini-batches

    - -

    In gradient descent we compute the cost function and its gradient for all data points we have.

    - -

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    -
    - -
    -

    Stochastic Gradient Descent (SGD)

    - -

In stochastic gradient descent, the extreme case is the one where each batch contains only a single data point.

    - -

    This process is called Stochastic Gradient -Descent (SGD) (or also sometimes on-line gradient descent). This is -relatively less common to see because in practice due to vectorized -code optimizations it can be computationally much more efficient to -evaluate the gradient for 100 examples, than the gradient for one -example 100 times. Even though SGD technically refers to using a -single example at a time to evaluate the gradient, you will hear -people use the term SGD even when referring to mini-batch gradient -descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD -for “Batch gradient descent” are rare to see), where it is usually -assumed that mini-batches are used. The size of the mini-batch is a -hyperparameter but it is not very common to cross-validate or bootstrap it. It is -usually based on memory constraints (if any), or set to some value, -e.g. 32, 64 or 128. We use powers of 2 in practice because many -vectorized operation implementations work faster when their inputs are -sized in powers of 2. -

    - -

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    -
    - -
    -

    Stochastic Gradient Descent

    - -

    Stochastic gradient descent (SGD) and variants thereof address some of -the shortcomings of the Gradient descent method discussed above. -

    - -

    The underlying idea of SGD comes from the observation that the cost -function, which we want to minimize, can almost always be written as a -sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), -

    -

     
    -$$ -C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ -

     
    -

    - -
    -

    Computation of gradients

    - -

    This in turn means that the gradient can be -computed as a sum over \( i \)-gradients -

    -

     
    -$$ -\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ -

     
    - -

    Stochasticity/randomness is introduced by only taking the -gradient on a subset of the data called minibatches. If there are \( n \) -data points and the size of each minibatch is \( M \), there will be \( n/M \) -minibatches. We denote these minibatches by \( B_k \) where -\( k=1,\cdots,n/M \). -

    -
    - -
    -

    SGD example

    -

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose minibatches of size \( M=2 \). We then have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).
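A minimal sketch of such a partition into minibatches, shuffling the indices first (the variable names are illustrative):

import numpy as np

n, M = 10, 2                                  # 10 data points, minibatch size M = 2
rng = np.random.default_rng(0)
indices = rng.permutation(n)                  # shuffle the data point indices
batches = np.array_split(indices, n // M)     # n/M = 5 minibatches B_1, ..., B_5
for k, B_k in enumerate(batches, start=1):
    print(f"B_{k}: data points {B_k}")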

    - -

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step

    -

     
    -$$ -\nabla_{\beta} -C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}) \rightarrow \sum_{i \in B_k}^n \nabla_\beta -c_i(\mathbf{x}_i, \mathbf{\beta}). -$$ -

     
    -

    - -
    -

    The gradient step

    - -

    Thus a gradient descent step now looks like

    -

     
    -$$ -\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}) -$$ -

     
    - -

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.

    -
    - -
    -

    Simple example code

    import numpy as np 
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 10 #number of epochs
    -
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for 
    -        j += 1

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper since we sum over the datapoints in the \( k \)-th minibatch and not all \( n \) datapoints.

    -
    - -
    -

    When do we stop?

    - -

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if its norm is smaller than some threshold, and stop if it is. Note, however, that a vanishing gradient is valid also for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we could evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
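A minimal sketch of such a stopping test for plain gradient descent on an OLS cost (all names and the threshold are illustrative, not values from the notes):

import numpy as np

rng = np.random.default_rng(0)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.normal(size=(n, 1))
X = np.c_[np.ones((n, 1)), x]

beta = rng.normal(size=(2, 1))
eta, tol, check_every = 0.1, 1e-6, 10
best_beta, best_cost = beta.copy(), np.inf

for epoch in range(1000):
    grad = (2.0 / n) * X.T @ (X @ beta - y)
    beta -= eta * grad
    if epoch % check_every == 0:
        cost = np.mean((X @ beta - y) ** 2)
        if cost < best_cost:                    # store the best beta seen so far
            best_beta, best_cost = beta.copy(), cost
        if np.linalg.norm(grad) < tol:          # stop when the full gradient is small
            break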

    -
    - -
    -

    Slightly different approach

    - -

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we essentially stop moving. Such approaches are also called scaling, or learning rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    -
    - -
    -

    Time decay rate

    - -

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function

     
    -$$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ -

     
    goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    - -

    In this way we can fix the number of epochs, compute \( \beta \) and -evaluate the cost function at the end. Repeating the computation will -give a different result since the scheme is random by design. Then we -pick the final \( \beta \) that gives the lowest value of the cost -function. -

    import numpy as np 
    -
    -def step_length(t,t0,t1):
    -    return t0/(t+t1)
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 500 #number of epochs
    -t0 = 1.0
    -t1 = 10
    -
    -gamma_j = t0/t1
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for beta
    -        t = epoch*m+i
    -        gamma_j = step_length(t,t0,t1)
    -        j += 1
    -
    -print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))

    Code with a Number of Minibatches which varies

    - -

    In the code here we vary the number of mini-batches.

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from math import exp, sqrt
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
    -print("theta from own sdg")
    -print(theta)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()

    Replace or not

    - -

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.

    -
    - -
    -

    Momentum based GD

    - -

    The stochastic gradient descent (SGD) is almost always used with a -momentum or inertia term that serves as a memory of the direction we -are moving in parameter space. This is typically implemented as -follows -

    - -

     
    -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, -\tag{2} -\end{align} -$$ -

     
    - -

    where we have introduced a momentum parameter \( \gamma \), with -\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to -indicate the gradient is to be taken over a different mini-batch at -each step. We call this algorithm gradient descent with momentum -(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a -running average of recently encountered gradients and -\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory -used in the averaging procedure. Consistent with this, when -\( \gamma=0 \), this just reduces down to ordinary SGD as discussed -earlier. An equivalent way of writing the updates is -

    - -

     
    -$$ -\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), -$$ -

     
    - -

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
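A minimal sketch of the momentum update in equation (2) applied to a simple quadratic cost; the function, learning rate and momentum value are illustrative choices.

import numpy as np

def momentum_step(theta, v, gradient, eta=0.1, gamma=0.3):
    """One gradient-descent-with-momentum update, eq. (2):
    v_t = gamma*v_{t-1} + eta*grad(theta_t), theta_{t+1} = theta_t - v_t."""
    v_new = gamma * v + eta * gradient(theta)
    return theta - v_new, v_new

# Example: minimize E(theta) = theta^2, whose gradient is 2*theta
theta, v = np.array([1.0]), np.zeros(1)
for _ in range(100):
    theta, v = momentum_step(theta, v, lambda t: 2 * t)
print(theta)   # close to zero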

    -
    - -
    -

    More on momentum based approaches

    - -

    Let us try to get more intuition from these equations. It is helpful -to consider a simple physical analogy with a particle of mass \( m \) -moving in a viscous medium with drag coefficient \( \mu \) and potential -\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), -then its motion is described by -

    - -

     
    -$$ -m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). -$$ -

     
    - -

    We can discretize this equation in the usual way to get

    - -

     
    -$$ -m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). -$$ -

     
    - -

    Rearranging this equation, we can rewrite this as

    - -

     
    -$$ -\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. -$$ -

     
    -

    - -
    -

    Momentum parameter

    - -

    Notice that this equation is identical to previous one if we identify -the position of the particle, \( \mathbf{w} \), with the parameters -\( \boldsymbol{\theta} \). This allows us to identify the momentum -parameter and learning rate with the mass of the particle and the -viscous drag as: -

    - -

     
    -$$ -\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. -$$ -

     
    - -

    Thus, as the name suggests, the momentum parameter is proportional to -the mass of the particle and effectively provides inertia. -Furthermore, in the large viscosity/small learning rate limit, our -memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). -

    - -

    Why is momentum useful? SGD momentum helps the gradient descent -algorithm gain speed in directions with persistent but small gradients -even in the presence of stochasticity, while suppressing oscillations -in high-curvature directions. This becomes especially important in -situations where the landscape is shallow and flat in some directions -and narrow and steep in others. It has been argued that first-order -methods (with appropriate initial conditions) can perform comparable -to more expensive second order methods, especially in the context of -complex deep learning models. -

    - -

    These beneficial properties of momentum can sometimes become even more -pronounced by using a slight modification of the classical momentum -algorithm called Nesterov Accelerated Gradient (NAG). -

    - -

    In the NAG algorithm, rather than calculating the gradient at the -current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one -calculates the gradient at the expected value of the parameters given -our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma -\mathbf{v}_{t-1}) \). This yields the NAG update rule -

    - -

     
    -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. -\tag{3} -\end{align} -$$ -

     
    - -

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).

    -
    - -
    -

    Second moment of the gradient

    - -

    In stochastic gradient descent, with and without momentum, we still -have to specify a schedule for tuning the learning rates \( \eta_t \) -as a function of time. As discussed in the context of Newton's -method, this presents a number of dilemmas. The learning rate is -limited by the steepest direction which can change depending on the -current position in the landscape. To circumvent this problem, ideally -our algorithm would keep track of curvature and take large steps in -shallow, flat directions and small steps in steep, narrow directions. -Second-order methods accomplish this by calculating or approximating -the Hessian and normalizing the learning rate by the -curvature. However, this is very computationally expensive for -extremely large models. Ideally, we would like to be able to -adaptively change the step size to match the landscape without paying -the steep computational price of calculating or approximating -Hessians. -

    - -

    Recently, a number of methods have been introduced that accomplish -this by tracking not only the gradient, but also the second moment of -the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and -ADAM. -

    -
    - -
    -

    RMS prop

    - -

    In RMS prop, in addition to keeping a running average of the first -moment of the gradient, we also keep track of the second moment -denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule -for RMS prop is given by -

    - -

     
$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \tag{4}\\
\mathbf{s}_t &= \beta \mathbf{s}_{t-1} + (1-\beta)\mathbf{g}_t^2 \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t \frac{\mathbf{g}_t}{\sqrt{\mathbf{s}_t + \epsilon}}, \nonumber
\end{align}
$$

     
    - -

    where \( \beta \) controls the averaging time of the second moment and is -typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate -typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a -small regularization constant to prevent divergences. Multiplication -and division by vectors is understood as an element-wise operation. It -is clear from this formula that the learning rate is reduced in -directions where the norm of the gradient is consistently large. This -greatly speeds up the convergence by allowing us to use a larger -learning rate for flat directions. -
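A minimal sketch of the RMSprop update (4) with the typical values quoted above (beta = 0.9, eta = 1e-3, epsilon = 1e-8); the quadratic cost is an illustrative choice.

import numpy as np

def rmsprop(gradient, theta0, eta=1e-3, beta=0.9, eps=1e-8, n_iter=5000):
    """Keep a running average s_t of the squared gradient and scale the
    step element-wise by 1/sqrt(s_t + eps), as in eq. (4)."""
    theta = np.array(theta0, dtype=float)
    s = np.zeros_like(theta)
    for _ in range(n_iter):
        g = gradient(theta)
        s = beta * s + (1 - beta) * g**2
        theta -= eta * g / np.sqrt(s + eps)
    return theta

# Example: E(theta) = theta_1^2 + 100*theta_2^2 (one steep and one flat direction)
gradient = lambda th: np.array([2 * th[0], 200 * th[1]])
print(rmsprop(gradient, [1.0, 1.0]))   # both components approach zero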

    -
    - -
    -

    ADAM optimizer

    - -

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    - -

    In addition to keeping a running average of the first and -second moments of the gradient -(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and -\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM -performs an additional bias correction to account for the fact that we -are estimating the first two moments of the gradient using a running -average (denoted by the hats in the update rule below). The update -rule for ADAM is given by (where multiplication and division are once -again understood to be element-wise operations below) -

    - -

     
$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \tag{5}\\
\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\
\mathbf{s}_t &= \beta_2 \mathbf{s}_{t-1} + (1-\beta_2)\mathbf{g}_t^2 \nonumber \\
\hat{\mathbf{m}}_t &= \frac{\mathbf{m}_t}{1-\beta_1^t} \nonumber \\
\hat{\mathbf{s}}_t &= \frac{\mathbf{s}_t}{1-\beta_2^t} \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t \frac{\hat{\mathbf{m}}_t}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}, \tag{6}
\end{align}
$$

     
    - -

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and -second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) -respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. -

    - -

    Like in RMSprop, the effective step size of a parameter depends on the -magnitude of its gradient squared. To understand this better, let us -rewrite this expression in terms of the variance -\( \boldsymbol{\sigma}_t^2 = \boldsymbol{\mathbf{s}}_t - -(\boldsymbol{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The -update rule for this parameter is given by -

    - -

     
    -$$ -\Delta \theta_{t+1}= -\eta_t { \boldsymbol{m}_t \over \sqrt{\sigma_t^2 + m_t^2 }+\epsilon}. -$$ -
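A minimal sketch of the ADAM update (5)-(6) with the typical values beta1 = 0.9 and beta2 = 0.99 quoted above; again the quadratic cost is only an illustration.

import numpy as np

def adam(gradient, theta0, eta=1e-3, beta1=0.9, beta2=0.99, eps=1e-8, n_iter=5000):
    """Running averages of the gradient (m) and its square (s), bias
    correction, then an element-wise scaled step, as in eqs. (5)-(6)."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    s = np.zeros_like(theta)
    for t in range(1, n_iter + 1):
        g = gradient(theta)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)      # bias-corrected first moment
        s_hat = s / (1 - beta2**t)      # bias-corrected second moment
        theta -= eta * m_hat / (np.sqrt(s_hat) + eps)
    return theta

gradient = lambda th: np.array([2 * th[0], 200 * th[1]])
print(adam(gradient, [1.0, 1.0]))   # both components approach zero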

     
    -

    - -
    -

    Algorithms and codes for Adagrad, RMSprop and Adam

    - -

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    - -

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    -
    - -
    -

    Practical tips

    - -
      -

    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • -

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables; a short sketch is given after this list. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • -

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings.
    • -

• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSprop, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • -
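A minimal sketch of the standardization step mentioned in the list above: subtract the mean and divide by the standard deviation of each input feature, computed on the training data only (the data here is illustrative).

import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

mu = X_train.mean(axis=0)       # training-set mean per feature
sigma = X_train.std(axis=0)     # training-set standard deviation per feature
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma   # apply the same transformation to the test set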
    -

    -

    Geron's text, see chapter 11, has several interesting discussions.

    -
    - -
    -

    Automatic differentiation

    - -

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    - -

    Automatic differentiation is neither:

    - -
      -

    • Symbolic differentiation, nor
    • -

    • Numerical differentiation (the method of finite differences).
    • -
    -

    -

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off errors in the discretization process as well as cancellation errors.

    - -

    Python has tools for so-called automatic differentiation. -Consider the following example -

    -

     
    -$$ -f(x) = \sin\left(2\pi x + x^2\right) -$$ -

     
    - -

    which has the following derivative

    -

     
    -$$ -f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) -$$ -

     
    - -

    Using autograd we have

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -
    -# To do elementwise differentiation:
    -from autograd import elementwise_grad as egrad 
    -
    -# To plot:
    -import matplotlib.pyplot as plt 
    -
    -
    -def f(x):
    -    return np.sin(2*np.pi*x + x**2)
    -
    -def f_grad_analytic(x):
    -    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    -
    -# Do the comparison:
    -x = np.linspace(0,1,1000)
    -
    -f_grad = egrad(f)
    -
    -computed = f_grad(x)
    -analytic = f_grad_analytic(x)
    -
    -plt.title('Derivative computed from Autograd compared with the analytical derivative')
    -plt.plot(x,computed,label='autograd')
    -plt.plot(x,analytic,label='analytic')
    -
    -plt.xlabel('x')
    -plt.ylabel('y')
    -plt.legend()
    -
    -plt.show()
    -
    -print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))

    Using autograd

    - -

    Here we -experiment with what kind of functions Autograd is capable -of finding the gradient of. The following Python functions are just -meant to illustrate what Autograd can do, but please feel free to -experiment with other, possibly more complicated, functions as well. -

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f1(x):
    -    return x**3 + 1
    -
    -f1_grad = grad(f1)
    -
    -# Remember to send in float as argument to the computed gradient from Autograd!
    -a = 1.0
    -
    -# See the evaluated gradient at a using autograd:
    -print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    -
    -# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    -grad_analytical = 3*a**2
    -print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))

    Autograd with more complicated functions

    - -

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know with respect to which variable the function is being differentiated.

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f2(x1,x2):
    -    return 3*x1**3 + x2*(x1 - 5) + 1
    -
    -# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    -f2_grad_x1 = grad(f2,0)
    -
    -# ... and differentiate w.r.t x2 by sending 1 as an additional arugment to grad
    -f2_grad_x2 = grad(f2,1)
    -
    -x1 = 1.0
    -x2 = 3.0 
    -
    -print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    -print("-"*30)
    -
    -# Compare with the analytical derivatives:
    -
    -# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    -f2_grad_x1_analytical = 9*x1**2 + x2
    -
    -# Derivative of f2 w.r.t x2 is: x1 - 5:
    -f2_grad_x2_analytical = x1 - 5
    -
    -# See the evaluated derivations:
    -print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
    -print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
    -
    -print()
    -
    -print("The derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
    -print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))

    Note that the grad function will not produce the true gradient of the function. The true gradient of a function with two or more variables will produce a vector, where each element is the function differentiated w.r.t a variable.
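For completeness, a small sketch of how the full gradient vector can be assembled from the two partial derivatives returned by grad(f2, 0) and grad(f2, 1) (assuming autograd is installed):

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Stacking the two partial derivatives gives the full gradient (df/dx1, df/dx2)
full_gradient = lambda x1, x2: np.array([grad(f2, 0)(x1, x2), grad(f2, 1)(x1, x2)])
print(full_gradient(1.0, 3.0))   # expected: [9*1**2 + 3, 1 - 5] = [12., -4.]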

    -
    - -
    -

    More complicated functions using the elements of their arguments directly

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f3(x): # Assumes x is an array of length 5 or higher
    -    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    -
    -f3_grad = grad(f3)
    -
    -x = np.linspace(0,4,5)
    -
    -# Print the computed gradient:
    -print("The computed gradient of f3 is: ", f3_grad(x))
    -
    -# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    -f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f3 is: ", f3_grad_analytical)

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the single partial derivative in the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as the output is closer to what one would expect from a gradient-evaluating function.

    -
    - -
    -

    Functions using mathematical functions from Numpy

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f4(x):
    -    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    -
    -f4_grad = grad(f4)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    -
    -# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    -f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))

    More autograd

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f5(x):
    -    if x >= 0:
    -        return x**2
    -    else:
    -        return -3*x + 1
    -
    -f5_grad = grad(f5)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))

    And with loops

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f6_for(x):
    -    val = 0
    -    for i in range(10):
    -        val = val + x**i
    -    return val
    -
    -def f6_while(x):
    -    val = 0
    -    i = 0
    -    while i < 10:
    -        val = val + x**i
    -        i = i + 1
    -    return val
    -
    -f6_for_grad = grad(f6_for)
    -f6_while_grad = grad(f6_while)
    -
    -x = 0.5
    -
    -# Print the computed derivaties of f6_for and f6_while
    -print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    -print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    -
    import autograd.numpy as np
    -from autograd import grad
    -# Both of the functions are implementation of the sum: sum(x**i) for i = 0, ..., 9
    -# The analytical derivative is: sum(i*x**(i-1)) 
    -f6_grad_analytical = 0
    -for i in range(10):
    -    f6_grad_analytical += i*x**(i-1)
    -
    -print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))

    Using recursion

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f7(n): # Assume that n is an integer
    -    if n == 1 or n == 0:
    -        return 1
    -    else:
    -        return n*f7(n-1)
    -
    -f7_grad = grad(f7)
    -
    -n = 2.0
    -
    -print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    -
    -# The function f7 is an implementation of the factorial of n.
    -# By using the product rule, one can find that the derivative is:
    -
    -f7_grad_analytical = 0
    -for i in range(int(n)-1):
    -    tmp = 1
    -    for k in range(int(n)-1):
    -        if k != i:
    -            tmp *= (n - k)
    -    f7_grad_analytical += tmp
    -
    -print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    -
    - -
    -

    Unsupported functions

    -

Autograd supports many features. However, there are some functions that are not supported (yet) by Autograd.

    - -

    Assigning a value to the variable being differentiated with respect to

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f8(x): # Assume x is an array
    -    x[2] = 3
    -    return x*2
    -
    -#f8_grad = grad(f8)
    -
    -#x = 8.4
    -
    -#print("The derivative of f8 is:",f8_grad(x))

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. However, Autograd has implemented the computation of the derivative such that this assignment is not possible.

    -
    - -
    -

    The syntax a.dot(b) when finding the dot product

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9(a): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return a.dot(b)
    -
    -#f9_grad = grad(f9)
    -
    -#x = np.array([1.0,0.0])
    -
    -#print("The derivative of f9 is:",f9_grad(x))

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9_alternative(x): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    -
    -f9_alternative_grad = grad(f9_alternative)
    -
    -x = np.array([3.0,0.0])
    -
    -print("The gradient of f9 is:",f9_alternative_grad(x))
    -
    -# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    -# w.r.t x is (b_1, b_2).

    Using Autograd with OLS

    - -

We conclude the part on optimization by showing how we can write code for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x#+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 30
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd")
    -print(theta)
    -
    -# Now improve with momentum gradient descent
    -change = 0.0
    -delta_momentum = 0.3
    -for iter in range(Niterations):
    -    # calculate gradient
    -    gradients = training_gradient(theta)
    -    # calculate update
    -    new_change = eta*gradients+delta_momentum*change
    -    # take a step
    -    theta -= new_change
    -    # save the change
    -    change = new_change
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd wth momentum")
    -print(theta)
    -

    But none of these can compete with Newton's method

    - - - -
    -
    -
    -
    -
    -
    # Using Newton's method
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(beta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -# Note that here the Hessian does not depend on the parameters beta
    -invH = np.linalg.pinv(H)
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta = np.random.randn(2,1)
    -Niterations = 5
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(beta)
    -    beta -= invH @ gradients
    -    print(iter,gradients[0],gradients[1])
    -print("beta from own Newton code")
    -print(beta)

    Including Stochastic Gradient Descent with Autograd

    -

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
    -print("theta from own sdg")
    -print(theta)

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -change = 0.0
    -delta_momentum = 0.3
    -
    -for epoch in range(n_epochs):
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        # calculate update
    -        new_change = eta*gradients+delta_momentum*change
    -        # take a step
    -        theta -= new_change
    -        # save the change
    -        change = new_change
    -print("theta from own sdg with momentum")
    -print(theta)

Similar problem (now with a second-order polynomial), but with AdaGrad

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        Giter += gradients*gradients
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -        theta -= update
    -print("theta from own AdaGrad")
    -print(theta)

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -
    - -
    -

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameter rho
    -rho = 0.99
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -	# Accumulated gradient
    -	# Scaling with rho the new and the previous results
    -        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    -	# Taking the diagonal only and inverting
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -	# Hadamard product
    -        theta -= update
    -print("theta from own RMSprop")
    -print(theta)

    And finally ADAM

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    -beta1 = 0.9
    -beta2 = 0.999
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-7
    -iter = 0
    -for epoch in range(n_epochs):
    -    first_moment = 0.0
    -    second_moment = 0.0
    -    iter += 1
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        # Computing moments first
    -        first_moment = beta1*first_moment + (1-beta1)*gradients
    -        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    -        first_term = first_moment/(1.0-beta1**iter)
    -        second_term = second_moment/(1.0-beta2**iter)
    -	# Scaling with rho the new and the previous results
    -        update = eta*first_term/(np.sqrt(second_term)+delta)
    -        theta -= update
    -print("theta from own ADAM")
    -print(theta)

    And Logistic Regression

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def sigmoid(x):
    -    return 0.5 * (np.tanh(x / 2.) + 1)
    -
    -def logistic_predictions(weights, inputs):
    -    # Outputs probability of a label being true according to logistic model.
    -    return sigmoid(np.dot(inputs, weights))
    -
    -def training_loss(weights):
    -    # Training loss is the negative log-likelihood of the training labels.
    -    preds = logistic_predictions(weights, inputs)
    -    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    -    return -np.sum(np.log(label_probabilities))
    -
    -# Build a toy dataset.
    -inputs = np.array([[0.52, 1.12,  0.77],
    -                   [0.88, -1.08, 0.15],
    -                   [0.52, 0.06, -1.30],
    -                   [0.74, -2.49, 1.39]])
    -targets = np.array([True, True, False, True])
    -
    -# Define a function that returns gradients of training loss using Autograd.
    -training_gradient_fun = grad(training_loss)
    -
    -# Optimize weights using gradient descent.
    -weights = np.array([0.0, 0.0, 0.0])
    -print("Initial loss:", training_loss(weights))
    -for i in range(100):
    -    weights -= training_gradient_fun(weights) * 0.01
    -
    -print("Trained loss:", training_loss(weights))

    Introducing JAX

    - -

    Presently, instead of using autograd, we recommend using JAX

    - -

JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    - -

Here's a simple example of how you can use JAX to compute the derivative of the logistic function.

    - - - -
    -
    -
    -
    -
    -
    import jax.numpy as jnp
    -from jax import grad, jit, vmap
    -
    -def sum_logistic(x):
    -  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    +  
    import numpy as np
     
    -x_small = jnp.arange(3.)
    -derivative_fn = grad(sum_logistic)
    -print(derivative_fn(x_small))
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
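The code above assumes a LogisticRegression class and the helper functions accuracy_score, binary_cross_entropy and categorical_cross_entropy defined earlier in the notebook; they are not part of this excerpt. A minimal sketch of the binary versions of these helpers (the multiclass calls would need a softmax-based variant) could look like this:

import numpy as np

def accuracy_score(y_true, y_pred):
    # Fraction of correctly predicted labels
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Average negative log-likelihood for targets in {0,1}
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true*np.log(y_prob) + (1 - y_true)*np.log(1.0 - y_prob))

class LogisticRegression:
    # Plain gradient-descent binary logistic regression (sketch only)
    def __init__(self, lr=0.1, epochs=1000):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        n, p = X.shape
        self.weights = np.zeros(p)
        self.bias = 0.0
        for _ in range(self.epochs):
            z = X @ self.weights + self.bias
            prob = 1.0/(1.0 + np.exp(-z))
            # Gradient of the cross-entropy with respect to weights and bias
            self.weights -= self.lr*(X.T @ (prob - y))/n
            self.bias -= self.lr*np.mean(prob - y)
        return self

    def predict_prob(self, X):
        z = X @ self.weights + self.bias
        return 1.0/(1.0 + np.exp(-z))

    def predict(self, X):
        return (self.predict_prob(X) >= 0.5).astype(int)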
     
    diff --git a/doc/pub/week39/html/week39-solarized.html b/doc/pub/week39/html/week39-solarized.html index 36a7e748b..6a8e90be3 100644 --- a/doc/pub/week39/html/week39-solarized.html +++ b/doc/pub/week39/html/week39-solarized.html @@ -8,8 +8,8 @@ - -Week 39: Optimization and Gradient Methods + +Week 39: Resampling methods and logistic regression @@ -63,248 +63,132 @@ @@ -326,19 +210,16 @@
    -

    Week 39: Optimization and Gradient Methods

    +

    Week 39: Resampling methods and logistic regression

-Morten Hjorth-Jensen [1, 2]
-[1] Department of Physics, University of Oslo
-[2] Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University
+Morten Hjorth-Jensen
+Department of Physics, University of Oslo

    @@ -347,28 +228,46 @@

    Week 39












    -

    Plan for week 39, September 23-27, 2024

    +

    Plan for week 39, September 22-26, 2025

    + +
    +Material for the lecture on Monday September 22 +

    +

      +
    1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff
    2. +
    3. Logistic regression, our first classification encounter and a stepping stone towards neural networks
    4. +
    5. Video of lecture
    6. +
    7. Whiteboard notes
    8. +
    +
    +









    -

    Lecture Monday September 23

    +

    Readings and Videos, resampling methods

    +
    + +

    +

      +
    1. Raschka et al, pages 175-192
    2. +
    3. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.
    4. +
    5. Video on bias-variance tradeoff
    6. +
    7. Video on Bootstrapping
    8. +
    9. Video on cross validation
    10. +
    +
    + +









    +

    Readings and Videos, logistic regression

    -Material for the lecture on Monday September 23 +

    -

      -
    • Repetition of Logistic regression equations and classification problems and discussion of Gradient methods. Examples on how to implement Logistic Regression and discussion of gradient methods
    • -
    • Stochastic Gradient descent with examples and automatic differentiation (theme also for next week).
    • -
    • Video of lecture
    • -
    • Whiteboard notes
    • -
    • Readings and Videos:
    • -
        -
      • These lecture notes
      • -
      • For a good discussion on gradient methods, we would like to recommend Goodfellow et al section 4.3-4.5 and sections 8.3-8.6. We will come back to the latter chapter in our discussion of Neural networks as well.
      • -
      • Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
      • -
      • Video on gradient descent
      • -
      • Video on stochastic gradient descent
      • -
      -
    +
      +
    1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression
    2. +
    3. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
    4. +
    5. Video on Logistic regression
    6. +
    7. Yet another video on logistic regression
    8. +
    @@ -376,554 +275,573 @@

    Lecture Monday September 23

    Lab sessions week 39

    -Material for the active learning sessions on Tuesday and Wednesday +Material for the lab sessions on Tuesday and Wednesday

    -

    +
      +
    1. Discussions on how to structure your report for the first project
    2. +
    3. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.
    4. +
    5. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.
    6. +
    7. Video on how to write scientific reports recorded during one of the lab sessions
    8. +
    9. A general guideline can be found at https://github.com/CompPhysics/MachineLearning/blob/master/doc/Projects/EvaluationGrading/EvaluationForm.md.
    10. +










    -

Lecture Monday September 23, Optimization, the central part of any Machine Learning algorithm

    - -

    The first few slides here are a repetition from last week.

    - -

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\beta) \), which is a function of the parameters \( \beta \), and a cost function \( C(X, g(\beta)) \) that allows us to judge how well the model \( g(\beta) \) explains the observations \( X \). The model is fit by finding the values of \( \beta \) that minimize the cost function. Ideally we would be able to solve for \( \beta \) analytically; however, this is not possible in general and we must use some approximate/numerical method to compute the minimum.

    +

    Lecture material











    -

    Revisiting our Logistic Regression case

    - -

In our discussion on Logistic Regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assumed also that we have only two parameters \( \beta \) in our fitting, that is we defined probabilities

    - -$$ -\begin{align*} -p(y_i=1|x_i,\boldsymbol{\beta}) &= \frac{\exp{(\beta_0+\beta_1x_i)}}{1+\exp{(\beta_0+\beta_1x_i)}},\nonumber\\ -p(y_i=0|x_i,\boldsymbol{\beta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\beta}), -\end{align*} -$$ +

    Resampling methods

    +
    + +

    +

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, we can repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. Such an approach may allow us to obtain information that would not be available from fitting the model only once using the original training sample.

    + +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
    1. The bootstrap method
    2. +
    3. and Cross-Validation
    4. +
    +

In addition there are several other methods such as the Jackknife and the Blocking methods. This week we will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    -

    where \( \boldsymbol{\beta} \) are the weights we wish to extract from data, in our case \( \beta_0 \) and \( \beta_1 \).











    -

    The equations to solve

    - -

Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\beta}) \). We rewrote in a more compact form the first derivative of the cost function as

    - -$$ -\frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). -$$ +

    Resampling approaches can be computationally expensive

    +
    + +

    -

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\beta})(1-p(y_i\vert x_i,\boldsymbol{\beta})) \), we can obtain a compact expression of the second derivative as

Resampling approaches can be computationally expensive, because they involve fitting the same statistical method multiple times using different subsets of the training data. However, due to recent advances in computing power, the computational requirements of resampling methods generally are not prohibitive. In this chapter, we discuss two of the most commonly used resampling methods, cross-validation and the bootstrap. Both methods are important tools in the practical application of many statistical learning procedures. For example, cross-validation can be used to estimate the test error associated with a given statistical learning method in order to evaluate its performance, or to select the appropriate level of flexibility. The process of evaluating a model's performance is known as model assessment, whereas the process of selecting the proper level of flexibility for a model is known as model selection. The bootstrap is widely used.

    +
    -$$ -\frac{\partial^2 \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. -$$ - -

    This defines what is called the Hessian matrix.











    -

    Solving using Newton-Raphson's method

    +

    Why resampling methods ?

    +
    +Statistical analysis +

    -

    If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    +
      +
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • +
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • +
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • +
    +
    + -

    Our iterative scheme is then given by

    +









    +

    Statistical analysis

    +
    + +

    -$$ -\boldsymbol{\beta}^{\mathrm{new}} = \boldsymbol{\beta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T}\right)^{-1}_{\boldsymbol{\beta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right)_{\boldsymbol{\beta}^{\mathrm{old}}}, -$$ +

      +
    • As in other experiments, many numerical experiments have two classes of errors:
    • +
        +
      • Statistical errors
      • +
      • Systematical errors
      • +
      +
    • Statistical errors can be estimated using standard tools from statistics
    • +
    • Systematical errors are method specific and must be treated differently from case to case.
    • +
    +
    + -

    or in matrix form as

    +









    +

    Resampling methods

    -$$ -\boldsymbol{\beta}^{\mathrm{new}} = \boldsymbol{\beta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\beta}^{\mathrm{old}}}. -$$ +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods. +

    -

    The right-hand side is computed with the old values of \( \beta \).

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error. +

    -

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
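A minimal NumPy sketch of this Newton-Raphson scheme for the two-parameter logistic regression case, with synthetic data chosen here only for illustration, could look like this:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

# Synthetic two-class data: intercept plus one feature, labels in {0,1}
rng = np.random.default_rng(2025)
n = 100
x = rng.normal(size=(n, 1))
X = np.c_[np.ones((n, 1)), x]                      # design matrix
y = (x[:, 0] + rng.normal(size=n) > 0).astype(float)

beta = np.zeros(2)
for iteration in range(10):
    p = sigmoid(X @ beta)                          # fitted probabilities
    gradient = -X.T @ (y - p)                      # first derivative of the cost
    W = np.diag(p*(1.0 - p))                       # diagonal matrix with p(1-p)
    hessian = X.T @ W @ X                          # second derivative (Hessian)
    beta = beta - np.linalg.solve(hessian, gradient)

print("beta from Newton-Raphson:", beta)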

    +

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, +we discuss the +

    +
      +
    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. +
    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +
    +

As our model becomes more and more complex, more of the training data tends to be used. The model may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.











    -

    Brief reminder on Newton-Raphson's method

    +

    Resampling methods: Bootstrap

    +
    + +

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages: +

    +
      +
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. +
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. +
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. +
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +
    +
    -

    Let us quickly remind ourselves how we derive the above method.

    -

    Perhaps the most celebrated of all one-dimensional root-finding -routines is Newton's method, also called the Newton-Raphson -method. This method requires the evaluation of both the -function \( f \) and its derivative \( f' \) at arbitrary points. -If you can only calculate the derivative -numerically and/or your function is not of the smooth type, we -normally discourage the use of this method. -

    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.
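As a minimal illustration of the idea, the following sketch (with synthetic data chosen only for illustration) uses the non-parametric bootstrap to estimate the standard error of a sample mean:

import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=500)      # skewed, non-Gaussian sample

n_bootstraps = 1000
boot_means = np.empty(n_bootstraps)
for b in range(n_bootstraps):
    # Resample the data with replacement and recompute the statistic
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means[b] = np.mean(sample)

print("Sample mean:", np.mean(data))
print("Bootstrap estimate of the standard error:", np.std(boot_means))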











    -

    The equations

    +

    The bias-variance tradeoff

    -

    The Newton-Raphson formula consists geometrically of extending the -tangent line at a current point until it crosses zero, then setting -the next guess to the abscissa of that zero-crossing. The mathematics -behind this method is rather simple. Employing a Taylor expansion for -\( x \) sufficiently close to the solution \( s \), we have +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \( \mathcal{D} \) consisting of the data +\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \).

    +

    Let us assume that the true data is generated from a noisy model

    + $$ - f(s)=0=f(x)+(s-x)f'(x)+\frac{(s-x)^2}{2}f''(x) +\dots. - \label{eq:taylornr} +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} $$ -

    For small enough values of the function and for well-behaved -functions, the terms beyond linear are unimportant, hence we obtain -

    +

where \( \epsilon \) is normally distributed with mean zero and variance \( \sigma^2 \).

    -$$ - f(x)+(s-x)f'(x)\approx 0, -$$ +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\theta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \). +

    -

    yielding

    +

Thereafter we found the parameters \( \boldsymbol{\theta} \) by optimizing the mean squared error via the so-called cost function

    $$ - s\approx x-\frac{f(x)}{f'(x)}. +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. $$ -

    Having in mind an iterative procedure, it is natural to start iterating with

    +

    We can rewrite this as

    $$ - x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}. +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. $$ - -









    -

    Simple geometric interpretation

    - -

The above is Newton-Raphson's method. It has a simple geometric interpretation, namely \( x_{n+1} \) is the point where the tangent from \( (x_n,f(x_n)) \) crosses the \( x \)-axis. Close to the solution, Newton-Raphson converges fast to the desired result. However, if we are far from a root, where the higher-order terms in the series are important, the Newton-Raphson formula can give grossly inaccurate results. For instance, the initial guess for the root might be so far from the true root as to let the search interval include a local maximum or minimum of the function. If an iteration places a trial guess near such a local extremum, so that the first derivative nearly vanishes, then Newton-Raphson may fail totally.
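A minimal sketch of the one-dimensional iteration \( x_{n+1}=x_n-f(x_n)/f'(x_n) \), here applied to \( f(x)=x^2-2 \) (an illustrative choice, not taken from the lecture code), could look like this:

import numpy as np

def f(x):
    return x**2 - 2.0          # root at sqrt(2)

def df(x):
    return 2.0*x               # analytical derivative

x = 1.0                        # initial guess, reasonably close to the root
for n in range(6):
    x = x - f(x)/df(x)         # Newton-Raphson update
    print(f"iteration {n}: x = {x:.12f}")

print("Reference value sqrt(2) =", np.sqrt(2.0))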

The first term represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally, the last term is the variance of the error \( \boldsymbol{\epsilon} \).

    -









    -

    Extending to more than one variable

    - -

    Newton's method can be generalized to systems of several non-linear equations -and variables. Consider the case with two equations +

To derive this equation, we need to recall that the variance of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable, idem for \( \boldsymbol{\tilde{y}} \). We use a more compact notation in terms of the expectation value

    $$ - \begin{array}{cc} f_1(x_1,x_2) &=0\\ - f_2(x_1,x_2) &=0,\end{array} -$$ - -

    which we Taylor expand to obtain

    - -$$ - \begin{array}{cc} 0=f_1(x_1+h_1,x_2+h_2)=&f_1(x_1,x_2)+h_1 - \partial f_1/\partial x_1+h_2 - \partial f_1/\partial x_2+\dots\\ - 0=f_2(x_1+h_1,x_2+h_2)=&f_2(x_1,x_2)+h_1 - \partial f_2/\partial x_1+h_2 - \partial f_2/\partial x_2+\dots - \end{array}. +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], $$ -

    Defining the Jacobian matrix \( {\bf \boldsymbol{J}} \) we have

    +

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    $$ - {\bf \boldsymbol{J}}=\left( \begin{array}{cc} - \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ - \partial f_2/\partial x_1 &\partial f_2/\partial x_2 - \end{array} \right), +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], $$ -

    we can rephrase Newton's method as

    +

    which, using the abovementioned expectation values can be rewritten as

    $$ -\left(\begin{array}{c} x_1^{n+1} \\ x_2^{n+1} \end{array} \right)= -\left(\begin{array}{c} x_1^{n} \\ x_2^{n} \end{array} \right)+ -\left(\begin{array}{c} h_1^{n} \\ h_2^{n} \end{array} \right), +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, $$ -

    where we have defined

    -$$ - \left(\begin{array}{c} h_1^{n} \\ h_2^{n} \end{array} \right)= - -{\bf \boldsymbol{J}}^{-1} - \left(\begin{array}{c} f_1(x_1^{n},x_2^{n}) \\ f_2(x_1^{n},x_2^{n}) \end{array} \right). -$$ - -

We thus need to compute the inverse of the Jacobian matrix, and it is important to understand that difficulties may arise in case \( {\bf \boldsymbol{J}} \) is nearly singular.

    +

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    -

It is rather straightforward to extend the above scheme to systems of more than two non-linear equations. In our case, the Jacobian matrix is given by the Hessian that represents the second derivative of the cost function.

    +Note that in order to derive these equations we have assumed we can replace the unknown function \( \boldsymbol{f} \) with the target/output data \( \boldsymbol{y} \).









    -

    Steepest descent

    +

    A way to Read the Bias-Variance Tradeoff

    -

    The basic idea of gradient descent is -that a function \( F(\mathbf{x}) \), -\( \mathbf{x} \equiv (x_1,\cdots,x_n) \), decreases fastest if one goes from \( \bf {x} \) in the -direction of the negative gradient \( -\nabla F(\mathbf{x}) \). -

    +

    +
    +

    +
    +

    -

    It can be shown that if

    -$$ -\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), -$$ +









    +

    Understanding what happens

    -

    with \( \gamma_k > 0 \).

    + +
    +
    +
    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    For \( \gamma_k \) small enough, then \( F(\mathbf{x}_{k+1}) \leq -F(\mathbf{x}_k) \). This means that for a sufficiently small \( \gamma_k \) -we are always moving towards smaller function values, i.e a minimum. -

    -

    More on Steepest descent

    +

    Summing up

    -

    The previous observation is the basis of the method of steepest -descent, which is also referred to as just gradient descent (GD). One -starts with an initial guess \( \mathbf{x}_0 \) for a minimum of \( F \) and -computes new approximations according to +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    -$$ -\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), \ \ k \geq 0. -$$ - -

    The parameter \( \gamma_k \) is often referred to as the step length or -the learning rate within the context of Machine Learning. +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \( Var(\epsilon) \), the irreducible error.

    - -

    The ideal

    - -

    Ideally the sequence \( \{\mathbf{x}_k \}_{k=0} \) converges to a global -minimum of the function \( F \). In general we do not know if we are in a -global or local minimum. In the special case when \( F \) is a convex -function, all local minima are also global minima, so in this case -gradient descent can converge to the global solution. The advantage of -this scheme is that it is conceptually simple and straightforward to -implement. However the method in this form has some severe -limitations: +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    -

In machine learning we are often faced with non-convex high-dimensional cost functions with many local minima. Since GD is deterministic we will get stuck in a local minimum, if the method converges, unless we have a very good initial guess. This also implies that the scheme is sensitive to the chosen initial condition.

    +

    You may also find this recent article of interest.

    -

    Note that the gradient is a function of \( \mathbf{x} = -(x_1,\cdots,x_n) \) which makes it expensive to compute numerically. -

    +









    +

    Another Example from Scikit-Learn's Repository

    - -

    The sensitiveness of the gradient descent

    - -

    The gradient descent method -is sensitive to the choice of learning rate \( \gamma_k \). This is due -to the fact that we are only guaranteed that \( F(\mathbf{x}_{k+1}) \leq -F(\mathbf{x}_k) \) for sufficiently small \( \gamma_k \). The problem is to -determine an optimal learning rate. If the learning rate is chosen too -small the method will take a long time to converge and if it is too -large we can experience erratic behavior. +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely it is that the model generalizes correctly from the training data.

    -

    Many of these shortcomings can be alleviated by introducing -randomness. One such method is that of Stochastic Gradient Descent -(SGD), see below. -

    - -

    Convex functions

    + +
    +
    +
    +
    +
    +
    #print(__doc__)
     
    -

Ideally we want our cost/loss function to be convex (or concave).

    +import numpy as np +import matplotlib.pyplot as plt +from sklearn.pipeline import Pipeline +from sklearn.preprocessing import PolynomialFeatures +from sklearn.linear_model import LinearRegression +from sklearn.model_selection import cross_val_score + + +def true_fun(X): + return np.cos(1.5 * np.pi * X) + +np.random.seed(0) + +n_samples = 30 +degrees = [1, 4, 15] + +X = np.sort(np.random.rand(n_samples)) +y = true_fun(X) + np.random.randn(n_samples) * 0.1 + +plt.figure(figsize=(14, 5)) +for i in range(len(degrees)): + ax = plt.subplot(1, len(degrees), i + 1) + plt.setp(ax, xticks=(), yticks=()) + + polynomial_features = PolynomialFeatures(degree=degrees[i], + include_bias=False) + linear_regression = LinearRegression() + pipeline = Pipeline([("polynomial_features", polynomial_features), + ("linear_regression", linear_regression)]) + pipeline.fit(X[:, np.newaxis], y) + + # Evaluate the models using crossvalidation + scores = cross_val_score(pipeline, X[:, np.newaxis], y, + scoring="neg_mean_squared_error", cv=10) + + X_test = np.linspace(0, 1, 100) + plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model") + plt.plot(X_test, true_fun(X_test), label="True function") + plt.scatter(X, y, edgecolor='b', s=20, label="Samples") + plt.xlabel("x") + plt.ylabel("y") + plt.xlim((0, 1)) + plt.ylim((-2, 2)) + plt.legend(loc="best") + plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format( + degrees[i], -scores.mean(), scores.std())) +plt.show() +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

First we give the definition of a convex set: A set \( C \) in \( \mathbb{R}^n \) is said to be convex if, for all \( x \) and \( y \) in \( C \) and all \( t \in (0,1) \), the point \( (1-t)x + ty \) also belongs to \( C \). Geometrically this means that every point on the line segment connecting \( x \) and \( y \) is in \( C \), as discussed below.

    -

    The convex subsets of \( \mathbb{R} \) are the intervals of -\( \mathbb{R} \). Examples of convex sets of \( \mathbb{R}^2 \) are the -regular polygons (triangles, rectangles, pentagons, etc...). -

    + +

    Various steps in cross-validation

    -









    -

    Convex function

    - -

Convex function: Let \( X \subset \mathbb{R}^n \) be a convex set. Assume that the function \( f: X \rightarrow \mathbb{R} \) is continuous, then \( f \) is said to be convex if \( f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2) \) for all \( x_1, x_2 \in X \) and for all \( t \in [0,1] \). If \( \leq \) is replaced with a strict inequality in the definition, and we demand \( x_1 \neq x_2 \) and \( t\in(0,1) \), then \( f \) is said to be strictly convex. For a single variable function, convexity means that if you draw a straight line connecting \( f(x_1) \) and \( f(x_2) \), the value of the function on the interval \( [x_1,x_2] \) is always below the line, as illustrated below.

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).
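The special case \( k=n \) can be run directly with Scikit-Learn's LeaveOneOut splitter; a minimal sketch on synthetic data (chosen only for illustration) reads:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(3155)
x = rng.normal(size=50)
y = 3*x**2 + rng.normal(size=50)
X = np.c_[x, x**2]                          # simple two-column design matrix

loo = LeaveOneOut()                         # k = n splits, one sample per test fold
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=loo)
print("LOOCV estimate of the MSE:", -np.mean(scores))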











    -

    Conditions on convex functions

    +

    Cross-validation in brief

    -

In the following we state first and second-order conditions which ensure convexity of a function \( f \). We write \( D_f \) to denote the domain of \( f \), i.e., the subset of \( \mathbb{R}^n \) where \( f \) is defined. For more details and proofs we refer to: S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press.

    - -
    -First order condition -

    -

Suppose \( f \) is differentiable (i.e., \( \nabla f(x) \) is well defined for all \( x \) in the domain of \( f \)). Then \( f \) is convex if and only if \( D_f \) is a convex set and \( f(y) \geq f(x) + \nabla f(x)^T (y-x) \) holds for all \( x,y \in D_f \).

    - -

This condition means that for a convex function the first-order Taylor expansion (right-hand side above) at any point is a global underestimator of the function. To convince yourself you can make a drawing of \( f(x) = x^2+1 \) and draw the tangent line to \( f(x) \) and note that it is always below the graph.

    -
    - - -
    -Second order condition -

    -

Assume that \( f \) is twice differentiable, i.e., the Hessian matrix exists at each point in \( D_f \). Then \( f \) is convex if and only if \( D_f \) is a convex set and its Hessian is positive semi-definite for all \( x\in D_f \). For a single-variable function this reduces to \( f''(x) \geq 0 \). Geometrically this means that \( f \) has nonnegative curvature everywhere.

    -
    - - -

This condition is particularly useful since it gives us a procedure for determining if the function under consideration is convex, apart from using the definition.
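As a small numerical illustration of the second-order condition, one can check that all eigenvalues of the Hessian are nonnegative at a few points; the sketch below does this for \( f(x_1,x_2)=x_1^2+3x_2^2 \), whose Hessian is constant (an illustrative choice, not from the lecture code):

import numpy as np

# f(x1, x2) = x1^2 + 3*x2^2 has the constant Hessian [[2, 0], [0, 6]]
def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

# Convex everywhere <=> Hessian positive semi-definite, i.e. all eigenvalues nonnegative
for point in [np.array([0.0, 0.0]), np.array([1.0, -2.0]), np.array([-3.0, 0.5])]:
    eigvals = np.linalg.eigvalsh(hessian(point))
    print(point, "eigenvalues:", eigvals, "convex here:", bool(np.all(eigvals >= 0)))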

    +

    For the various values of \( k \)

    +
      +
    1. shuffle the dataset randomly.
    2. +
    3. Split the dataset into \( k \) groups.
    4. +
    5. For each unique group: +
        +
      1. Decide which group to use as set for test data
      2. +
      3. Take the remaining groups as a training data set
      4. +
      5. Fit a model on the training set and evaluate it on the test set
      6. +
      7. Retain the evaluation score and discard the model
      8. +
      +
    6. Summarize the model using the sample of model evaluation scores
    7. +










    -

    More on convex functions

    - -

    The next result is of great importance to us and the reason why we are -going on about convex functions. In machine learning we frequently -have to minimize a loss/cost function in order to find the best -parameters for the model we are considering. -

    - -

    Ideally we want the -global minimum (for high-dimensional models it is hard to know -if we have local or global minimum). However, if the cost/loss function -is convex the following result provides invaluable information: -

    +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    -
    -Any minimum is global for convex functions -

    -

    Consider the problem of finding \( x \in \mathbb{R}^n \) such that \( f(x) \) -is minimal, where \( f \) is convex and differentiable. Then, any point -\( x^* \) that satisfies \( \nabla f(x^*) = 0 \) is a global minimum. -

    -
    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
     
    -

    This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum.

    +# A seed just to ensure that the random numbers are the same for every run. +# Useful for eventual debugging. +np.random.seed(3155) -









    -

    Some simple problems

    +# Generate the data. +nsamples = 100 +x = np.random.randn(nsamples) +y = 3*x**2 + np.random.randn(nsamples) -
      -
1. Show that \( f(x)=x^2 \) is convex for \( x \in \mathbb{R} \) using the definition of convexity. Hint: If you re-write the definition, \( f \) is convex if the following holds for all \( x,y \in D_f \) and any \( \lambda \in [0,1] \): \( \lambda f(x)+(1-\lambda)f(y)-f(\lambda x + (1-\lambda) y ) \geq 0 \).
    2. -
    3. Using the second order condition show that the following functions are convex on the specified domain.
    4. -
        -
      • \( f(x) = e^x \) is convex for \( x \in \mathbb{R} \).
      • -
      • \( g(x) = -\ln(x) \) is convex for \( x \in (0,\infty) \).
      • -
      -
5. Let \( f(x) = x^2 \) and \( g(x) = e^x \). Show that \( f(g(x)) \) and \( g(f(x)) \) are convex for \( x \in \mathbb{R} \). Also show that if \( f(x) \) is any convex function then \( h(x) = e^{f(x)} \) is convex.
    6. -
    7. A norm is any function that satisfy the following properties
    8. -
        -
      • \( f(\alpha x) = |\alpha| f(x) \) for all \( \alpha \in \mathbb{R} \).
      • -
      • \( f(x+y) \leq f(x) + f(y) \)
      • -
• \( f(x) \geq 0 \) for all \( x \in \mathbb{R}^n \) with equality if and only if \( x = 0 \)
      • -
      -
    -

    Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this).

    +## Cross-validation on Ridge regression using KFold only -









    -

    Standard steepest descent

    +# Decide degree on polynomial to fit +poly = PolynomialFeatures(degree = 6) -

Before we proceed, we would like to discuss the approach called the standard Steepest descent (different from the above steepest descent discussion), which again requires us to be able to compute a matrix. It belongs to the class of Conjugate Gradient methods (CG).

    +# Decide which values of lambda to use +nlambdas = 500 +lambdas = np.logspace(-3, 5, nlambdas) -The success of the CG method -

    for finding solutions of non-linear problems is based on the theory -of conjugate gradients for linear systems of equations. It belongs to -the class of iterative methods for solving problems from linear -algebra of the type -

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}. -\end{equation*} -$$ +# Initialize a KFold instance +k = 5 +kfold = KFold(n_splits = k) -

    In the iterative process we end up with a problem like

    +# Perform the cross-validation to estimate MSE +scores_KFold = np.zeros((nlambdas, k)) -$$ -\begin{equation*} - \boldsymbol{r}= \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}, -\end{equation*} -$$ +i = 0 +for lmb in lambdas: + ridge = Ridge(alpha = lmb) + j = 0 + for train_inds, test_inds in kfold.split(x): + xtrain = x[train_inds] + ytrain = y[train_inds] -

    where \( \boldsymbol{r} \) is the so-called residual or error in the iterative process.

    + xtest = x[test_inds] + ytest = y[test_inds] -

    When we have found the exact solution, \( \boldsymbol{r}=0 \).

    + Xtrain = poly.fit_transform(xtrain[:, np.newaxis]) + ridge.fit(Xtrain, ytrain[:, np.newaxis]) -









    -

    Gradient method

    + Xtest = poly.fit_transform(xtest[:, np.newaxis]) + ypred = ridge.predict(Xtest) -

    The residual is zero when we reach the minimum of the quadratic equation

    -$$ -\begin{equation*} - P(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T\boldsymbol{b}, -\end{equation*} -$$ + scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred) -

    with the constraint that the matrix \( \boldsymbol{A} \) is positive definite and -symmetric. This defines also the Hessian and we want it to be positive definite. -

    + j += 1 + i += 1 -









    -

    Steepest descent method

    -

    We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). -We can assume without loss of generality that -

    -$$ -\begin{equation*} -\boldsymbol{x}_0=0, -\end{equation*} -$$ +estimated_mse_KFold = np.mean(scores_KFold, axis = 1) -

    or consider the system

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, -\end{equation*} -$$ +## Cross-validation using cross_val_score from sklearn along with KFold -

    instead.

    +# kfold is an instance initialized above as: +# kfold = KFold(n_splits = k) -









    -

    Steepest descent method

    -
    - -

    -

    One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

    -$$ -\begin{equation*} - f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{x} , \quad \boldsymbol{x}\in\mathbf{R}^n. -\end{equation*} -$$ +estimated_mse_sklearn = np.zeros(nlambdas) +i = 0 +for lmb in lambdas: + ridge = Ridge(alpha = lmb) -

    This suggests taking the first basis vector \( \boldsymbol{r}_1 \) (see below for definition) -to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), -which equals -

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, -\end{equation*} -$$ + X = poly.fit_transform(x[:, np.newaxis]) + estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold) -

and for \( \boldsymbol{x}_0=0 \) it is equal to \( -\boldsymbol{b} \).

    -
    + # cross_val_score return an array containing the estimated negative mse for every fold. + # we have to the the mean of every array in order to get an estimate of the mse of the model + estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds) + i += 1 -









    -

    Final expressions

    -
    - -

    -

    We can compute the residual iteratively as

    -$$ -\begin{equation*} -\boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, - \end{equation*} -$$ +## Plot and compare the slightly different ways to perform cross-validation -

    which equals

    -$$ -\begin{equation*} -\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_k), - \end{equation*} -$$ +plt.figure() -

    or

    -$$ -\begin{equation*} -(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{r}_k, - \end{equation*} -$$ +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score') +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold') -

    which gives

    +plt.xlabel('log10(lambda)') +plt.ylabel('mse') -$$ -\alpha_k = \frac{\boldsymbol{r}_k^T\boldsymbol{r}_k}{\boldsymbol{r}_k^T\boldsymbol{A}\boldsymbol{r}_k} -$$ +plt.legend() -

    leading to the iterative scheme

    -$$ -\begin{equation*} -\boldsymbol{x}_{k+1}=\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_{k}, - \end{equation*} -$$ +plt.show() +
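A minimal NumPy sketch of this steepest-descent iteration for a small symmetric positive-definite system (the matrix and right-hand side below are chosen only for illustration) could look like this:

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.zeros(2)                             # initial guess x_0 = 0
for k in range(25):
    r = b - A @ x                           # residual r_k
    if np.linalg.norm(r) < 1e-10:           # stop when the residual vanishes
        break
    alpha = (r @ r)/(r @ (A @ r))           # optimal step length along r_k
    x = x + alpha*r                         # update x_{k+1} = x_k + alpha_k r_k

print("Steepest-descent solution:", x)
print("Direct solve for comparison:", np.linalg.solve(A, b))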
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Steepest descent example

    +

    More examples on bootstrap and cross-validation and errors

    @@ -932,26 +850,84 @@

    Steepest descent example

    -
    import numpy as np
    -import numpy.linalg as la
    -
    -import scipy.optimize as sopt
    -
    -import matplotlib.pyplot as pt
    -from mpl_toolkits.mplot3d import axes3d
    -
    -def f(x):
    -    return x[0]**2 + 3.0*x[1]**2
    -
    -def df(x):
    -    return np.array([2*x[0], 6*x[1]])
    -
    -fig = pt.figure()
    -ax = fig.add_subplot(projection = '3d')
    -
    -xmesh, ymesh = np.mgrid[-3:3:50j,-3:3:50j]
    -fmesh = f(np.array([xmesh, ymesh]))
    -ax.plot_surface(xmesh, ymesh, fmesh)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -967,7 +943,12 @@

    Steepest descent example

    -

    And then as countor plot

    +

Note that we kept the intercept column in the design matrix here. This means that we need to set fit_intercept=False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones and let Scikit-Learn fit the intercept.
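As a small check of this remark (an illustration added here, with made-up data rather than the EoS set), the two setups below should give essentially the same fit:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2024)
x = rng.uniform(0, 1, 50)
y = 2.0 + 3.0*x + 0.1*rng.standard_normal(50)

# Option 1: explicit column of ones, intercept handled by the design matrix
X_with_ones = np.column_stack((np.ones(len(x)), x))
fit1 = LinearRegression(fit_intercept=False).fit(X_with_ones, y)

# Option 2: no column of ones, let scikit-learn fit the intercept
X_plain = x.reshape(-1, 1)
fit2 = LinearRegression(fit_intercept=True).fit(X_plain, y)

print(fit1.coef_)                    # [intercept, slope]
print(fit2.intercept_, fit2.coef_)   # intercept and slope separately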

    + + +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -975,9 +956,73 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh)
    -guesses = [np.array([2, 2./5])]
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
+#  The design matrix, now as a function of various polytropes
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -993,7 +1038,140 @@

    Steepest descent example

    -

    Find guesses

    + + +

    Logistic Regression

    + +

    In linear regression our main interest was centered on learning the +coefficients of a functional fit (say a polynomial) in order to be +able to predict the response of a continuous variable on some unseen +data. The fit to the continuous variable \( y_i \) is based on some +independent variables \( \boldsymbol{x}_i \). Linear regression resulted in +analytical expressions for standard ordinary Least Squares or Ridge +regression (in terms of matrices to invert) for several quantities, +ranging from the variance and thereby the confidence intervals of the +parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert +the product of the design matrices, linear regression gives then a +simple recipe for fitting our data. +

    + + +

    Classification problems

    + +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients' brains, figure out if there is a tumor or +not; or given a specific physical system, we'd like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations. +

    + +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc. +

    + +









    +

    Optimization and Deep learning

    + +

    Logistic regression will also serve as our stepping stone towards +neural network algorithms and supervised deep learning. For logistic +learning, the minimization of the cost function leads to a non-linear +equation in the parameters \( \boldsymbol{\theta} \). The optimization of the +problem calls therefore for minimization algorithms. This forms the +bottle neck of all machine learning algorithms, namely how to find +reliable minima of a multi-variable function. This leads us to the +family of gradient descent methods. The latter are the working horses +of basically all modern machine learning algorithms. +

    + +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later. +

    + + +

    Basics

    + +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \( y_i \) are discrete and only take values +from \( k=0,\dots,K-1 \) (i.e. \( K \) classes). +

    + +

    The goal is to predict the +output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) +made of \( n \) samples, each of which carries \( p \) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong. +

    + +

    Let us specialize to the case of two classes only, with outputs +\( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is +

    + +$$ +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +$$ + + +









    +

    Linear classifier

    + +

    Before moving to the logistic model, let us try to use our linear +regression model to classify these two outcomes. We could for example +fit a linear model to the default case if \( y_i > 0.5 \) and the no +default case \( y_i \leq 0.5 \). +

    + +

    We would then have our +weighted linear combination, namely +

+$$
+\begin{equation}
+\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon},
+\label{_auto1}
+\end{equation}
+$$
+
+

    where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our +\( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors. +
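To make this concrete, here is a minimal sketch (with synthetic labels rather than the credit-card data) of using the linear model as a classifier by thresholding the continuous prediction at \( 0.5 \); its shortcomings are discussed next.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(-3, 3, n).reshape(-1, 1)
# synthetic binary labels: the probability of class 1 grows with x
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-2*x[:, 0]))).astype(int)

linfit = LinearRegression().fit(x, y)
yhat_continuous = linfit.predict(x)
# threshold the continuous output at 0.5 to obtain class labels
yhat_class = (yhat_continuous > 0.5).astype(int)
print("Fraction correctly classified:", np.mean(yhat_class == y))
print("Range of the linear prediction:", yhat_continuous.min(), yhat_continuous.max())

Nothing constrains the continuous prediction to the interval \( [0,1] \), which is the main weakness discussed below.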

    + +









    +

    Some selected properties

    + +

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we set \( y_i=1 \) if a person in the data set defaults on her/his debt and \( y_i=0 \) if not (see the full example below).

    + +

One simple way to get a discrete output is to use a sign function that maps the output of a linear regressor to the values \( \{0,1\} \), that is \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and \( 0 \) otherwise. We will encounter this model in our first demonstration of neural networks.

    + +

    Historically it is called the perceptron model in the machine learning +literature. This model is extremely simple. However, in many cases it is more +favorable to use a ``soft" classifier that outputs +the probability of a given category. This leads us to the logistic function. +

    + +









    +

    Simple example

    + +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    +
    @@ -1001,8 +1179,59 @@

    Steepest descent example

    -
    x = guesses[-1]
    -s = -df(x)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
     
    @@ -1018,7 +1247,12 @@

    Steepest descent example

    -

    Run it!

    + +









    +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    +
    @@ -1026,13 +1260,14 @@

    Steepest descent example

    -
    def f1d(alpha):
    -    return f(x + alpha*s)
    -
    -alpha_opt = sopt.golden(f1d)
    -next_guess = x + alpha_opt * s
    -guesses.append(next_guess)
    -print(next_guess)
    +  
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
     
    @@ -1048,7 +1283,51 @@

    Steepest descent example

    -

    What happened?

    +

    We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). +In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model +

    +$$ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +$$ + +

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we instead let \( f(y_i\vert x_i) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability for finding a value of \( y_i \) with a given \( x_i \).

    + +









    +

    The logistic function

    + +

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood for a given event,

+$$
+p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}.
+$$
+
+

    Note that \( 1-p(t)= p(-t) \).
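A quick numerical check of this identity (a small illustration added here):

import numpy as np

def p(t):
    return 1.0/(1.0 + np.exp(-t))

t = np.linspace(-5, 5, 11)
# the two columns below should agree to machine precision
print(np.column_stack((1 - p(t), p(-t))))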

    + +









    +

Examples of likelihood functions used in logistic regression and neural networks

    + +

The following code plots the logistic function, the step function and other functions we will encounter from here on.

    +
    @@ -1056,10 +1335,60 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh, 50)
    -it_array = np.array(guesses)
    -pt.plot(it_array.T[0], it_array.T[1], "x-")
    +  
    """The sigmoid function (or the logistic curve) is a
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
     
    @@ -1075,599 +1404,270 @@

    Steepest descent example

    -

    Note that we did only one iteration here. We can easily add more using our previous guesses.











    -

    Conjugate gradient method

    -
    - -

    -

    In the CG method we define so-called conjugate directions and two vectors -\( \boldsymbol{s} \) and \( \boldsymbol{t} \) -are said to be -conjugate if -

    +

    Two parameters

    + +

    We assume now that we have two classes with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

$$
-\begin{equation*}
-\boldsymbol{s}^T\boldsymbol{A}\boldsymbol{t}= 0.
-\end{equation*}
+\begin{align*}
+p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\
+p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}),
+\end{align*}
$$

    The philosophy of the CG method is to perform searches in various conjugate directions -of our vectors \( \boldsymbol{x}_i \) obeying the above criterion, namely -

    +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    Note that we used

$$
-\begin{equation*}
-\boldsymbol{x}_i^T\boldsymbol{A}\boldsymbol{x}_j= 0.
-\end{equation*}
+p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}).
$$

    Two vectors are conjugate if they are orthogonal with respect to -this inner product. Being conjugate is a symmetric relation: if \( \boldsymbol{s} \) is conjugate to \( \boldsymbol{t} \), then \( \boldsymbol{t} \) is conjugate to \( \boldsymbol{s} \). -

    -
    + +

    Maximum likelihood

    -









    -

    Conjugate gradient method

    -
    - -

    -

    An example is given by the eigenvectors of the matrix

    +

    In order to define the total likelihood for all possible outcomes from a +dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels +\( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is +

$$
-\begin{equation*}
-\boldsymbol{v}_i^T\boldsymbol{A}\boldsymbol{v}_j= \lambda\boldsymbol{v}_i^T\boldsymbol{v}_j,
-\end{equation*}
+\begin{align*}
+P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\
+\end{align*}
$$

    which is zero unless \( i=j \).

    -
    +

    from which we obtain the log-likelihood and our cost/loss function

+$$
+\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right).
+$$









    -

    Conjugate gradient method

    -
    - -

    -

    Assume now that we have a symmetric positive-definite matrix \( \boldsymbol{A} \) of size -\( n\times n \). At each iteration \( i+1 \) we obtain the conjugate direction of a vector -

    +

    The cost function rewritten

    + +

    Reordering the logarithms, we can rewrite the cost/loss function as

$$
-\begin{equation*}
-\boldsymbol{x}_{i+1}=\boldsymbol{x}_{i}+\alpha_i\boldsymbol{p}_{i}.
-\end{equation*}
+\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$

    We assume that \( \boldsymbol{p}_{i} \) is a sequence of \( n \) mutually conjugate directions. -Then the \( \boldsymbol{p}_{i} \) form a basis of \( R^n \) and we can expand the solution -$ \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ in this basis, namely +

The maximum likelihood estimator is defined as the set of parameters \( \boldsymbol{\theta} \) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that

-
$$
-\begin{equation*}
- \boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{p}_i.
-\end{equation*}
+\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization as we did for Ridge and Lasso regression. +
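As a small illustration (synthetic data and made-up parameter values, added here and not part of the original notes), the two-parameter cross entropy can be coded directly; scikit-learn's log_loss is used only as an independent check.

import numpy as np
from sklearn.metrics import log_loss   # used only as an independent check

def cross_entropy(theta0, theta1, x, y):
    """C(theta) = -sum_i [ y_i (theta0 + theta1 x_i) - log(1 + exp(theta0 + theta1 x_i)) ]."""
    z = theta0 + theta1*x
    return -np.sum(y*z - np.log(1 + np.exp(z)))

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)
y = (rng.uniform(0, 1, 100) < 1/(1 + np.exp(-(0.5 + 1.5*x)))).astype(int)

theta0, theta1 = 0.5, 1.5          # made-up parameter values
p = 1/(1 + np.exp(-(theta0 + theta1*x)))
print(cross_entropy(theta0, theta1, x, y)/len(y))   # mean cost per data point
print(log_loss(y, p))                               # should agree with the line above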











    -

    Conjugate gradient method

    -
    - -

    -

    The coefficients are given by

    -$$ -\begin{equation*} - \mathbf{A}\mathbf{x} = \sum^{n}_{i=1} \alpha_i \mathbf{A} \mathbf{p}_i = \mathbf{b}. -\end{equation*} -$$ +

    Minimizing the cross entropy

    + +

    The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, +therefore, any local minimizer is a global minimizer. +

    -

    Multiplying with \( \boldsymbol{p}_k^T \) from the left gives

    +

    Minimizing this +cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain +

$$
-\begin{equation*}
- \boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{p}_i= \boldsymbol{p}_k^T \boldsymbol{b},
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right),
$$

    and we can define the coefficients \( \alpha_k \) as

    - +

    and

$$
-\begin{equation*}
- \alpha_k = \frac{\boldsymbol{p}_k^T \boldsymbol{b}}{\boldsymbol{p}_k^T \boldsymbol{A} \boldsymbol{p}_k}
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right).
$$
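A minimal gradient-descent sketch based directly on these two derivatives could look as follows (synthetic data; the learning rate and the number of iterations are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, n)
true_theta0, true_theta1 = -0.5, 2.0
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-(true_theta0 + true_theta1*x)))).astype(int)

theta0, theta1 = 0.0, 0.0
eta = 0.1           # learning rate, arbitrary choice
for _ in range(5000):
    p = 1/(1 + np.exp(-(theta0 + theta1*x)))
    grad0 = -np.sum(y - p)        # dC/dtheta0
    grad1 = -np.sum(x*(y - p))    # dC/dtheta1
    theta0 -= eta*grad0/n         # divide by n to keep the step size stable
    theta1 -= eta*grad1/n

print(theta0, theta1)   # should be in the vicinity of the true parameters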










    -

    Conjugate gradient method and iterations

    -
    - -

    +

    A more compact expression

    -

    If we choose the conjugate vectors \( \boldsymbol{p}_k \) carefully, -then we may not need all of them to obtain a good approximation to the solution -\( \boldsymbol{x} \). -We want to regard the conjugate gradient method as an iterative method. -This will us to solve systems where \( n \) is so large that the direct -method would take too much time. +

    Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an +\( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a +vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can rewrite in a more compact form the first +derivative of the cost function as

    -

    We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). -We can assume without loss of generality that -

$$
-\begin{equation*}
-\boldsymbol{x}_0=0,
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right).
$$

    or consider the system

    +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

+
$$
-\begin{equation*}
-\boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0,
-\end{equation*}
+\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}.
$$
-

    instead.
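In code, these compact expressions are just a couple of matrix products. The sketch below (synthetic data and an arbitrary parameter vector, added here for illustration) shows the shapes involved:

import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack((np.ones(n), rng.uniform(-2, 2, n)))   # n x p design matrix (intercept + one feature)
theta = np.array([0.1, -0.3])                              # arbitrary parameter vector
y = rng.integers(0, 2, n)                                  # synthetic 0/1 labels

p = 1/(1 + np.exp(-X @ theta))        # fitted probabilities
W = np.diag(p*(1 - p))                # diagonal weight matrix

gradient = -X.T @ (y - p)             # dC/dtheta = -X^T (y - p)
hessian  = X.T @ W @ X                # d2C/dtheta dtheta^T = X^T W X
print(gradient.shape, hessian.shape)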

    -
    -









    -

    Conjugate gradient method

    -
    - -

    -

    One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

    +

    Extending to more predictors

    + +

    Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then with \( p \) predictors

$$
-\begin{equation*}
- f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{x} , \quad \boldsymbol{x}\in\mathbf{R}^n.
-\end{equation*}
+\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p.
$$

    This suggests taking the first basis vector \( \boldsymbol{p}_1 \) -to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), -which equals -

    +

    Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \) leading to

$$
-\begin{equation*}
-\boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b},
-\end{equation*}
+p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}.
$$

    and -\( \boldsymbol{x}_0=0 \) it is equal \( -\boldsymbol{b} \). -The other vectors in the basis will be conjugate to the gradient, -hence the name conjugate gradient method. -

    -
    -









    -

    Conjugate gradient method

    -
    - -

    -

    Let \( \boldsymbol{r}_k \) be the residual at the \( k \)-th step:

    -$$ -\begin{equation*} -\boldsymbol{r}_k=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k. -\end{equation*} -$$ +

    Including more classes

    -

    Note that \( \boldsymbol{r}_k \) is the negative gradient of \( f \) at -\( \boldsymbol{x}=\boldsymbol{x}_k \), -so the gradient descent method would be to move in the direction \( \boldsymbol{r}_k \). -Here, we insist that the directions \( \boldsymbol{p}_k \) are conjugate to each other, -so we take the direction closest to the gradient \( \boldsymbol{r}_k \) -under the conjugacy constraint. -This gives the following expression +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    -$$ -\begin{equation*} -\boldsymbol{p}_{k+1}=\boldsymbol{r}_k-\frac{\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_k^T\boldsymbol{A}\boldsymbol{p}_k} \boldsymbol{p}_k. -\end{equation*} -$$ -
    - - -









    -

    Conjugate gradient method

    -
    - -

    -

    We can also compute the residual iteratively as

    -$$ -\begin{equation*} -\boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, - \end{equation*} -$$ -

    which equals

$$
-\begin{equation*}
-\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{p}_k),
- \end{equation*}
+\log{\frac{p(C=1\vert x)}{p(K\vert x)}} = \theta_{10}+\theta_{11}x_1,
$$

    or

    +

    and

$$
-\begin{equation*}
-(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{p}_k,
- \end{equation*}
+\log{\frac{p(C=2\vert x)}{p(K\vert x)}} = \theta_{20}+\theta_{21}x_1,
$$

    which gives

    - +

and so on up to the class \( C=K-1 \),

$$
-\begin{equation*}
-\boldsymbol{r}_{k+1}=\boldsymbol{r}_k-\boldsymbol{A}\boldsymbol{p}_{k},
- \end{equation*}
+\log{\frac{p(C=K-1\vert x)}{p(K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1,
$$
    - - -

    Revisiting our first homework

    - -

    We will use linear regression as a case study for the gradient descent -methods. Linear regression is a great test case for the gradient -descent methods discussed in the lectures since it has several -desirable properties such as: +

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

    -
      -
    1. An analytical solution (recall homework set 1).
    2. -
    3. The gradient can be computed analytically.
    4. -
    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. -
    -

    We revisit an example similar to what we had in the first homework set. We had a function of the type

    - +









    +

    More classes

    - -
    -
    -
    -
    -
    -
    m = 100
    -x = 2*np.random.rand(m,1)
    -y = 4+3*x+np.random.randn(m,1)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function. +

    -

    with \( x_i \in [0,1] \) is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). -The linear regression model is given by +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \( K \) distinct linear functions, +and the predicted probability for the \( k \)-th class given a sample +vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two +predictors):

    -$$ -h_\beta(x) = \boldsymbol{y} = \beta_0 + \beta_1 x, -$$ -

    such that

$$
-\boldsymbol{y}_i = \beta_0 + \beta_1 x_i.
+p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}.
$$
-
-
-

    Gradient descent example

    - -

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \beta = (\beta_0, \beta_1)^T \)

    - -

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\beta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

    +

    It is easy to extend to more predictors. The final class is

$$
-X \equiv \begin{bmatrix}
-1 & x_1 \\
-\vdots & \vdots \\
-1 & x_{100} & \\
-\end{bmatrix}.
+p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}},
$$

    The cost/loss/risk function is given by (

    -$$ -C(\beta) = \frac{1}{n}||X\beta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\beta_0 + \beta_1 x_i)^2 - 2 y_i (\beta_0 + \beta_1 x_i) + y_i^2\right] -$$ +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations. +
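That the probabilities sum to one is easy to verify numerically; the sketch below evaluates the expressions above for an arbitrary, made-up set of parameters:

import numpy as np

K = 4                            # total number of classes
theta = np.array([[0.2, -1.0],   # theta_{l0}, theta_{l1} for l = 1, ..., K-1
                  [0.5,  0.3],
                  [-0.7, 1.2]])
x1 = 0.8                         # a single predictor value

scores = theta[:, 0] + theta[:, 1]*x1          # the K-1 linear functions
denom = 1 + np.sum(np.exp(scores))
p = np.append(np.exp(scores)/denom, 1/denom)   # classes 1..K-1 and the final class K
print(p, p.sum())                              # the probabilities sum to one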

    -

    and we want to find \( \beta \) such that \( C(\beta) \) is minimized.

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton's method and gradient descent methods are +discussed in the material on optimization +methods. +











    -

    The derivative of the cost/loss function

    - -

    Computing \( \partial C(\beta) / \partial \beta_0 \) and \( \partial C(\beta) / \partial \beta_1 \) we can show that the gradient can be written as

    -$$ -\nabla_{\beta} C(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ -\sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ -\end{bmatrix} = \frac{2}{n}X^T(X\beta - \mathbf{y}), -$$ +

Optimization, the central part of any Machine Learning algorithm

    -

    where \( X \) is the design matrix defined above.

    +

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally we would be able to solve for \( \theta \) analytically; however, this is not possible in general, and we must use some approximate numerical method to compute the minimum.
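As a deliberately generic illustration (not the approach developed in these notes, where we build the minimizers ourselves), one can hand such a cost function to a library minimizer like scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 100)
y = (rng.uniform(0, 1, 100) < 1/(1 + np.exp(-(1.0 + 2.0*x)))).astype(int)

def cost(theta):
    """Logistic regression cross entropy for parameters theta = [theta0, theta1]."""
    z = theta[0] + theta[1]*x
    return -np.sum(y*z - np.log(1 + np.exp(z)))

result = minimize(cost, x0=np.zeros(2))
print(result.x)   # estimated parameters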











    -

    The Hessian matrix

    -

    The Hessian matrix of \( C(\beta) \) is given by

    +

    Revisiting our Logistic Regression case

    + +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \( y_i \) either +\( 0 \) or \( 1 \). Furthermore we assumed also that we have only two +parameters \( \theta \) in our fitting, that is we +defined probabilities +

+
$$
-\boldsymbol{H} \equiv \begin{bmatrix}
-\frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\
-\frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} & \\
-\end{bmatrix} = \frac{2}{n}X^T X.
+\begin{align*}
+p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\
+p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}),
+\end{align*}
$$

    This result implies that \( C(\beta) \) is a convex function since the matrix \( X^T X \) always is positive semi-definite.

    +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).











    -

    Simple program

    +

    The equations to solve

    + +

    Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) +elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the +\( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities +\( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote in a more compact form +the first derivative of the cost function as +

    -

    We can now write a program that minimizes \( C(\beta) \) using the gradient descent method with a constant learning rate \( \gamma \) according to

$$
-\beta_{k+1} = \beta_k - \gamma \nabla_\beta C(\beta_k), \ k=0,1,\cdots
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right).
$$

    We can use the expression we computed for the gradient and let use a -\( \beta_0 \) be chosen randomly and let \( \gamma = 0.001 \). Stop iterating -when \( ||\nabla_\beta C(\beta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion. -

    - -

    And finally we can compare our solution for \( \beta \) with the analytic result given by -\( \beta= (X^TX)^{-1} X^T \mathbf{y} \). +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    -









    -

    Gradient Descent Example

    - -

    Here our simple example

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -# Hessian matrix
    -H = (2.0/n)* X.T @ X
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    -print(beta_linreg)
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -for iter in range(Niterations):
    -    gradient = (2.0/n)*X.T @ (X @ beta-y)
    -    beta -= eta*gradient
    -
    -print(beta)
    -xnew = np.array([[0],[2]])
    -xbnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = xbnew.dot(beta)
    -ypredict2 = xbnew.dot(beta_linreg)
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ +

    This defines what is called the Hessian matrix.











    -

    And a corresponding example using scikit-learn

    - - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import SGDRegressor
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print(beta_linreg)
    -sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)
    -sgdreg.fit(x,y.ravel())
    -print(sgdreg.intercept_, sgdreg.coef_)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -

    Gradient descent and Ridge

    +

    Solving using Newton-Raphson's method

    -

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \beta \),

    -$$ -C_{\text{ridge}}(\beta) = \frac{1}{n}||X\beta -\mathbf{y}||^2 + \lambda ||\beta||^2, \ \lambda \geq 0. -$$ +

    If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    -

    In order to minimize \( C_{\text{ridge}}(\beta) \) using GD we adjust the gradient as follows

    -$$ -\nabla_\beta C_{\text{ridge}}(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ -\sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ -\end{bmatrix} + 2\lambda\begin{bmatrix} \beta_0 \\ \beta_1\end{bmatrix} = 2 (\frac{1}{n}X^T(X\beta - \mathbf{y})+\lambda \beta). -$$ +

    Our iterative scheme is then given by

    -

    We can easily extend our program to minimize \( C_{\text{ridge}}(\beta) \) using gradient descent and compare with the analytical solution given by

$$
-\beta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}.
+\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}},
$$

    or in matrix form as

    -









    -

    The Hessian matrix for Ridge Regression

    -

    The Hessian matrix of Ridge Regression for our simple example is given by

$$
-\boldsymbol{H} \equiv \begin{bmatrix}
-\frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\
-\frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} & \\
-\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}.
+\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}.
$$

    This implies that the Hessian matrix is positive definite, hence the stationary point is a -minimum. -Note that the Ridge cost function is convex being a sum of two convex -functions. Therefore, the stationary point is a global -minimum of this function. -

    - -









    -

    Program example for gradient descent with Ridge Regression

    - - -
    -
    -
    -
    -
    -
    from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -
    -#Ridge parameter lambda
    -lmbda  = 0.001
    -Id = n*lmbda* np.eye(XT_X.shape[0])
    -
    -# Hessian matrix
    -H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -
    -beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    -print(beta_linreg)
    -# Start plain gradient descent
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta
    -    beta -= eta*gradients
    -
    -print(beta)
    -ypredict = X @ beta
    -ypredict2 = X @ beta_linreg
    -plt.plot(x, ypredict, "r-")
    -plt.plot(x, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example for Ridge')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - +

    The right-hand side is computed with the old values of \( \theta \).

    -









    -

    Using gradient descent methods, limitations

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
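A minimal Newton-Raphson sketch for the two-parameter logistic model discussed above (synthetic data; we solve the linear system instead of forming the inverse of the Hessian explicitly):

import numpy as np

rng = np.random.default_rng(11)
n = 500
x = rng.uniform(-2, 2, n)
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-(0.5 + 2.0*x)))).astype(int)

X = np.column_stack((np.ones(n), x))   # design matrix with intercept column
theta = np.zeros(2)

for iteration in range(10):
    p = 1/(1 + np.exp(-X @ theta))
    gradient = -X.T @ (y - p)
    hessian = X.T @ (X * (p*(1 - p))[:, None])   # X^T W X without forming W explicitly
    theta = theta - np.linalg.solve(hessian, gradient)

print(theta)   # should be close to the parameters used to generate the data

For this convex problem a handful of Newton iterations is normally enough.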

    -
      -
    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • -
    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • -
    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • -
    • GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • -
    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • -
    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • -










    -

    Improving gradient descent with momentum

    - -

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. The mathematics is explained below in connection with stochastic gradient descent.
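A minimal sketch of the momentum idea, on an arbitrary one-dimensional quadratic with made-up step size and momentum parameter, looks as follows:

import numpy as np

def df(x):
    """Gradient of the objective f(x) = x^2."""
    return 2.0*x

x = 1.0            # starting point
step_size = 0.1    # learning rate (arbitrary choice)
momentum = 0.3     # momentum parameter (arbitrary choice)
change = 0.0       # running memory of the previous update

for i in range(30):
    new_change = step_size*df(x) + momentum*change   # mix the current gradient with the previous step
    x = x - new_change
    change = new_change
print(x)   # close to the minimum at x = 0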

    +

    Example code for Logistic Regression

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    @@ -1675,61 +1675,138 @@

    Improving gradient descent wit
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# take a step
    -		solution = solution - step_size * gradient
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# perform the gradient descent search
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
     
    @@ -1745,9 +1822,16 @@

    Improving gradient descent wit

    - -









    -

    Same code but now with momentum gradient descent

    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True. +

    @@ -1756,69 +1840,38 @@

    Same code but now with
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# keep track of the change
    -	change = 0.0
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# calculate update
    -		new_change = step_size * gradient + momentum * change
    -		# take a step
    -		solution = solution - new_change
    -		# save the change
    -		change = new_change
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# define momentum
    -momentum = 0.3
    -# perform the gradient descent search with momentum
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
     
    @@ -1833,133 +1886,12 @@

    Same code but now with

    +

    Synthetic data generation

    - -









    -

    Overview video on Stochastic Gradient Descent

    - -What is Stochastic Gradient Descent - -









    -

    Batches and mini-batches

    - -

    In gradient descent we compute the cost function and its gradient for all data points we have.

    - -

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    - -









    -

    Stochastic Gradient Descent (SGD)

    - -

In stochastic gradient descent, the extreme case is when each minibatch contains only a single data point, that is, the parameters are updated using one example at a time.

    - -

    This process is called Stochastic Gradient -Descent (SGD) (or also sometimes on-line gradient descent). This is -relatively less common to see because in practice due to vectorized -code optimizations it can be computationally much more efficient to -evaluate the gradient for 100 examples, than the gradient for one -example 100 times. Even though SGD technically refers to using a -single example at a time to evaluate the gradient, you will hear -people use the term SGD even when referring to mini-batch gradient -descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD -for “Batch gradient descent” are rare to see), where it is usually -assumed that mini-batches are used. The size of the mini-batch is a -hyperparameter but it is not very common to cross-validate or bootstrap it. It is -usually based on memory constraints (if any), or set to some value, -e.g. 32, 64 or 128. We use powers of 2 in practice because many -vectorized operation implementations work faster when their inputs are -sized in powers of 2. -

    - -

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    - -









    -

    Stochastic Gradient Descent

    - -

    Stochastic gradient descent (SGD) and variants thereof address some of -the shortcomings of the Gradient descent method discussed above. -

    - -

    The underlying idea of SGD comes from the observation that the cost -function, which we want to minimize, can almost always be written as a -sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), -

    -$$ -C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ - - -









    -

    Computation of gradients

    - -

    This in turn means that the gradient can be -computed as a sum over \( i \)-gradients -

    -$$ -\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ - -

    Stochasticity/randomness is introduced by only taking the -gradient on a subset of the data called minibatches. If there are \( n \) -data points and the size of each minibatch is \( M \), there will be \( n/M \) -minibatches. We denote these minibatches by \( B_k \) where -\( k=1,\cdots,n/M \). -

    - -









    -

    SGD example

    -

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \); then we have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    - -

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step,

$$
\nabla_{\beta} C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}) \rightarrow \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}).
$$









    -

    The gradient step

    - -

    Thus a gradient descent step now looks like

$$
\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta})
$$

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.

    Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2]. +Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    -









    -

    Simple example code

    -
    @@ -1967,20 +1899,84 @@

    Simple example code

    -
    import numpy as np 
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 10 #number of epochs
    -
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for 
    -        j += 1
    +  
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
     
    @@ -1996,1806 +1992,9 @@

    Simple example code

    -

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper, since we sum over the datapoints in the \( k \)-th minibatch and not over all \( n \) datapoints.

    - -









    -

    When do we stop?

    - -

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if its norm is smaller than some threshold, stopping if it is. Note, however, that a vanishing gradient is also valid for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we can evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage, we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
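A small, self-contained sketch of this stopping strategy is shown below; the data set, learning rate and tolerance are illustrative assumptions and not values from the text.

import numpy as np

# Small OLS problem so that the stopping test can be run end to end (illustrative data)
rng = np.random.default_rng(0)
n, M = 100, 5                         # number of data points and minibatch size
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def C(beta):                          # cost function
    return (1.0 / n) * np.sum((y - X @ beta) ** 2)

def grad_C(beta):                     # full gradient
    return (2.0 / n) * X.T @ (X @ beta - y)

beta = rng.standard_normal((2, 1))
eta, tol = 0.1, 1e-6                  # learning rate and gradient-norm threshold (assumed)
best_beta, best_cost = beta.copy(), C(beta)
m = n // M                            # number of minibatches

for epoch in range(200):
    for i in range(m):
        idx = M * rng.integers(m)     # pick a minibatch at random
        xi, yi = X[idx:idx + M], y[idx:idx + M]
        beta -= eta * (2.0 / M) * xi.T @ (xi @ beta - yi)
    if epoch % 10 == 0:
        # With a fixed learning rate the gradient norm rarely gets below tol,
        # so we also store the beta with the lowest cost seen so far
        if np.linalg.norm(grad_C(beta)) < tol:
            break
        cost = C(beta)
        if cost < best_cost:
            best_beta, best_cost = beta.copy(), cost

print(best_cost, best_beta.ravel())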

    - -









    -

    Slightly different approach

    - -

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we eventually do not move at all. Such approaches are also called scaling or learning-rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    - -









    -

    Time decay rate

    - -

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    - -

    In this way we can fix the number of epochs, compute \( \beta \) and -evaluate the cost function at the end. Repeating the computation will -give a different result since the scheme is random by design. Then we -pick the final \( \beta \) that gives the lowest value of the cost -function. -

    - - - -
    -
    -
    -
    -
    -
    import numpy as np 
    -
    -def step_length(t,t0,t1):
    -    return t0/(t+t1)
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 500 #number of epochs
    -t0 = 1.0
    -t1 = 10
    -
    -gamma_j = t0/t1
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for beta
    -        t = epoch*m+i
    -        gamma_j = step_length(t,t0,t1)
    -        j += 1
    -
    -print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Code with a Number of Minibatches which varies

    - -

    In the code here we vary the number of mini-batches.

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from math import exp, sqrt
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
-print("theta from own sgd")
    -print(theta)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Replace or not

    - -

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.

    - -









    -

    Momentum based GD

    - -

    The stochastic gradient descent (SGD) is almost always used with a -momentum or inertia term that serves as a memory of the direction we -are moving in parameter space. This is typically implemented as -follows -

    - -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, -\label{_auto1} -\end{align} -$$ - -

    where we have introduced a momentum parameter \( \gamma \), with -\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to -indicate the gradient is to be taken over a different mini-batch at -each step. We call this algorithm gradient descent with momentum -(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a -running average of recently encountered gradients and -\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory -used in the averaging procedure. Consistent with this, when -\( \gamma=0 \), this just reduces down to ordinary SGD as discussed -earlier. An equivalent way of writing the updates is -

    - -$$ -\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), -$$ - -

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
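The two update equations translate directly into code. Below is a minimal sketch on a simple quadratic cost; the cost function and the parameter values are illustrative assumptions.

import numpy as np

# Gradient descent with momentum on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
eta, gamma = 0.1, 0.9                       # learning rate and momentum parameter (assumed values)

for t in range(100):
    v = gamma * v + eta * gradient(theta)   # v_t = gamma*v_{t-1} + eta*grad E(theta_t)
    theta = theta - v                       # theta_{t+1} = theta_t - v_t

print(theta)                                # approaches the minimum at the origin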

    - -









    -

    More on momentum based approaches

    - -

    Let us try to get more intuition from these equations. It is helpful -to consider a simple physical analogy with a particle of mass \( m \) -moving in a viscous medium with drag coefficient \( \mu \) and potential -\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), -then its motion is described by -

    - -$$ -m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). -$$ - -

    We can discretize this equation in the usual way to get

    - -$$ -m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). -$$ - -

    Rearranging this equation, we can rewrite this as

    - -$$ -\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. -$$ - - -









    -

    Momentum parameter

    - -

    Notice that this equation is identical to previous one if we identify -the position of the particle, \( \mathbf{w} \), with the parameters -\( \boldsymbol{\theta} \). This allows us to identify the momentum -parameter and learning rate with the mass of the particle and the -viscous drag as: -

    - -$$ -\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. -$$ - -

    Thus, as the name suggests, the momentum parameter is proportional to -the mass of the particle and effectively provides inertia. -Furthermore, in the large viscosity/small learning rate limit, our -memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). -

    - -

Why is momentum useful? SGD momentum helps the gradient descent algorithm gain speed in directions with persistent but small gradients, even in the presence of stochasticity, while suppressing oscillations in high-curvature directions. This becomes especially important in situations where the landscape is shallow and flat in some directions and narrow and steep in others. It has been argued that first-order methods (with appropriate initial conditions) can perform comparably to more expensive second-order methods, especially in the context of complex deep learning models.

    - -

    These beneficial properties of momentum can sometimes become even more -pronounced by using a slight modification of the classical momentum -algorithm called Nesterov Accelerated Gradient (NAG). -

    - -

    In the NAG algorithm, rather than calculating the gradient at the -current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one -calculates the gradient at the expected value of the parameters given -our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma -\mathbf{v}_{t-1}) \). This yields the NAG update rule -

    - -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. -\label{_auto2} -\end{align} -$$ - -

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).
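Below is a minimal sketch of NAG on the same simple quadratic cost as above (illustrative assumptions). Note that with the convention \( \boldsymbol{\theta}_{t+1}=\boldsymbol{\theta}_t-\mathbf{v}_t \), the sketch evaluates the gradient at the look-ahead point \( \boldsymbol{\theta}_t-\gamma\mathbf{v}_{t-1} \).

import numpy as np

# Nesterov accelerated gradient on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
eta, gamma = 0.1, 0.9                 # assumed values

for t in range(100):
    lookahead = theta - gamma * v     # where the parameters are currently heading
    v = gamma * v + eta * gradient(lookahead)
    theta = theta - v

print(theta)                          # approaches the minimum at the origin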

    - -









    -

    Second moment of the gradient

    - -

    In stochastic gradient descent, with and without momentum, we still -have to specify a schedule for tuning the learning rates \( \eta_t \) -as a function of time. As discussed in the context of Newton's -method, this presents a number of dilemmas. The learning rate is -limited by the steepest direction which can change depending on the -current position in the landscape. To circumvent this problem, ideally -our algorithm would keep track of curvature and take large steps in -shallow, flat directions and small steps in steep, narrow directions. -Second-order methods accomplish this by calculating or approximating -the Hessian and normalizing the learning rate by the -curvature. However, this is very computationally expensive for -extremely large models. Ideally, we would like to be able to -adaptively change the step size to match the landscape without paying -the steep computational price of calculating or approximating -Hessians. -

    - -

    Recently, a number of methods have been introduced that accomplish -this by tracking not only the gradient, but also the second moment of -the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and -ADAM. -

    - -









    -

    RMS prop

    - -

    In RMS prop, in addition to keeping a running average of the first -moment of the gradient, we also keep track of the second moment -denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule -for RMS prop is given by -

$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \label{_auto3}\\
\mathbf{s}_t &= \beta \mathbf{s}_{t-1} +(1-\beta)\mathbf{g}_t^2 \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t { \mathbf{g}_t \over \sqrt{\mathbf{s}_t +\epsilon}}, \nonumber
\end{align}
$$

    where \( \beta \) controls the averaging time of the second moment and is -typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate -typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a -small regularization constant to prevent divergences. Multiplication -and division by vectors is understood as an element-wise operation. It -is clear from this formula that the learning rate is reduced in -directions where the norm of the gradient is consistently large. This -greatly speeds up the convergence by allowing us to use a larger -learning rate for flat directions. -
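A minimal sketch of these update equations on a simple quadratic cost is shown below (the cost function is an illustrative assumption); a full example combining RMSprop with stochastic gradient descent and automatic differentiation appears later in these notes.

import numpy as np

# RMSprop on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
s = np.zeros_like(theta)
eta, beta, eps = 1e-3, 0.9, 1e-8              # values suggested in the text

for t in range(5000):
    g = gradient(theta)
    s = beta * s + (1 - beta) * g**2          # running average of the second moment
    theta -= eta * g / np.sqrt(s + eps)       # element-wise rescaled step

print(theta)                                  # ends up close to the minimum at the origin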

    - -









    -

    ADAM optimizer

    - -

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    - -

    In addition to keeping a running average of the first and -second moments of the gradient -(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and -\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM -performs an additional bias correction to account for the fact that we -are estimating the first two moments of the gradient using a running -average (denoted by the hats in the update rule below). The update -rule for ADAM is given by (where multiplication and division are once -again understood to be element-wise operations below) -

    - -$$ -\begin{align} -\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) -\label{_auto4}\\ -\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\ -\mathbf{s}_t &=\beta_2 \mathbf{s}_{t-1} +(1-\beta_2)\mathbf{g}_t^2 \nonumber \\ -\boldsymbol{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\beta_1^t} \nonumber \\ -\boldsymbol{\mathbf{s}}_t &={\mathbf{s}_t \over1-\beta_2^t} \nonumber \\ -\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \boldsymbol{\mathbf{m}}_t \over \sqrt{\boldsymbol{\mathbf{s}}_t} +\epsilon}, \nonumber \\ -\label{_auto5} -\end{align} -$$ - -

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and -second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) -respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. -

    - -

    Like in RMSprop, the effective step size of a parameter depends on the -magnitude of its gradient squared. To understand this better, let us -rewrite this expression in terms of the variance -\( \boldsymbol{\sigma}_t^2 = \boldsymbol{\mathbf{s}}_t - -(\boldsymbol{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The -update rule for this parameter is given by -

    - -$$ -\Delta \theta_{t+1}= -\eta_t { \boldsymbol{m}_t \over \sqrt{\sigma_t^2 + m_t^2 }+\epsilon}. -$$ - - -
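As for RMSprop, a minimal sketch of the ADAM update on a simple quadratic cost is shown below (illustrative assumptions); the full version with stochastic gradient descent appears later in these notes.

import numpy as np

# ADAM on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
eta, beta1, beta2, eps = 1e-3, 0.9, 0.99, 1e-8   # values suggested in the text

for t in range(1, 5001):
    g = gradient(theta)
    m = beta1 * m + (1 - beta1) * g               # first moment
    s = beta2 * s + (1 - beta2) * g**2            # second moment
    m_hat = m / (1 - beta1**t)                    # bias corrections
    s_hat = s / (1 - beta2**t)
    theta -= eta * m_hat / (np.sqrt(s_hat) + eps)

print(theta)                                      # ends up close to the minimum at the origin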









    -

    Algorithms and codes for Adagrad, RMSprop and Adam

    - -

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    - -

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    - -









    -

    Practical tips

    - -
      -
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • -
• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for mitigating this is to standardize the data by subtracting the mean and normalizing the variance of the input variables (a short standardization sketch follows after this list). Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • -
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This early stopping significantly improves performance in many settings.
    • -
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • -
    -

    Geron's text, see chapter 11, has several interesting discussions.
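To illustrate the input-transformation tip above, here is a minimal sketch of standardizing a design matrix (the data below are illustrative assumptions; scikit-learn's StandardScaler performs the same operation).

import numpy as np

# Standardize inputs: subtract the column means and divide by the column standard
# deviations, using statistics computed on the training data only
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=5.0, size=(100, 3))
X_test = rng.normal(loc=10.0, scale=5.0, size=(20, 3))

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std    # reuse the training statistics on new data

print(X_train_std.mean(axis=0))       # close to zero
print(X_train_std.std(axis=0))        # close to one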

    - -









    -

    Automatic differentiation

    - -

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    - -

    Automatic differentiation is neither:

    - -
      -
    • Symbolic differentiation, nor
    • -
    • Numerical differentiation (the method of finite differences).
    • -
    -

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off errors in the discretization process and suffers from cancellation effects.

    - -

    Python has tools for so-called automatic differentiation. -Consider the following example -

    -$$ -f(x) = \sin\left(2\pi x + x^2\right) -$$ - -

    which has the following derivative

    -$$ -f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) -$$ - -

    Using autograd we have

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -
    -# To do elementwise differentiation:
    -from autograd import elementwise_grad as egrad 
    -
    -# To plot:
    -import matplotlib.pyplot as plt 
    -
    -
    -def f(x):
    -    return np.sin(2*np.pi*x + x**2)
    -
    -def f_grad_analytic(x):
    -    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    -
    -# Do the comparison:
    -x = np.linspace(0,1,1000)
    -
    -f_grad = egrad(f)
    -
    -computed = f_grad(x)
    -analytic = f_grad_analytic(x)
    -
    -plt.title('Derivative computed from Autograd compared with the analytical derivative')
    -plt.plot(x,computed,label='autograd')
    -plt.plot(x,analytic,label='analytic')
    -
    -plt.xlabel('x')
    -plt.ylabel('y')
    -plt.legend()
    -
    -plt.show()
    -
    -print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -

    Using autograd

    - -

    Here we -experiment with what kind of functions Autograd is capable -of finding the gradient of. The following Python functions are just -meant to illustrate what Autograd can do, but please feel free to -experiment with other, possibly more complicated, functions as well. -

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f1(x):
    -    return x**3 + 1
    -
    -f1_grad = grad(f1)
    -
    -# Remember to send in float as argument to the computed gradient from Autograd!
    -a = 1.0
    -
    -# See the evaluated gradient at a using autograd:
    -print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    -
    -# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    -grad_analytical = 3*a**2
    -print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Autograd with more complicated functions

    - -

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know with respect to which variable the function is being differentiated.

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f2(x1,x2):
    -    return 3*x1**3 + x2*(x1 - 5) + 1
    -
    -# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    -f2_grad_x1 = grad(f2,0)
    -
-# ... and differentiate w.r.t x2 by sending 1 as an additional argument to grad
    -f2_grad_x2 = grad(f2,1)
    -
    -x1 = 1.0
    -x2 = 3.0 
    -
    -print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    -print("-"*30)
    -
    -# Compare with the analytical derivatives:
    -
    -# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    -f2_grad_x1_analytical = 9*x1**2 + x2
    -
    -# Derivative of f2 w.r.t x2 is: x1 - 5:
    -f2_grad_x2_analytical = x1 - 5
    -
    -# See the evaluated derivations:
    -print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
-print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1_analytical ))
    -
    -print()
    -
-print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2_analytical ))
    -print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that the grad function, used this way, will not produce the full gradient of the function. The true gradient of a function of two or more variables is a vector, where each element is the derivative of the function with respect to one of the variables.
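One way to assemble the full gradient vector from the two partial derivatives computed above is sketched below (assuming autograd is available; the variable names are illustrative).

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Stack the two partial derivatives into one gradient vector
x1, x2 = 1.0, 3.0
full_gradient = np.array([grad(f2, 0)(x1, x2), grad(f2, 1)(x1, x2)])
print(full_gradient)                  # analytically: [9*x1**2 + x2, x1 - 5] = [12, -4]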

    - -









    -

    More complicated functions using the elements of their arguments directly

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f3(x): # Assumes x is an array of length 5 or higher
    -    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    -
    -f3_grad = grad(f3)
    -
    -x = np.linspace(0,4,5)
    -
    -# Print the computed gradient:
    -print("The computed gradient of f3 is: ", f3_grad(x))
    -
    -# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    -f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f3 is: ", f3_grad_analytical)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as the output is closer to what one would expect from a gradient-evaluating function.

    - - -

    Functions using mathematical functions from Numpy

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f4(x):
    -    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    -
    -f4_grad = grad(f4)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    -
    -# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    -f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    More autograd

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f5(x):
    -    if x >= 0:
    -        return x**2
    -    else:
    -        return -3*x + 1
    -
    -f5_grad = grad(f5)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    And with loops

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f6_for(x):
    -    val = 0
    -    for i in range(10):
    -        val = val + x**i
    -    return val
    -
    -def f6_while(x):
    -    val = 0
    -    i = 0
    -    while i < 10:
    -        val = val + x**i
    -        i = i + 1
    -    return val
    -
    -f6_for_grad = grad(f6_for)
    -f6_while_grad = grad(f6_while)
    -
    -x = 0.5
    -
    -# Print the computed derivaties of f6_for and f6_while
    -print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    -print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -# Both of the functions are implementation of the sum: sum(x**i) for i = 0, ..., 9
    -# The analytical derivative is: sum(i*x**(i-1)) 
    -f6_grad_analytical = 0
    -for i in range(10):
    -    f6_grad_analytical += i*x**(i-1)
    -
    -print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Using recursion

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f7(n): # Assume that n is an integer
    -    if n == 1 or n == 0:
    -        return 1
    -    else:
    -        return n*f7(n-1)
    -
    -f7_grad = grad(f7)
    -
    -n = 2.0
    -
    -print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    -
    -# The function f7 is an implementation of the factorial of n.
    -# By using the product rule, one can find that the derivative is:
    -
    -f7_grad_analytical = 0
    -for i in range(int(n)-1):
    -    tmp = 1
    -    for k in range(int(n)-1):
    -        if k != i:
    -            tmp *= (n - k)
    -    f7_grad_analytical += tmp
    -
    -print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    - -









    -

    Unsupported functions

    -

Autograd supports many features. However, there are some functions that are not supported (yet) by Autograd.

    - -

    Assigning a value to the variable being differentiated with respect to

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f8(x): # Assume x is an array
    -    x[2] = 3
    -    return x*2
    -
    -#f8_grad = grad(f8)
    -
    -#x = 8.4
    -
    -#print("The derivative of f8 is:",f8_grad(x))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. Autograd has implemented the computation of the derivative in such a way that this assignment is not possible.
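A common workaround, sketched below with illustrative names, is to build a new array instead of assigning into the variable being differentiated; the function here also returns a scalar so that grad applies.

import autograd.numpy as np
from autograd import grad

def f8_alternative(x):
    # Build a new array instead of assigning x[2] = 3, and return a scalar
    x_new = np.concatenate((x[:2], np.array([3.0]), x[3:]))
    return np.sum(x_new * 2)

f8_alternative_grad = grad(f8_alternative)

x = np.array([1.0, 2.0, 4.0, 5.0])
print(f8_alternative_grad(x))         # 2 for every entry except the replaced one, which gets 0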

    - -









    -

    The syntax a.dot(b) when finding the dot product

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9(a): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return a.dot(b)
    -
    -#f9_grad = grad(f9)
    -
    -#x = np.array([1.0,0.0])
    -
    -#print("The derivative of f9 is:",f9_grad(x))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9_alternative(x): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    -
    -f9_alternative_grad = grad(f9_alternative)
    -
    -x = np.array([3.0,0.0])
    -
    -print("The gradient of f9 is:",f9_alternative_grad(x))
    -
    -# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    -# w.r.t x is (b_1, b_2).
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Using Autograd with OLS

    - -

We conclude the part on optimization by showing how we can write codes for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x#+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 30
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd")
    -print(theta)
    -
    -# Now improve with momentum gradient descent
    -change = 0.0
    -delta_momentum = 0.3
    -for iter in range(Niterations):
    -    # calculate gradient
    -    gradients = training_gradient(theta)
    -    # calculate update
    -    new_change = eta*gradients+delta_momentum*change
    -    # take a step
    -    theta -= new_change
    -    # save the change
    -    change = new_change
    -    print(iter,gradients[0],gradients[1])
-print("theta from own gd with momentum")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    But none of these can compete with Newton's method

    - - - -
    -
    -
    -
    -
    -
    # Using Newton's method
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(beta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -# Note that here the Hessian does not depend on the parameters beta
    -invH = np.linalg.pinv(H)
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta = np.random.randn(2,1)
    -Niterations = 5
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(beta)
    -    beta -= invH @ gradients
    -    print(iter,gradients[0],gradients[1])
    -print("beta from own Newton code")
    -print(beta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Including Stochastic Gradient Descent with Autograd

    -

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
-print("theta from own sgd")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -change = 0.0
    -delta_momentum = 0.3
    -
    -for epoch in range(n_epochs):
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        # calculate update
    -        new_change = eta*gradients+delta_momentum*change
    -        # take a step
    -        theta -= new_change
    -        # save the change
    -        change = new_change
-print("theta from own sgd with momentum")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Similar (second order function now) problem but now with AdaGrad

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        Giter += gradients*gradients
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -        theta -= update
    -print("theta from own AdaGrad")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    - -









    -

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameter rho
    -rho = 0.99
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -	# Accumulated gradient
    -	# Scaling with rho the new and the previous results
    -        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    -	# Taking the diagonal only and inverting
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -	# Hadamard product
    -        theta -= update
    -print("theta from own RMSprop")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    And finally ADAM

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    -beta1 = 0.9
    -beta2 = 0.999
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-7
    -iter = 0
    -for epoch in range(n_epochs):
    -    first_moment = 0.0
    -    second_moment = 0.0
    -    iter += 1
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        # Computing moments first
    -        first_moment = beta1*first_moment + (1-beta1)*gradients
    -        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    -        # Bias-corrected first and second moments
    -        first_term = first_moment/(1.0-beta1**iter)
    -        second_term = second_moment/(1.0-beta2**iter)
    -        # Elementwise update scaled by the adaptive learning rate
    -        update = eta*first_term/(np.sqrt(second_term)+delta)
    -        theta -= update
    -print("theta from own ADAM")
    -print(theta)
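
    A small remark, not in the original notes: the line random_index = M*np.random.randint(m) in the loops above draws minibatches with replacement, so within one epoch some minibatches may be visited several times while others are skipped. A common alternative is to shuffle the data once per epoch and sweep through it. The sketch below shows only that sampling pattern, with illustrative names; the RMSprop or ADAM update itself would go inside the inner loop exactly as in the code above.

    import numpy as np

    rng = np.random.default_rng(2024)   # illustrative seed
    n, M = 1000, 5                      # number of data points and minibatch size
    indices = np.arange(n)

    for epoch in range(2):              # a couple of epochs, for illustration only
        rng.shuffle(indices)            # new random ordering every epoch
        for start in range(0, n, M):
            batch = indices[start:start+M]
            # here one would form xi = X[batch], yi = y[batch]
            # and take the gradient step as above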

    And Logistic Regression

    import autograd.numpy as np
    -from autograd import grad
    -
    -def sigmoid(x):
    -    return 0.5 * (np.tanh(x / 2.) + 1)
    -
    -def logistic_predictions(weights, inputs):
    -    # Outputs probability of a label being true according to logistic model.
    -    return sigmoid(np.dot(inputs, weights))
    -
    -def training_loss(weights):
    -    # Training loss is the negative log-likelihood of the training labels.
    -    preds = logistic_predictions(weights, inputs)
    -    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    -    return -np.sum(np.log(label_probabilities))
    -
    -# Build a toy dataset.
    -inputs = np.array([[0.52, 1.12,  0.77],
    -                   [0.88, -1.08, 0.15],
    -                   [0.52, 0.06, -1.30],
    -                   [0.74, -2.49, 1.39]])
    -targets = np.array([True, True, False, True])
    -
    -# Define a function that returns gradients of training loss using Autograd.
    -training_gradient_fun = grad(training_loss)
    -
    -# Optimize weights using gradient descent.
    -weights = np.array([0.0, 0.0, 0.0])
    -print("Initial loss:", training_loss(weights))
    -for i in range(100):
    -    weights -= training_gradient_fun(weights) * 0.01
    -
    -print("Trained loss:", training_loss(weights))

    Introducing JAX

    Presently, instead of using Autograd, we recommend using JAX.

    JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    Here is a simple example of how you can use JAX to compute the derivative of the logistic function.

    import jax.numpy as jnp
    -from jax import grad, jit, vmap
    -
    -def sum_logistic(x):
    -  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    -
    -x_small = jnp.arange(3.)
    -derivative_fn = grad(sum_logistic)
    -print(derivative_fn(x_small))
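
    The example above only uses grad. As a further illustration, not part of the original notes, the other transformations mentioned earlier compose freely with it: jit compiles a function with XLA, and vmap vectorizes it over a batch dimension. The function logistic below is an illustrative helper, not something defined earlier in these notes.

    import jax.numpy as jnp
    from jax import grad, jit, vmap

    def sum_logistic(x):
        return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))

    # JIT-compile the gradient function for fast repeated evaluation
    fast_grad = jit(grad(sum_logistic))
    x_small = jnp.arange(3.)
    print(fast_grad(x_small))

    # vmap applies a scalar function over a batch without explicit Python loops
    def logistic(x):
        return 1.0 / (1.0 + jnp.exp(-x))

    batch = jnp.linspace(-2.0, 2.0, 5)
    print(vmap(logistic)(batch))
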
    - © 1999-2024, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    + © 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    diff --git a/doc/pub/week39/html/week39.html b/doc/pub/week39/html/week39.html
    index 0efaaec20..59032cd58 100644
    --- a/doc/pub/week39/html/week39.html
    +++ b/doc/pub/week39/html/week39.html
    @@ -8,8 +8,8 @@
    -
    -Week 39: Optimization and Gradient Methods
    +
    +Week 39: Resampling methods and logistic regression